DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations
@ 2019-10-08 21:12 Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 1/7] doc/rcu: fix typos Honnappa Nagarahalli
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

Few typo fixes, some corrections to the documentation and simple fixes to
the test cases.

The last 2 commits contain simple optimizations with good amount of
performance improvements.

v2:
  All instances of size_t fixed (Ruifeng)

Honnappa Nagarahalli (7):
  doc/rcu: fix typos
  doc/rcu: correct the limitation on number of threads
  doc/rcu: add information about storing token and resource
  test/rcu: use size_t instead of int
  test/rcu: use correct nomenclature while printing results
  lib/rcu: add least acknowledged token optimization
  lib/rcu: update QS only when there are updates from writer

 app/test/test_rcu_qsbr.c          |  4 +--
 app/test/test_rcu_qsbr_perf.c     | 27 ++++++++++-------
 doc/guides/prog_guide/rcu_lib.rst | 36 ++++++++++++----------
 lib/librte_rcu/rte_rcu_qsbr.c     |  4 +++
 lib/librte_rcu/rte_rcu_qsbr.h     | 50 +++++++++++++++++++++++++++++--
 5 files changed, 89 insertions(+), 32 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 1/7] doc/rcu: fix typos
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 2/7] doc/rcu: correct the limitation on number of threads Honnappa Nagarahalli
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

Fix typos.

Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..c019dfca8 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -37,8 +37,8 @@ What is Quiescent State
 -----------------------
 
 Quiescent State can be defined as "any point in the thread execution where the
-thread does not hold a reference to shared memory". It is up to the application
-to determine its quiescent state.
+thread does not hold a reference to shared memory". It is the responsibility of
+the application to determine its quiescent state.
 
 Let us consider the following diagram:
 
@@ -76,7 +76,7 @@ Factors affecting the RCU mechanism
 
 It is important to make sure that this library keeps the overhead of
 identifying the end of grace period and subsequent freeing of memory,
-to a minimum. The following explains how grace period and critical
+to a minimum. The following paras explain how grace period and critical
 section affect this overhead.
 
 The writer has to poll the readers to identify the end of grace period.
@@ -91,14 +91,14 @@ critical sections smaller requires additional CPU cycles (due to additional
 reporting) in the readers.
 
 Hence, we need the characteristics of a small grace period and large critical
-section. This library addresses this by allowing the writer to do
-other work without having to block until the readers report their quiescent
-state.
+section. This library addresses these characteristics by allowing the writer
+to do other work without having to block until the readers report their
+quiescent state.
 
 RCU in DPDK
 -----------
 
-For DPDK applications, the start and end of a ``while(1)`` loop (where no
+For DPDK applications, the beginning and end of a ``while(1)`` loop (where no
 references to shared data structures are kept) act as perfect quiescent
 states. This will combine all the shared data structure accesses into a
 single, large critical section which helps keep the overhead on the
@@ -106,11 +106,11 @@ reader side to a minimum.
 
 DPDK supports a pipeline model of packet processing and service cores.
 In these use cases, a given data structure may not be used by all the
-workers in the application. The writer does not have to wait for all
-the workers to report their quiescent state. To provide the required
-flexibility, this library has a concept of a QS variable. The application
-can create one QS variable per data structure to help it track the
-end of grace period for each data structure. This helps keep the grace
+workers in the application. The writer has to wait only for the workers that
+use the data structure to report their quiescent state. To provide the required
+flexibility, this library has a concept of a QS variable. If required, the
+application can create one QS variable per data structure to help it track the
+end of grace period for each data structure. This helps keep the length of grace
 period to a minimum.
 
 How to use this library
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 2/7] doc/rcu: correct the limitation on number of threads
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 1/7] doc/rcu: fix typos Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 3/7] doc/rcu: add information about storing token and resource Honnappa Nagarahalli
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

There is no limitation of 1024 reader threads.

Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index c019dfca8..4d1ed3954 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -120,8 +120,7 @@ The application must allocate memory and initialize a QS variable.
 
 Applications can call ``rte_rcu_qsbr_get_memsize()`` to calculate the size
 of memory to allocate. This API takes a maximum number of reader threads,
-using this variable, as a parameter. Currently, a maximum of 1024 threads
-are supported.
+using this variable, as a parameter.
 
 Further, the application can initialize a QS variable using the API
 ``rte_rcu_qsbr_init()``.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 3/7] doc/rcu: add information about storing token and resource
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 1/7] doc/rcu: fix typos Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 2/7] doc/rcu: correct the limitation on number of threads Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 4/7] test/rcu: use size_t instead of int Honnappa Nagarahalli
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

After calling rte_rcu_qsbr_start API, the token and the deleted
resource need to be stored for subsequent query/free.

Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 4d1ed3954..8d0dfcf29 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -162,14 +162,19 @@ running as worker threads.
 The separation of triggering the reporting from querying the status provides
 the writer threads flexibility to do useful work instead of blocking for the
 reader threads to enter the quiescent state or go offline. This reduces the
-memory accesses due to continuous polling for the status.
+memory accesses due to continuous polling for the status. But, since the
+resource is freed at a later time, the token and the reference to the deleted
+resource need to be stored for later queries.
 
 The ``rte_rcu_qsbr_synchronize()`` API combines the functionality of
 ``rte_rcu_qsbr_start()`` and blocking ``rte_rcu_qsbr_check()`` into a single
 API. This API triggers the reader threads to report their quiescent state and
 polls till all the readers enter the quiescent state or go offline. This API
 does not allow the writer to do useful work while waiting and introduces
-additional memory accesses due to continuous polling.
+additional memory accesses due to continuous polling. However, the application
+does not have to store the token or the reference to the deleted resource. The
+resource can be freed immediately after ``rte_rcu_qsbr_synchronize()`` API
+returns.
 
 The reader thread must call ``rte_rcu_qsbr_thread_offline()`` and
 ``rte_rcu_qsbr_thread_unregister()`` APIs to remove itself from reporting its
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 4/7] test/rcu: use size_t instead of int
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
                   ` (2 preceding siblings ...)
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 3/7] doc/rcu: add information about storing token and resource Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-21 19:19   ` David Marchand
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 5/7] test/rcu: use correct nomenclature while printing results Honnappa Nagarahalli
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

Variables used to store the return value of rte_rcu_qsbr_get_memsize
in variables of type 'int'. The variables are of type 'size_t' now.

Fixes: b87089b0bb19 ("test/rcu: add API and functional tests")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_rcu_qsbr.c      |  4 ++--
 app/test/test_rcu_qsbr_perf.c | 11 ++++++-----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
index d1b9e46a2..2f71ec6ad 100644
--- a/app/test/test_rcu_qsbr.c
+++ b/app/test/test_rcu_qsbr.c
@@ -52,7 +52,7 @@ static int
 alloc_rcu(void)
 {
 	int i;
-	uint32_t sz;
+	size_t sz;
 
 	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
 
@@ -81,7 +81,7 @@ free_rcu(void)
 static int
 test_rcu_qsbr_get_memsize(void)
 {
-	uint32_t sz;
+	size_t sz;
 
 	printf("\nTest rte_rcu_qsbr_thread_register()\n");
 
diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
index cb2d177b7..280f1811f 100644
--- a/app/test/test_rcu_qsbr_perf.c
+++ b/app/test/test_rcu_qsbr_perf.c
@@ -125,7 +125,7 @@ test_rcu_qsbr_writer_perf(void *arg)
 static int
 test_rcu_qsbr_perf(void)
 {
-	int sz;
+	size_t sz;
 	unsigned int i, tmp_num_cores;
 
 	writer_done = 0;
@@ -188,7 +188,7 @@ test_rcu_qsbr_perf(void)
 static int
 test_rcu_qsbr_rperf(void)
 {
-	int sz;
+	size_t sz;
 	unsigned int i, tmp_num_cores;
 
 	rte_atomic64_clear(&updates);
@@ -234,7 +234,7 @@ test_rcu_qsbr_rperf(void)
 static int
 test_rcu_qsbr_wperf(void)
 {
-	int sz;
+	size_t sz;
 	unsigned int i;
 
 	rte_atomic64_clear(&checks);
@@ -379,7 +379,7 @@ static int
 test_rcu_qsbr_sw_sv_1qs(void)
 {
 	uint64_t token, begin, cycles;
-	int sz;
+	size_t sz;
 	unsigned int i, j, tmp_num_cores;
 	int32_t pos;
 
@@ -498,7 +498,8 @@ static int
 test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 {
 	uint64_t token, begin, cycles;
-	int ret, sz;
+	int ret;
+	size_t sz;
 	unsigned int i, j, tmp_num_cores;
 	int32_t pos;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 5/7] test/rcu: use correct nomenclature while printing results
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
                   ` (3 preceding siblings ...)
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 4/7] test/rcu: use size_t instead of int Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 6/7] lib/rcu: add least acknowledged token optimization Honnappa Nagarahalli
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

Use 'quiescent state updates' instead of just 'updates'.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 app/test/test_rcu_qsbr_perf.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
index 280f1811f..d35a6d089 100644
--- a/app/test/test_rcu_qsbr_perf.c
+++ b/app/test/test_rcu_qsbr_perf.c
@@ -167,8 +167,10 @@ test_rcu_qsbr_perf(void)
 	/* Wait until all readers have exited */
 	rte_eal_mp_wait_lcore();
 
-	printf("Total RCU updates = %"PRIi64"\n", rte_atomic64_read(&updates));
-	printf("Cycles per %d updates: %"PRIi64"\n", RCU_SCALE_DOWN,
+	printf("Total quiescent state updates = %"PRIi64"\n",
+		rte_atomic64_read(&updates));
+	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
+		RCU_SCALE_DOWN,
 		rte_atomic64_read(&update_cycles) /
 		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
 	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
@@ -217,8 +219,10 @@ test_rcu_qsbr_rperf(void)
 	/* Wait until all readers have exited */
 	rte_eal_mp_wait_lcore();
 
-	printf("Total RCU updates = %"PRIi64"\n", rte_atomic64_read(&updates));
-	printf("Cycles per %d updates: %"PRIi64"\n", RCU_SCALE_DOWN,
+	printf("Total quiescent state updates = %"PRIi64"\n",
+		rte_atomic64_read(&updates));
+	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
+		RCU_SCALE_DOWN,
 		rte_atomic64_read(&update_cycles) /
 		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
 
@@ -462,7 +466,7 @@ test_rcu_qsbr_sw_sv_1qs(void)
 	rte_free(keys);
 
 	printf("Following numbers include calls to rte_hash functions\n");
-	printf("Cycles per 1 update(online/update/offline): %"PRIi64"\n",
+	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
 		rte_atomic64_read(&update_cycles) /
 		rte_atomic64_read(&updates));
 
@@ -578,7 +582,7 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 	rte_free(keys);
 
 	printf("Following numbers include calls to rte_hash functions\n");
-	printf("Cycles per 1 update(online/update/offline): %"PRIi64"\n",
+	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
 		rte_atomic64_read(&update_cycles) /
 		rte_atomic64_read(&updates));
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 6/7] lib/rcu: add least acknowledged token optimization
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
                   ` (4 preceding siblings ...)
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 5/7] test/rcu: use correct nomenclature while printing results Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 7/7] lib/rcu: update QS only when there are updates from writer Honnappa Nagarahalli
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

When the rte_rcu_qsbr_check API is called, it is possible to
calculate the least valued token acknowledged by all the readers.
When the API is called next time, the readers' token counters do
not need to be scanned if the value of the token being queried is
less than the last least token acknowledged. This avoids the
cache line bounces between readers and writer.

Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_rcu/rte_rcu_qsbr.c |  4 ++++
 lib/librte_rcu/rte_rcu_qsbr.h | 42 +++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
index ce7f93dd3..c9ca66aaa 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.c
+++ b/lib/librte_rcu/rte_rcu_qsbr.c
@@ -73,6 +73,7 @@ rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads)
 			__RTE_QSBR_THRID_ARRAY_ELM_SIZE) /
 			__RTE_QSBR_THRID_ARRAY_ELM_SIZE;
 	v->token = __RTE_QSBR_CNT_INIT;
+	v->acked_token = __RTE_QSBR_CNT_INIT - 1;
 
 	return 0;
 }
@@ -245,6 +246,9 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 	fprintf(f, "  Token = %"PRIu64"\n",
 			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
 
+	fprintf(f, "  Least Acknowledged Token = %"PRIu64"\n",
+			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
+
 	fprintf(f, "Quiescent State Counts for readers:\n");
 	for (i = 0; i < v->num_elems; i++) {
 		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index c80f15c00..3f445ba6c 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -83,6 +83,7 @@ struct rte_rcu_qsbr_cnt {
 
 #define __RTE_QSBR_CNT_THR_OFFLINE 0
 #define __RTE_QSBR_CNT_INIT 1
+#define __RTE_QSBR_CNT_MAX ((uint64_t)~0)
 
 /* RTE Quiescent State variable structure.
  * This structure has two elements that vary in size based on the
@@ -93,6 +94,10 @@ struct rte_rcu_qsbr_cnt {
 struct rte_rcu_qsbr {
 	uint64_t token __rte_cache_aligned;
 	/**< Counter to allow for multiple concurrent quiescent state queries */
+	uint64_t acked_token;
+	/**< Least token acked by all the threads in the last call to
+	 *   rte_rcu_qsbr_check API.
+	 */
 
 	uint32_t num_elems __rte_cache_aligned;
 	/**< Number of elements in the thread ID array */
@@ -472,6 +477,7 @@ __rte_rcu_qsbr_check_selective(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 	uint64_t bmap;
 	uint64_t c;
 	uint64_t *reg_thread_id;
+	uint64_t acked_token = __RTE_QSBR_CNT_MAX;
 
 	for (i = 0, reg_thread_id = __RTE_QSBR_THRID_ARRAY_ELM(v, 0);
 		i < v->num_elems;
@@ -493,6 +499,7 @@ __rte_rcu_qsbr_check_selective(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %"PRIu64", wait = %d, Thread QS cnt = %"PRIu64", Thread ID = %d",
 				__func__, t, wait, c, id+j);
+
 			/* Counter is not checked for wrap-around condition
 			 * as it is a 64b counter.
 			 */
@@ -512,10 +519,25 @@ __rte_rcu_qsbr_check_selective(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 				continue;
 			}
 
+			/* This thread is in quiescent state. Use the counter
+			 * to find the least acknowledged token among all the
+			 * readers.
+			 */
+			if (c != __RTE_QSBR_CNT_THR_OFFLINE && acked_token > c)
+				acked_token = c;
+
 			bmap &= ~(1UL << j);
 		}
 	}
 
+	/* All readers are checked, update least acknowledged token.
+	 * There might be multiple writers trying to update this. There is
+	 * no need to update this very accurately using compare-and-swap.
+	 */
+	if (acked_token != __RTE_QSBR_CNT_MAX)
+		__atomic_store_n(&v->acked_token, acked_token,
+			__ATOMIC_RELAXED);
+
 	return 1;
 }
 
@@ -528,6 +550,7 @@ __rte_rcu_qsbr_check_all(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 	uint32_t i;
 	struct rte_rcu_qsbr_cnt *cnt;
 	uint64_t c;
+	uint64_t acked_token = __RTE_QSBR_CNT_MAX;
 
 	for (i = 0, cnt = v->qsbr_cnt; i < v->max_threads; i++, cnt++) {
 		__RTE_RCU_DP_LOG(DEBUG,
@@ -538,6 +561,7 @@ __rte_rcu_qsbr_check_all(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %"PRIu64", wait = %d, Thread QS cnt = %"PRIu64", Thread ID = %d",
 				__func__, t, wait, c, i);
+
 			/* Counter is not checked for wrap-around condition
 			 * as it is a 64b counter.
 			 */
@@ -550,8 +574,22 @@ __rte_rcu_qsbr_check_all(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 
 			rte_pause();
 		}
+
+		/* This thread is in quiescent state. Use the counter to find
+		 * the least acknowledged token among all the readers.
+		 */
+		if (likely(c != __RTE_QSBR_CNT_THR_OFFLINE && acked_token > c))
+			acked_token = c;
 	}
 
+	/* All readers are checked, update least acknowledged token.
+	 * There might be multiple writers trying to update this. There is
+	 * no need to update this very accurately using compare-and-swap.
+	 */
+	if (acked_token != __RTE_QSBR_CNT_MAX)
+		__atomic_store_n(&v->acked_token, acked_token,
+			__ATOMIC_RELAXED);
+
 	return 1;
 }
 
@@ -595,6 +633,10 @@ rte_rcu_qsbr_check(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
 {
 	RTE_ASSERT(v != NULL);
 
+	/* Check if all the readers have already acknowledged this token */
+	if (likely(t <= v->acked_token))
+		return 1;
+
 	if (likely(v->num_threads == v->max_threads))
 		return __rte_rcu_qsbr_check_all(v, t, wait);
 	else
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH v2 7/7] lib/rcu: update QS only when there are updates from writer
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
                   ` (5 preceding siblings ...)
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 6/7] lib/rcu: add least acknowledged token optimization Honnappa Nagarahalli
@ 2019-10-08 21:12 ` Honnappa Nagarahalli
  2019-10-18  9:57 ` [dpdk-dev] [PATCH] rcu: fix reference to offline function David Marchand
  2019-10-21 19:19 ` [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations David Marchand
  8 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-08 21:12 UTC (permalink / raw)
  To: honnappa.nagarahalli, david.marchand, konstantin.ananyev
  Cc: dev, ruifeng.wang, stable, nd

When the writer is checking the quiescent state status, it is not
deleting any entries in the data structure. This means, the readers
do not need to update their quiescent state during that period.
Readers update the quiescent state only when there are updates
available from the writer.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_rcu/rte_rcu_qsbr.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index 3f445ba6c..0d4664572 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -456,12 +456,14 @@ rte_rcu_qsbr_quiescent(struct rte_rcu_qsbr *v, unsigned int thread_id)
 	 */
 	t = __atomic_load_n(&v->token, __ATOMIC_ACQUIRE);
 
-	/* Inform the writer that updates are visible to this reader.
+	/* Check if there are updates available from the writer.
+	 * Inform the writer that updates are visible to this reader.
 	 * Prior loads of the shared data structure should not move
 	 * beyond this store. Hence use store-release.
 	 */
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-			 t, __ATOMIC_RELEASE);
+	if (t != __atomic_load_n(&v->qsbr_cnt[thread_id].cnt, __ATOMIC_RELAXED))
+		__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
+					 t, __ATOMIC_RELEASE);
 
 	__RTE_RCU_DP_LOG(DEBUG, "%s: update: token = %"PRIu64", Thread ID = %d",
 		__func__, t, thread_id);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [dpdk-dev] [PATCH] rcu: fix reference to offline function
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
                   ` (6 preceding siblings ...)
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 7/7] lib/rcu: update QS only when there are updates from writer Honnappa Nagarahalli
@ 2019-10-18  9:57 ` David Marchand
  2019-10-18 13:58   ` Honnappa Nagarahalli
  2019-10-21 19:19 ` [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations David Marchand
  8 siblings, 1 reply; 14+ messages in thread
From: David Marchand @ 2019-10-18  9:57 UTC (permalink / raw)
  To: dev; +Cc: stable, Honnappa Nagarahalli

Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
I intend to add this patch to this series.
Honnappa, can you review it please ?

---
 lib/librte_rcu/rte_rcu_qsbr.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index 0d46645..0b55859 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -217,7 +217,7 @@ struct rte_rcu_qsbr {
  * call this API before calling rte_rcu_qsbr_quiescent. This can be called
  * during initialization or as part of the packet processing loop.
  *
- * The reader thread must call rte_rcu_thread_offline API, before
+ * The reader thread must call rte_rcu_qsbr_thread_offline API, before
  * calling any functions that block, to ensure that rte_rcu_qsbr_check
  * API does not wait indefinitely for the reader thread to update its QS.
  *
@@ -283,7 +283,7 @@ struct rte_rcu_qsbr {
  * This can be called during initialization or as part of the packet
  * processing loop.
  *
- * The reader thread must call rte_rcu_thread_offline API, before
+ * The reader thread must call rte_rcu_qsbr_thread_offline API, before
  * calling any functions that block, to ensure that rte_rcu_qsbr_check
  * API does not wait indefinitely for the reader thread to update its QS.
  *
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH] rcu: fix reference to offline function
  2019-10-18  9:57 ` [dpdk-dev] [PATCH] rcu: fix reference to offline function David Marchand
@ 2019-10-18 13:58   ` Honnappa Nagarahalli
  2019-10-21 19:25     ` David Marchand
  0 siblings, 1 reply; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18 13:58 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: stable, nd

<snip>

> Subject: [PATCH] rcu: fix reference to offline function
> 
> Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> I intend to add this patch to this series.
> Honnappa, can you review it please ?
> 
> ---
>  lib/librte_rcu/rte_rcu_qsbr.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h index
> 0d46645..0b55859 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -217,7 +217,7 @@ struct rte_rcu_qsbr {
>   * call this API before calling rte_rcu_qsbr_quiescent. This can be called
>   * during initialization or as part of the packet processing loop.
>   *
> - * The reader thread must call rte_rcu_thread_offline API, before
> + * The reader thread must call rte_rcu_qsbr_thread_offline API, before
>   * calling any functions that block, to ensure that rte_rcu_qsbr_check
>   * API does not wait indefinitely for the reader thread to update its QS.
>   *
> @@ -283,7 +283,7 @@ struct rte_rcu_qsbr {
>   * This can be called during initialization or as part of the packet
>   * processing loop.
>   *
> - * The reader thread must call rte_rcu_thread_offline API, before
> + * The reader thread must call rte_rcu_qsbr_thread_offline API, before
>   * calling any functions that block, to ensure that rte_rcu_qsbr_check
>   * API does not wait indefinitely for the reader thread to update its QS.
>   *
> --
> 1.8.3.1
Thank you David.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH v2 4/7] test/rcu: use size_t instead of int
  2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 4/7] test/rcu: use size_t instead of int Honnappa Nagarahalli
@ 2019-10-21 19:19   ` David Marchand
  0 siblings, 0 replies; 14+ messages in thread
From: David Marchand @ 2019-10-21 19:19 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ananyev, Konstantin, dev, Ruifeng Wang (Arm Technology China),
	dpdk stable, nd

On Tue, Oct 8, 2019 at 11:12 PM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> Variables used to store the return value of rte_rcu_qsbr_get_memsize
> in variables of type 'int'. The variables are of type 'size_t' now.

Added a comment on the two cases we have here: the uint32_t storage
are fine while the int ones were an issue.


>
> Fixes: b87089b0bb19 ("test/rcu: add API and functional tests")
> Cc: stable@dpdk.org
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  app/test/test_rcu_qsbr.c      |  4 ++--
>  app/test/test_rcu_qsbr_perf.c | 11 ++++++-----
>  2 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> index d1b9e46a2..2f71ec6ad 100644
> --- a/app/test/test_rcu_qsbr.c
> +++ b/app/test/test_rcu_qsbr.c
> @@ -52,7 +52,7 @@ static int
>  alloc_rcu(void)
>  {
>         int i;
> -       uint32_t sz;
> +       size_t sz;
>
>         sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
>
> @@ -81,7 +81,7 @@ free_rcu(void)
>  static int
>  test_rcu_qsbr_get_memsize(void)
>  {
> -       uint32_t sz;
> +       size_t sz;
>
>         printf("\nTest rte_rcu_qsbr_thread_register()\n");
>
> diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
> index cb2d177b7..280f1811f 100644
> --- a/app/test/test_rcu_qsbr_perf.c
> +++ b/app/test/test_rcu_qsbr_perf.c
> @@ -125,7 +125,7 @@ test_rcu_qsbr_writer_perf(void *arg)
>  static int
>  test_rcu_qsbr_perf(void)
>  {
> -       int sz;
> +       size_t sz;
>         unsigned int i, tmp_num_cores;
>
>         writer_done = 0;
> @@ -188,7 +188,7 @@ test_rcu_qsbr_perf(void)
>  static int
>  test_rcu_qsbr_rperf(void)
>  {
> -       int sz;
> +       size_t sz;
>         unsigned int i, tmp_num_cores;
>
>         rte_atomic64_clear(&updates);
> @@ -234,7 +234,7 @@ test_rcu_qsbr_rperf(void)
>  static int
>  test_rcu_qsbr_wperf(void)
>  {
> -       int sz;
> +       size_t sz;
>         unsigned int i;
>
>         rte_atomic64_clear(&checks);
> @@ -379,7 +379,7 @@ static int
>  test_rcu_qsbr_sw_sv_1qs(void)
>  {
>         uint64_t token, begin, cycles;
> -       int sz;
> +       size_t sz;
>         unsigned int i, j, tmp_num_cores;
>         int32_t pos;
>
> @@ -498,7 +498,8 @@ static int
>  test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
>  {
>         uint64_t token, begin, cycles;
> -       int ret, sz;
> +       int ret;
> +       size_t sz;
>         unsigned int i, j, tmp_num_cores;
>         int32_t pos;
>
> --
> 2.17.1
>


--
David Marchand


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations
  2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
                   ` (7 preceding siblings ...)
  2019-10-18  9:57 ` [dpdk-dev] [PATCH] rcu: fix reference to offline function David Marchand
@ 2019-10-21 19:19 ` David Marchand
  2019-10-21 19:39   ` Honnappa Nagarahalli
  8 siblings, 1 reply; 14+ messages in thread
From: David Marchand @ 2019-10-21 19:19 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ananyev, Konstantin, dev, Ruifeng Wang (Arm Technology China),
	dpdk stable, nd

On Tue, Oct 8, 2019 at 11:12 PM Honnappa Nagarahalli
<honnappa.nagarahalli@arm.com> wrote:
>
> Few typo fixes, some corrections to the documentation and simple fixes to
> the test cases.
>
> The last 2 commits contain simple optimizations with good amount of
> performance improvements.

Do you have numbers to illustrate?

>
> v2:
>   All instances of size_t fixed (Ruifeng)
>
> Honnappa Nagarahalli (7):
>   doc/rcu: fix typos
>   doc/rcu: correct the limitation on number of threads
>   doc/rcu: add information about storing token and resource
>   test/rcu: use size_t instead of int
>   test/rcu: use correct nomenclature while printing results
>   lib/rcu: add least acknowledged token optimization
>   lib/rcu: update QS only when there are updates from writer

Series applied, thanks.


--
David Marchand


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH] rcu: fix reference to offline function
  2019-10-18 13:58   ` Honnappa Nagarahalli
@ 2019-10-21 19:25     ` David Marchand
  0 siblings, 0 replies; 14+ messages in thread
From: David Marchand @ 2019-10-21 19:25 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: dev, stable, nd

On Fri, Oct 18, 2019 at 3:58 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > Subject: [PATCH] rcu: fix reference to offline function
> >
> > Fixes: 64994b56cfd7 ("rcu: add RCU library supporting QSBR mechanism")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>

Applied.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations
  2019-10-21 19:19 ` [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations David Marchand
@ 2019-10-21 19:39   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 14+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-21 19:39 UTC (permalink / raw)
  To: David Marchand
  Cc: Ananyev, Konstantin, dev, Ruifeng Wang (Arm Technology China),
	dpdk stable, Honnappa Nagarahalli, nd, nd

<snip>

> 
> On Tue, Oct 8, 2019 at 11:12 PM Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com> wrote:
> >
> > Few typo fixes, some corrections to the documentation and simple fixes
> > to the test cases.
> >
> > The last 2 commits contain simple optimizations with good amount of
> > performance improvements.
> 
> Do you have numbers to illustrate?
Thanks David.
I do not have the numbers for 'least acknowledged token' optimization. The performance test case to test that requires a different design in the performance test cases with additional code. This additional code is nothing but the rte_rcu_qsbr_defer_xxx APIs I am working on. I will be able to publish the numbers with and without this optimization.

I have the performance numbers for the last patch.

Without the patch:
===============
Total RCU updates = 11205362514
Cycles per 1000 updates: 587
Total RCU checks = 20000000
Cycles per 1000 checks: 29946

Perf Test: 12 Readers
Total RCU updates = 1200000000
Cycles per 1000 updates: 425

Perf test: 12 Writers ('wait' in qsbr_check == false)
Total RCU checks = 240000000
Cycles per 1000 checks: 319   <<<<<<<<<============== (see the improvement below)

Perf test: 1 writer, 12 readers, 1 QSBR variable, 1 QSBR Query, Blocking QSBR Check
Following numbers include calls to rte_hash functions
Cycles per 1 update(online/update/offline): 5857
Cycles per 1 check(start, check): 5889

Perf test: 1 writer, 12 readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check
Following numbers include calls to rte_hash functions
Cycles per 1 update(online/update/offline): 5859
Cycles per 1 check(start, check): 5891

With patch:
==========
Total RCU updates = 11075332503
Cycles per 1000 updates: 601
Total RCU checks = 20000000
Cycles per 1000 checks: 30265

Perf Test: 12 Readers
Total RCU updates = 1200000000
Cycles per 1000 updates: 425

Perf test: 12 Writers ('wait' in qsbr_check == false)
Total RCU checks = 240000000
Cycles per 1000 checks: 79   <<<<==================  (Improvement)

Perf test: 1 writer, 12 readers, 1 QSBR variable, 1 QSBR Query, Blocking QSBR Check
Following numbers include calls to rte_hash functions
Cycles per 1 update(online/update/offline): 5847
Cycles per 1 check(start, check): 5897

Perf test: 1 writer, 12 readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check
Following numbers include calls to rte_hash functions
Cycles per 1 update(online/update/offline): 5851
Cycles per 1 check(start, check): 5894

> 
> >
> > v2:
> >   All instances of size_t fixed (Ruifeng)
> >
> > Honnappa Nagarahalli (7):
> >   doc/rcu: fix typos
> >   doc/rcu: correct the limitation on number of threads
> >   doc/rcu: add information about storing token and resource
> >   test/rcu: use size_t instead of int
> >   test/rcu: use correct nomenclature while printing results
> >   lib/rcu: add least acknowledged token optimization
> >   lib/rcu: update QS only when there are updates from writer
> 
> Series applied, thanks.
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-10-21 19:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-08 21:12 [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations Honnappa Nagarahalli
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 1/7] doc/rcu: fix typos Honnappa Nagarahalli
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 2/7] doc/rcu: correct the limitation on number of threads Honnappa Nagarahalli
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 3/7] doc/rcu: add information about storing token and resource Honnappa Nagarahalli
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 4/7] test/rcu: use size_t instead of int Honnappa Nagarahalli
2019-10-21 19:19   ` David Marchand
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 5/7] test/rcu: use correct nomenclature while printing results Honnappa Nagarahalli
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 6/7] lib/rcu: add least acknowledged token optimization Honnappa Nagarahalli
2019-10-08 21:12 ` [dpdk-dev] [PATCH v2 7/7] lib/rcu: update QS only when there are updates from writer Honnappa Nagarahalli
2019-10-18  9:57 ` [dpdk-dev] [PATCH] rcu: fix reference to offline function David Marchand
2019-10-18 13:58   ` Honnappa Nagarahalli
2019-10-21 19:25     ` David Marchand
2019-10-21 19:19 ` [dpdk-dev] [PATCH 0/7] typo, doc, simple fixes and some optimizations David Marchand
2019-10-21 19:39   ` Honnappa Nagarahalli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).