DPDK patches and discussions
* [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library
@ 2019-08-22  6:34 Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details Ruifeng Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

A document is added with the suggested design for integrating the
RCU library with other libraries in DPDK.
As an example, the LPM library adds the integration. RCU is used
to safely free tbl8 groups that can be recycled. A table will not
be reclaimed or reused until the readers have finished referencing it.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application
to register an RCU variable that the LPM library will use.

A new API, rte_ring_peek, is introduced to help manage the
reclaiming FIFO queue.
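
A minimal writer-side sketch (a single reader thread with id 0 is
assumed; headers and error handling are elided):

	struct rte_rcu_qsbr *v;
	size_t sz;

	/* Create and initialize a QSBR variable for one reader thread. */
	sz = rte_rcu_qsbr_get_memsize(1);
	v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
	rte_rcu_qsbr_init(v, 1);

	/* Register the variable with the LPM object. tbl8 groups freed
	 * by rte_lpm_delete() will then be recycled only after the
	 * readers report a quiescent state.
	 */
	if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
		rte_panic("cannot attach RCU QSBR variable to LPM\n");

Reader threads register with and report their quiescent state to the
same variable as usual.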


Honnappa Nagarahalli (1):
  doc/rcu: add RCU integration design details

Ruifeng Wang (2):
  lib/ring: add peek API
  lib/lpm: integrate RCU QSBR

 doc/guides/prog_guide/rcu_lib.rst  |  51 +++++++
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 218 +++++++++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/librte_ring/rte_ring.h         |  30 ++++
 lib/meson.build                    |   3 +-
 8 files changed, 320 insertions(+), 15 deletions(-)

-- 
2.17.1



* [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
@ 2019-08-22  6:34 ` Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API Ruifeng Wang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a section to describe a design for integrating the QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 51 +++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..2869441ca 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,54 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Integrating QSBR RCU with other libraries
+-----------------------------------------
+
+Lock-free algorithms place an additional burden on the application to reclaim
+memory. Integrating memory reclamation mechanisms in the libraries helps
+remove some of the burden. Though the QSBR method provides the flexibility to
+achieve performance, it presents challenges when integrating with libraries.
+
+The memory reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here requires the application to handle 'Initialization'
+and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the library.
+* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section/how frequently it wants to report the quiescent state.
+
+The library will handle the 'Reclaiming Resources' part of the process. The
+libraries will make use of the writer thread context to execute the memory
+reclamation algorithm. So,
+
+* library should provide an API to register an RCU variable that it will use.
+* library should trigger the readers to report quiescent state status upon deleting the resources by calling ``rte_rcu_qsbr_start``.
+
+* library should store the token and deleted resources for later use to free them after the readers have reported their quiescent state. Since the readers will report the quiescent state status in the order of deletion, the library must store the tokens/resources in the order in which the resources were deleted. A FIFO data structure would achieve the desired results. The length of the FIFO would depend on the rate of deletion and the rate at which the readers report their quiescent state. In the worst case the length of FIFO would be equal to the maximum number of resources the data structure supports. However, in most cases, the length will be much smaller. But, the library should not take the length of FIFO as an input from the application. Instead, it should implement a data structure which should be able to grow/shrink dynamically. Overhead introduced by such a data structure on delete operations should be considered as well.
+
+* library should query the quiescent state and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. This allows the application to do useful work while the readers report their quiescent state. If there are tokens/resources present in the FIFO already, the delete API should peek the head of the FIFO and check the quiescent state status. If the status is success, the token/resource should be dequeued and the resource should be freed. This process can be repeated till the quiescent state status for a token returns failure indicating that subsequent tokens will also fail quiescent state status query. The same process can be incorporated while adding new entries in the data structure if the library runs out of resources.
+
+The 'Shutdown' process needs to be shared between the application and the
+library.
+
+* library should check the quiescent state status of all the tokens that may be present in the FIFO and free the resources. It should make use of the non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the tokens do not pass the quiescent state check, the library should print an error and stop the memory reclamation process.
+
+* the application should make sure that the reader threads are not using the shared data structure and unregister the reader threads from the QSBR variable before calling the library's shutdown function.
+
+Integrating the resource reclamation with libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. Application does not need a dedicated thread to reclaim resources. Memory
+   reclamation happens as part of the writer thread with little impact on
+   performance.
+#. The library has better control over the resources. For example, the
+   library can attempt to reclaim when it has run out of resources.
-- 
2.17.1



* [dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details Ruifeng Wang
@ 2019-08-22  6:34 ` Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Ruifeng Wang

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.
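
A minimal usage sketch (r is a ring created elsewhere; the consumer
decides, based on the peeked value, whether it is safe to actually
dequeue):

	void *obj;

	/* Observe the head of the ring without consuming it. */
	if (rte_ring_peek(r, &obj) == 0) {
		/* Dequeue only when the observed value allows it. */
		if (object_can_be_processed(obj))
			(void)rte_ring_sc_dequeue(r, &obj);
	}

Here object_can_be_processed() stands in for an application-specific
check, e.g. an RCU quiescent state query on a stored token.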

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1



* [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API Ruifeng Wang
@ 2019-08-22  6:34 ` Ruifeng Wang
  2019-08-23  1:23   ` Stephen Hemminger
  2019-08-22 15:52 ` [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Honnappa Nagarahalli
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  4 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.
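
To illustrate the race being fixed, consider a hypothetical interleaving:

	reader: reads a tbl24 entry pointing to tbl8 group G and starts
	        walking G's entries
	writer: deletes the route and frees G
	writer: adds an unrelated route, reallocates G and rewrites
	        its entries
	reader: completes the lookup against G's new contents and
	        returns a wrong next hop

With RCU, G is recycled only after all readers have reported a
quiescent state, so in-flight lookups always see consistent data.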

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 218 +++++++++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/meson.build                    |   3 +-
 6 files changed, 239 insertions(+), 15 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..1efdef22d 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <string.h>
@@ -22,6 +23,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_tailq.h>
+#include <rte_ring.h>
 
 #include "rte_lpm.h"
 
@@ -39,6 +41,11 @@ enum valid_flag {
 	VALID
 };
 
+struct __rte_lpm_qs_item {
+	uint64_t token;	/**< QSBR token.*/
+	uint32_t index;	/**< tbl8 group index.*/
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -381,6 +388,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	if (lpm->qsv)
+		rte_ring_free(lpm->qs_fifo);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
@@ -390,6 +399,145 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
 		rte_lpm_free_v1604);
 
+/* Add an item into FIFO.
+ * return: 0 - success, 1 - enqueue failed (rte_errno set to ENOSPC).
+ */
+static int
+__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token) != 0) {
+		rte_errno = ENOSPC;
+		return 1;
+	}
+	if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index) != 0) {
+		void *obj;
+		/* token needs to be dequeued when index enqueue fails */
+		rte_ring_sc_dequeue(fifo, &obj);
+		rte_errno = ENOSPC;
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Remove item from FIFO.
+ * Used after data has been observed by rte_ring_peek.
+ */
+static void
+__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	void *obj_token = NULL;
+	void *obj_index = NULL;
+
+	(void)rte_ring_sc_dequeue(fifo, &obj_token);
+	(void)rte_ring_sc_dequeue(fifo, &obj_index);
+
+	if (item) {
+		item->token = (uint64_t)((uintptr_t)obj_token);
+		item->index = (uint32_t)((uintptr_t)obj_index);
+	}
+}
+
+/* Max number of tbl8 groups to reclaim at one time. */
+#define RCU_QSBR_RECLAIM_SIZE	8
+
+/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
+ * reclaim will be triggered by tbl8_free.
+ */
+#define RCU_QSBR_RECLAIM_LEVEL	3
+
+/* Reclaim some tbl8 groups based on quiescent state check.
+ * At most RCU_QSBR_RECLAIM_SIZE groups will be reclaimed.
+ * return: 0 - success, 1 - no group reclaimed.
+ */
+static uint32_t
+__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
+{
+	struct __rte_lpm_qs_item qs_item;
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	void *obj_token;
+	uint32_t cnt = 0;
+
+	/* Check the reader threads' quiescent state and
+	 * reclaim as many tbl8 groups as possible.
+	 */
+	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
+		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
+		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
+					false) == 1)) {
+		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
+
+		tbl8_entry = &lpm->tbl8[qs_item.index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+		memset(&tbl8_entry[0], 0,
+				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
+				sizeof(tbl8_entry[0]));
+		cnt++;
+	}
+
+	if (cnt) {
+		if (index)
+			*index = qs_item.index;
+		return 0;
+	}
+	return 1;
+}
+
+/* Trigger tbl8 group reclaim when necessary.
+ * Reclaim happens when RCU QSBR queue usage is over 12.5%.
+ */
+static void
+__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
+{
+	if (lpm->qsv == NULL)
+		return;
+
+	if (rte_ring_count(lpm->qs_fifo) <
+		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
+		return;
+
+	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	uint32_t qs_fifo_size;
+	char rcu_ring_name[RTE_RING_NAMESIZE];
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->qsv) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Size the FIFO to store a 'token' and an 'index' for every
+	 * tbl8 group, rounded up to the next power of two.
+	 */
+	qs_fifo_size = 2 * rte_align32pow2(lpm->number_tbl8s);
+
+	/* Init QSBR reclaiming FIFO. */
+	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
+	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
+					SOCKET_ID_ANY, 0);
+	if (lpm->qs_fifo == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
+		rte_errno = ENOMEM;
+		return 1;
+	}
+	lpm->qsv = v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -640,6 +788,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
 	return -EINVAL;
 }
 
+static int32_t
+tbl8_alloc_reclaimed(struct rte_lpm *lpm)
+{
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	uint32_t index;
+
+	if (lpm->qsv != NULL) {
+		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
+			/* Set the last reclaimed tbl8 group as VALID. */
+			struct rte_lpm_tbl_entry new_tbl8_entry = {
+				.next_hop = 0,
+				.valid = INVALID,
+				.depth = 0,
+				.valid_group = VALID,
+			};
+
+			tbl8_entry = &lpm->tbl8[index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+			__atomic_store(tbl8_entry, &new_tbl8_entry,
+					__ATOMIC_RELAXED);
+
+			/* Return group index for reclaimed tbl8 group. */
+			return index;
+		}
+	}
+
+	return -ENOSPC;
+}
+
 /*
  * Find, clean and allocate a tbl8.
  */
@@ -679,14 +856,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -708,8 +886,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 		}
 	}
 
-	/* If there are no tbl8 groups free then return error. */
-	return -ENOSPC;
+	/* If there are no tbl8 groups free then check reclaim queue. */
+	return tbl8_alloc_reclaimed(lpm);
 }
 
 static void
@@ -728,13 +906,27 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 }
 
 static void
-tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm_qs_item qs_item;
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (lpm->qsv != NULL) {
+		/* Push into QSBR FIFO. */
+		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
+		qs_item.index = tbl8_group_start;
+		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
+			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
+
+		/* Speculatively reclaim tbl8 groups.
+		 * Help spread the reclaim workload across multiple calls.
+		 */
+		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
+	} else {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	}
 }
 
 static __rte_noinline int32_t
@@ -1037,7 +1229,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -1083,7 +1275,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -1818,7 +2010,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -1834,7 +2026,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 906ec4483..5079fb262 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -21,6 +22,7 @@
 #include <rte_common.h>
 #include <rte_vect.h>
 #include <rte_compat.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -186,6 +188,8 @@ struct rte_lpm {
 			__rte_cache_aligned; /**< LPM tbl24 table. */
 	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
+	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group.*/
+	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
 };
 
 /**
@@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
 void
 rte_lpm_free_v1604(struct rte_lpm *lpm);
 
+/**
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the LPM object to attach the RCU QSBR variable to
+ * @param v
+ *   RCU QSBR variable
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 90beac853..b353aabd2 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -44,3 +44,9 @@ DPDK_17.05 {
 	rte_lpm6_lookup_bulk_func;
 
 } DPDK_16.04;
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
diff --git a/lib/meson.build b/lib/meson.build
index e5ff83893..3a96f005d 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,6 +11,7 @@
 libraries = [
 	'kvargs', # eal depends on kvargs
 	'eal', # everything depends on eal
+	'rcu', # hash and lpm depend on this
 	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
 	'cmdline',
 	'metrics', # bitrate/latency stats depends on this
@@ -22,7 +23,7 @@ libraries = [
 	'gro', 'gso', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member',
 	'power', 'pdump', 'rawdev',
-	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
+	'reorder', 'sched', 'security', 'stack', 'vhost',
 	# ipsec lib depends on net, crypto and security
 	'ipsec',
 	# add pkt framework libs which use other libs from above
-- 
2.17.1



* Re: [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (2 preceding siblings ...)
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-08-22 15:52 ` Honnappa Nagarahalli
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  4 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-22 15:52 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, Dharmik Thakkar, nd, Ruifeng Wang (Arm Technology China),
	stephen, Konstantin Ananyev, nd

+ Stephen, Konstantin - for your feedback on the RCU integration design.

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Thursday, August 22, 2019 1:35 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> Dharmik Thakkar <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Ruifeng
> Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
> Subject: [RFC PATCH 0/3] RCU integration with LPM library
> 
> This patchset integrates RCU QSBR support with the LPM library.
> 
> A document is added with the suggested design for integrating the RCU
> library with other libraries in DPDK.
> As an example, the LPM library adds the integration. RCU is used to safely
> free tbl8 groups that can be recycled. A table will not be reclaimed or
> reused until the readers have finished referencing it.
> 
> A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
> register an RCU variable that the LPM library will use.
> 
> A new API, rte_ring_peek, is introduced to help manage the reclaiming
> FIFO queue.
> 
> 
> Honnappa Nagarahalli (1):
>   doc/rcu: add RCU integration design details
> 
> Ruifeng Wang (2):
>   lib/ring: add peek API
>   lib/lpm: integrate RCU QSBR
> 
>  doc/guides/prog_guide/rcu_lib.rst  |  51 +++++++
>  lib/librte_lpm/Makefile            |   3 +-
>  lib/librte_lpm/meson.build         |   2 +
>  lib/librte_lpm/rte_lpm.c           | 218 +++++++++++++++++++++++++++--
>  lib/librte_lpm/rte_lpm.h           |  22 +++
>  lib/librte_lpm/rte_lpm_version.map |   6 +
>  lib/librte_ring/rte_ring.h         |  30 ++++
>  lib/meson.build                    |   3 +-
>  8 files changed, 320 insertions(+), 15 deletions(-)
> 
> --
> 2.17.1



* Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-08-23  1:23   ` Stephen Hemminger
  2019-08-26  3:11     ` Ruifeng Wang (Arm Technology China)
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen Hemminger @ 2019-08-23  1:23 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	honnappa.nagarahalli, dharmik.thakkar, nd

On Thu, 22 Aug 2019 14:34:57 +0800
Ruifeng Wang <ruifeng.wang@arm.com> wrote:

> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
> 
> The RCU QSBR process is integrated for safe tbl8 group reclamation.
> Refer to the RCU documentation to understand various aspects of
> integrating the RCU library into other libraries.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>

Having RCU in LPM is a good idea, but it is difficult to work out how to
do it in DPDK. Not everyone wants to use RCU, so making it a required part
of how LPM is used will impact users.

Also, it looks like DPDK RCU lacks a good generic way to handle deferred
free. Having to introduce a ring to handle it adds more complexity when
a generic solution would be better (see the userspace RCU library for an
example). Other parts of DPDK would benefit if deferred free were done better.
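
For illustration only, the kind of generic interface meant here could
look roughly like the sketch below, loosely modeled on liburcu's
call_rcu(); the names are hypothetical, not an existing DPDK API:

	/* Hypothetical sketch -- not an existing DPDK API. */
	typedef void (*rte_rcu_free_fn)(void *ptr);

	/* Record ptr together with the current QSBR token; free_fn(ptr)
	 * is invoked once all registered readers have reported a
	 * quiescent state past that token.
	 */
	int rte_rcu_qsbr_defer_free(struct rte_rcu_qsbr *v, void *ptr,
			rte_rcu_free_fn free_fn);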


* Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-23  1:23   ` Stephen Hemminger
@ 2019-08-26  3:11     ` Ruifeng Wang (Arm Technology China)
  2019-08-26  5:32       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-08-26  3:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	Honnappa Nagarahalli, Dharmik Thakkar, nd, nd


> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, August 23, 2019 09:23
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
> Cc: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com; dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
> 
> On Thu, 22 Aug 2019 14:34:57 +0800
> Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> 
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> 
> Having RCU in LPM is a good idea, but it is difficult to work out how to
> do it in DPDK. Not everyone wants to use RCU, so making it a required
> part of how LPM is used will impact users.

LPM users will not be forced to use RCU. A new API is provided to enable the
RCU functionality in the LPM library. For users not using RCU, the code path
is intact, and there will be no performance drop.

> 
> Also, it looks like DPDK RCU lacks a good generic way to handle deferred free.
> Having to introduce a ring to handle it adds more complexity when a
> generic solution would be better (see the userspace RCU library for an
> example). Other parts of DPDK would benefit if deferred free were done better.

This requires support from the RCU library.
Honnappa's comments are needed.


* Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-26  3:11     ` Ruifeng Wang (Arm Technology China)
@ 2019-08-26  5:32       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-26  5:32 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China), Stephen Hemminger
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	Dharmik Thakkar, Honnappa Nagarahalli, nd, nd

<snip>
Thank you Stephen for your comments, appreciate your inputs.

> > On Thu, 22 Aug 2019 14:34:57 +0800
> > Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> >
> > > Currently, the tbl8 group is freed even though the readers might be
> > > using the tbl8 group entries. The freed tbl8 group can be
> > > reallocated quickly. This results in incorrect lookup results.
> > >
> > > The RCU QSBR process is integrated for safe tbl8 group reclamation.
> > > Refer to the RCU documentation to understand various aspects of
> > > integrating the RCU library into other libraries.
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >
> > Having RCU in LPM is a good idea, but it is difficult to work out how to
> > do it in DPDK.
> > Not everyone wants to use RCU, so making it a required part of how LPM
> > is used will impact users.
> 
> LPM users will not be forced to use RCU. A new API is provided to enable
> the RCU functionality in the LPM library. For users not using RCU, the
> code path is intact, and there will be no performance drop.
> 
> >
> > Also, it looks like DPDK RCU lacks a good generic way to handle deferred
> > free.
Both rcu_defer and call_rcu from the 'userspace RCU library' are wrappers on top of the underlying basic mechanisms. Such wrappers can be added. However, I would prefer to integrate RCU into a couple of libraries to clearly show the need for wrappers. Integrating RCU in the libraries also removes some burden from the application.

> > Having to introduce a ring to handle it adds more complexity when a
> > generic solution would be better (see the userspace RCU library for an example).
A ring is required in rcu_defer as well as call_rcu since the pointer needs to be stored while waiting for quiescent state updates. The ring is used in the proposed solution for the same purpose.
I briefly looked through rcu_defer. The solution proposed here seems to be similar to rcu_defer. However, there are several differences.
1) rcu_defer uses a single queue for each updater thread; the proposed solution uses a ring per data structure. IMO, this provides better control over the resources to reclaim. Note that currently the ring usage itself is intentionally not optimized, to keep the patches focused on understanding the design.
2) rcu_defer also launches another thread which wakes up periodically and reclaims the resources in the ring (along with the updater thread calling synchronize_rcu, which blocks, when the queue is full). This requires additional synchronization between the updater thread and the reclaimer thread. The solution proposed here does not need another thread, as the DPDK RCU library provides a non-blocking reclamation mechanism; reclamation is done in the context of the updater thread.

> > Other parts of DPDK would benefit if deferred free were done better.
Which other parts are you talking about? The design proposed in 1/3 is a common solution that should apply to other libraries as well.

> 
> This requires support from the RCU library.
> Honnappa's comments are needed.



* [dpdk-dev] [PATCH v2 0/6] RCU integration with LPM library
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (3 preceding siblings ...)
  2019-08-22 15:52 ` [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Honnappa Nagarahalli
@ 2019-09-06  9:45 ` Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
                     ` (14 more replies)
  4 siblings, 15 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

A document is added with the suggested design for integrating the
RCU library with other libraries in DPDK.
As an example, the LPM library adds the integration. As an option,
RCU is used to safely free tbl8 groups that can be recycled.
A table will not be reclaimed or reused until the readers have
finished referencing it.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application
to register an RCU variable that the LPM library will use. This
gives the user a handle to enable the RCU functionality integrated
in the LPM library.

A new API, rte_ring_peek, is introduced to help manage the
reclaiming FIFO queue.
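
A minimal reader-side sketch (one lcore per reader is assumed; the RCU
variable v is initialized and registered with the LPM object via
rte_lpm_rcu_qsbr_add beforehand; variable setup and error handling are
elided):

	/* Register this reader thread once and mark it online. */
	rte_rcu_qsbr_thread_register(v, lcore_id);
	rte_rcu_qsbr_thread_online(v, lcore_id);

	while (!quit) {
		uint32_t next_hop;

		if (rte_lpm_lookup(lpm, ip, &next_hop) == 0) {
			/* forward the packet using next_hop */
		}

		/* Report quiescent state once per loop iteration. */
		rte_rcu_qsbr_quiescent(v, lcore_id);
	}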


Honnappa Nagarahalli (3):
  doc/rcu: add RCU integration design details
  test/lpm: reset total time
  test/lpm: add RCU integration performance tests

Ruifeng Wang (3):
  lib/ring: add peek API
  lib/lpm: integrate RCU QSBR
  app/test: add test case for LPM RCU integration

 app/test/test_lpm.c                | 153 +++++++++++++++-
 app/test/test_lpm_perf.c           | 278 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/rcu_lib.rst  |  52 ++++++
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/librte_ring/rte_ring.h         |  30 ++++
 lib/meson.build                    |   3 +-
 10 files changed, 751 insertions(+), 21 deletions(-)

-- 
2.17.1



* [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:44     ` Honnappa Nagarahalli
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API Ruifeng Wang
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a section to describe a design for integrating the QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 52 +++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..211948530 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,55 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Integrating QSBR RCU with other libraries
+-----------------------------------------
+
+Lock-free algorithms place an additional burden on the application to reclaim
+memory. Integrating memory reclamation mechanisms in the libraries helps
+remove some of the burden. Though the QSBR method provides the flexibility to
+achieve performance, it presents challenges when integrating with libraries.
+
+The memory reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here assigns different parts of this process to client libraries and applications. The term 'client library' refers to data structure libraries such as rte_hash, rte_lpm, etc. in DPDK or similar libraries outside of DPDK. The term 'application' refers to the packet processing application that makes use of DPDK, such as the L3 Forwarding example application, OVS, VPP, etc.
+
+The application has to handle 'Initialization' and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the client library.
+* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section/how frequently it wants to report the quiescent state.
+
+The client library will handle the 'Reclaiming Resources' part of the process. The
+client libraries will make use of the writer thread context to execute the memory
+reclamation algorithm. So,
+
+* client library should provide an API to register an RCU variable that it will use.
+* client library should trigger the readers to report quiescent state status upon deleting the resources by calling ``rte_rcu_qsbr_start``.
+
+* client library should store the token and deleted resources for later use to free them after the readers have reported their quiescent state. Since the readers will report the quiescent state status in the order of deletion, the library must store the tokens/resources in the order in which the resources were deleted. A FIFO data structure would achieve the desired results. The length of the FIFO would depend on the rate of deletion and the rate at which the readers report their quiescent state. In the worst case the length of FIFO would be equal to the maximum number of resources the data structure supports. However, in most cases, the length will be much smaller. But, the client library should not take the length of FIFO as an input from the application. Instead, it should implement a data structure which should be able to grow/shrink dynamically. Overhead introduced by such a data structure on delete operations should be considered as well.
+
+* client library should query the quiescent state and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. This allows the application to do useful work while the readers report their quiescent state. If there are tokens/resources present in the FIFO already, the delete API should peek the head of the FIFO and check the quiescent state status. If the status is success, the token/resource should be dequeued and the resource should be freed. This process can be repeated till the quiescent state status for a token returns failure indicating that subsequent tokens will also fail quiescent state status query. The same process can be incorporated while adding new entries in the data structure if the client library runs out of resources.
+
+The 'Shutdown' process needs to be shared between the application and the
+client library.
+
+* the application should make sure that the reader threads are not using the shared data structure and unregister the reader threads from the QSBR variable before calling the client library's shutdown function.
+
+* client library should check the quiescent state status of all the tokens that may be present in the FIFO and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the tokens do not pass the quiescent state check, the client library should print an error and stop the memory reclamation process.
+
+Integrating the resource reclamation with client libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. Application does not need a dedicated thread to reclaim resources. Memory
+   reclamation happens as part of the writer thread with little impact on
+   performance.
+#. The client library has better control over the resources. For example, the
+   client library can attempt to reclaim when it has run out of resources.
-- 
2.17.1



* [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1



* [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:44     ` Honnappa Nagarahalli
  2019-09-18 16:15     ` Medvedkin, Vladimir
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration Ruifeng Wang
                     ` (11 subsequent siblings)
  14 siblings, 2 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/meson.build                    |   3 +-
 6 files changed, 244 insertions(+), 15 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..9764b8de6 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <string.h>
@@ -22,6 +23,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_tailq.h>
+#include <rte_ring.h>
 
 #include "rte_lpm.h"
 
@@ -39,6 +41,11 @@ enum valid_flag {
 	VALID
 };
 
+struct __rte_lpm_qs_item {
+	uint64_t token;	/**< QSBR token.*/
+	uint32_t index;	/**< tbl8 group index.*/
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	rte_ring_free(lpm->qs_fifo);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
@@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
 		rte_lpm_free_v1604);
 
+/* Add an item into FIFO.
+ * return: 0 - success, 1 - FIFO full (rte_errno set to ENOSPC).
+ */
+static int
+__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	if (rte_ring_free_count(fifo) < 2) {
+		RTE_LOG(ERR, LPM, "QS FIFO full\n");
+		rte_errno = ENOSPC;
+		return 1;
+	}
+
+	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token);
+	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index);
+
+	return 0;
+}
+
+/* Remove item from FIFO.
+ * Used after data has been observed by rte_ring_peek.
+ */
+static void
+__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	void *obj_token = NULL;
+	void *obj_index = NULL;
+
+	(void)rte_ring_sc_dequeue(fifo, &obj_token);
+	(void)rte_ring_sc_dequeue(fifo, &obj_index);
+
+	if (item) {
+		item->token = (uint64_t)((uintptr_t)obj_token);
+		item->index = (uint32_t)((uintptr_t)obj_index);
+	}
+}
+
+/* Max number of tbl8 groups to reclaim at one time. */
+#define RCU_QSBR_RECLAIM_SIZE	8
+
+/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
+ * reclaim will be triggered by tbl8_free.
+ */
+#define RCU_QSBR_RECLAIM_LEVEL	3
+
+/* Reclaim some tbl8 groups based on quiescent state check.
+ * At most RCU_QSBR_RECLAIM_SIZE groups will be reclaimed.
+ * Params: lpm   - lpm object handle
+ *         index - (output) one of the successfully reclaimed tbl8 groups
+ * return: 0 - success, 1 - no group reclaimed.
+ */
+static uint32_t
+__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
+{
+	struct __rte_lpm_qs_item qs_item;
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	void *obj_token;
+	uint32_t cnt = 0;
+
+	RTE_LOG(DEBUG, LPM, "RCU QSBR reclamation triggered.\n");
+	/* Check the reader threads' quiescent state and
+	 * reclaim as many tbl8 groups as possible.
+	 */
+	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
+		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
+		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
+					false) == 1)) {
+		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
+
+		tbl8_entry = &lpm->tbl8[qs_item.index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+		memset(&tbl8_entry[0], 0,
+				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
+				sizeof(tbl8_entry[0]));
+		cnt++;
+	}
+
+	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
+	if (cnt) {
+		if (index)
+			*index = qs_item.index;
+		return 0;
+	}
+	return 1;
+}
+
+/* Trigger tbl8 group reclaim when necessary.
+ * Reclaim happens when RCU QSBR queue usage
+ * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
+ */
+static void
+__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
+{
+	if (lpm->qsv == NULL)
+		return;
+
+	if (rte_ring_count(lpm->qs_fifo) <
+		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
+		return;
+
+	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	uint32_t qs_fifo_size;
+	char rcu_ring_name[RTE_RING_NAMESIZE];
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->qsv) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Size the FIFO to store a 'token' and an 'index' for every
+	 * tbl8 group, rounded up to the next power of two (the usable
+	 * ring capacity is its size minus one, hence the '+ 1').
+	 */
+	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
+
+	/* Init QSBR reclaiming FIFO. */
+	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
+	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
+					SOCKET_ID_ANY, 0);
+	if (lpm->qs_fifo == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
+		rte_errno = ENOMEM;
+		return 1;
+	}
+	lpm->qsv = v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
 	return -EINVAL;
 }
 
+static int32_t
+tbl8_alloc_reclaimed(struct rte_lpm *lpm)
+{
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	uint32_t index;
+
+	if (lpm->qsv != NULL) {
+		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
+			/* Set the last reclaimed tbl8 group as VALID. */
+			struct rte_lpm_tbl_entry new_tbl8_entry = {
+				.next_hop = 0,
+				.valid = INVALID,
+				.depth = 0,
+				.valid_group = VALID,
+			};
+
+			tbl8_entry = &lpm->tbl8[index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+			__atomic_store(tbl8_entry, &new_tbl8_entry,
+					__ATOMIC_RELAXED);
+
+			/* Return group index for reclaimed tbl8 group. */
+			return index;
+		}
+	}
+
+	return -ENOSPC;
+}
+
 /*
  * Find, clean and allocate a tbl8.
  */
@@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 		}
 	}
 
-	/* If there are no tbl8 groups free then return error. */
-	return -ENOSPC;
+	/* If there are no tbl8 groups free then check reclaim queue. */
+	return tbl8_alloc_reclaimed(lpm);
 }
 
 static void
@@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 }
 
 static void
-tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm_qs_item qs_item;
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (lpm->qsv != NULL) {
+		/* Push into QSBR FIFO. */
+		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
+		qs_item.index =
+			tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
+		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
+			/* This should never happen as FIFO size is big enough
+			 * to hold all tbl8 groups.
+			 */
+			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
+
+		/* Speculatively reclaim tbl8 groups.
+		 * Help spread the reclaim workload across multiple calls.
+		 */
+		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
+	} else {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	}
 }
 
 static __rte_noinline int32_t
@@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 906ec4483..5079fb262 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -21,6 +22,7 @@
 #include <rte_common.h>
 #include <rte_vect.h>
 #include <rte_compat.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -186,6 +188,8 @@ struct rte_lpm {
 			__rte_cache_aligned; /**< LPM tbl24 table. */
 	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
+	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group.*/
+	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
 };
 
 /**
@@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
 void
 rte_lpm_free_v1604(struct rte_lpm *lpm);
 
+/**
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   The LPM object to attach the RCU QSBR variable to
+ * @param v
+ *   The RCU QSBR variable to use for reclaiming tbl8 groups
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 90beac853..b353aabd2 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -44,3 +44,9 @@ DPDK_17.05 {
 	rte_lpm6_lookup_bulk_func;
 
 } DPDK_16.04;
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
diff --git a/lib/meson.build b/lib/meson.build
index e5ff83893..3a96f005d 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,6 +11,7 @@
 libraries = [
 	'kvargs', # eal depends on kvargs
 	'eal', # everything depends on eal
+	'rcu', # hash and lpm depend on this
 	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
 	'cmdline',
 	'metrics', # bitrate/latency stats depends on this
@@ -22,7 +23,7 @@ libraries = [
 	'gro', 'gso', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member',
 	'power', 'pdump', 'rawdev',
-	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
+	'reorder', 'sched', 'security', 'stack', 'vhost',
 	# ipsec lib depends on net, crypto and security
 	'ipsec',
 	# add pkt framework libs which use other libs from above
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
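
A minimal sketch of the application-side flow this patch implies, using only
the APIs introduced by the patchset and exercised in its tests; `lpm`,
`thread_id`, `ip`, `next_hop` and the `running` flag stand in for application
state, and error handling is elided:

	/* Writer setup: create a QSBR variable and attach it to the LPM
	 * table, so freed tbl8 groups are reclaimed through the RCU FIFO.
	 */
	size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
	struct rte_rcu_qsbr *qsv = rte_zmalloc_socket(NULL, sz,
					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
	rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
	rte_lpm_rcu_qsbr_add(lpm, qsv);

	/* Each reader thread: register once, then report a quiescent state
	 * between bursts of lookups.
	 */
	rte_rcu_qsbr_thread_register(qsv, thread_id);
	rte_rcu_qsbr_thread_online(qsv, thread_id);
	while (running) {
		rte_lpm_lookup(lpm, ip, &next_hop);
		rte_rcu_qsbr_quiescent(qsv, thread_id);
	}
	rte_rcu_qsbr_thread_offline(qsv, thread_id);
	rte_rcu_qsbr_thread_unregister(qsv, thread_id);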

* [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (2 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:45     ` Honnappa Nagarahalli
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time Ruifeng Wang
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

Add positive and negative tests for API rte_lpm_rcu_qsbr_add.
Also test LPM library behavior when RCU QSBR is enabled.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_lpm.c | 153 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 152 insertions(+), 1 deletion(-)

diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
index e969fe051..cfd372395 100644
--- a/app/test/test_lpm.c
+++ b/app/test/test_lpm.c
@@ -8,6 +8,7 @@
 
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_malloc.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
@@ -40,6 +41,8 @@ static int32_t test15(void);
 static int32_t test16(void);
 static int32_t test17(void);
 static int32_t test18(void);
+static int32_t test19(void);
+static int32_t test20(void);
 
 rte_lpm_test tests[] = {
 /* Test Cases */
@@ -61,7 +64,9 @@ rte_lpm_test tests[] = {
 	test15,
 	test16,
 	test17,
-	test18
+	test18,
+	test19,
+	test20
 };
 
 #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
@@ -1266,6 +1271,152 @@ test18(void)
 	return PASS;
 }
 
+/*
+ * rte_lpm_rcu_qsbr_add positive and negative tests.
+ *  - Add RCU QSBR variable to LPM
+ *  - Add another RCU QSBR variable to LPM
+ *  - Check LPM attached RCU QSBR variable and FIFO queue
+ */
+int32_t
+test19(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	struct rte_rcu_qsbr *qsv2;
+	int32_t status;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Create and attach another RCU QSBR to LPM table */
+	qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv2 != NULL);
+
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv2);
+	TEST_LPM_ASSERT(status != 0);
+
+	TEST_LPM_ASSERT(lpm->qsv == qsv);
+	TEST_LPM_ASSERT(lpm->qs_fifo != NULL);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+	rte_free(qsv2);
+
+	return PASS;
+}
+
+/*
+ * rte_lpm_rcu_qsbr_add functional test.
+ *  - Create LPM which supports 1 tbl8 group at max
+ *  - Add RCU QSBR variable to LPM
+ *  - Add a rule with depth=28 (> 24)
+ *  - Register a reader thread (not a real thread)
+ *  - Reader looks up the existing rule
+ *  - Writer deletes the rule
+ *  - Reader looks up the rule
+ *  - Writer re-adds the rule (no tbl8 group available)
+ *  - Reader reports quiescent state and unregisters
+ *  - Writer re-adds the rule
+ *  - Reader looks up the rule
+ */
+int32_t
+test20(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	int32_t status;
+	uint32_t ip, next_hop, next_hop_return;
+	uint8_t depth;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = 1;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(1);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, 1);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	ip = RTE_IPV4(192, 18, 100, 100);
+	depth = 28;
+	next_hop = 1;
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group);
+
+	/* Register pseudo reader */
+	status = rte_rcu_qsbr_thread_register(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+	rte_rcu_qsbr_thread_online(qsv, 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	/* Writer update */
+	status = rte_lpm_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status != 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status != 0);
+
+	/* Reader quiescent */
+	rte_rcu_qsbr_quiescent(qsv, 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+
+	rte_rcu_qsbr_thread_offline(qsv, 0);
+	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+
+	return PASS;
+}
+
 /*
  * Do all unit tests.
  */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
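
The sequence test20 verifies, restated as a timeline (assuming a single
registered reader on thread id 0 and an LPM table with exactly one tbl8
group, as in the test):

	rte_lpm_delete(lpm, ip, depth);		/* tbl8 group queued for
						 * reclaim, not yet freed */
	rte_lpm_add(lpm, ip, depth, next_hop);	/* fails: the only tbl8
						 * group is still queued */
	rte_rcu_qsbr_quiescent(qsv, 0);		/* reader reports its
						 * quiescent state */
	rte_lpm_add(lpm, ip, depth, next_hop);	/* succeeds: the group has
						 * been reclaimed */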

* [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (3 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-18 16:17     ` Medvedkin, Vladimir
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests Ruifeng Wang
                     ` (9 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, stable

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

total_time needs to be reset to measure the cycles for the delete API.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_lpm_perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index 77eea66ad..a2578fe90 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -460,7 +460,7 @@ test_lpm_perf(void)
 			(double)total_time / ((double)ITERATIONS * BATCH_SIZE),
 			(count * 100.0) / (double)(ITERATIONS * BATCH_SIZE));
 
-	/* Delete */
+	/* Measure Delete */
 	status = 0;
 	begin = rte_rdtsc();
 
@@ -470,7 +470,7 @@ test_lpm_perf(void)
 				large_route_table[i].depth);
 	}
 
-	total_time += rte_rdtsc() - begin;
+	total_time = rte_rdtsc() - begin;
 
 	printf("Average LPM Delete: %g cycles\n",
 			(double)total_time / NUM_ROUTE_ENTRIES);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
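
The measurement pattern the fix restores, as a sketch:

	begin = rte_rdtsc();
	/* ... only the operations being measured ... */
	total_time = rte_rdtsc() - begin;	/* '=' starts a fresh
						 * measurement; the old '+='
						 * also counted the cycles of
						 * the previous block */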

* [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (4 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:46     ` Honnappa Nagarahalli
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
                     ` (8 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add performance tests for RCU integration. The performance
difference with and without RCU integration is very small
(~1% to ~2%) on both Arm and x86 platforms.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_lpm_perf.c | 274 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 271 insertions(+), 3 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index a2578fe90..475e5d488 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <stdio.h>
@@ -10,12 +11,23 @@
 #include <rte_cycles.h>
 #include <rte_random.h>
 #include <rte_branch_prediction.h>
+#include <rte_malloc.h>
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_rcu_qsbr.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
 
+struct rte_lpm *lpm;
+static struct rte_rcu_qsbr *rv;
+static volatile uint8_t writer_done;
+static volatile uint32_t thr_id;
+/* Report quiescent state every 8192 lookups. Larger critical
+ * sections in the reader will result in the writer polling multiple times.
+ */
+#define QSBR_REPORTING_INTERVAL 8192
+
 #define TEST_LPM_ASSERT(cond) do {                                            \
 	if (!(cond)) {                                                        \
 		printf("Error at line %d: \n", __LINE__);                     \
@@ -24,6 +36,7 @@
 } while(0)
 
 #define ITERATIONS (1 << 10)
+#define RCU_ITERATIONS 10
 #define BATCH_SIZE (1 << 12)
 #define BULK_SIZE 32
 
@@ -35,9 +48,13 @@ struct route_rule {
 };
 
 struct route_rule large_route_table[MAX_RULE_NUM];
+/* Route table for routes with depth > 24 */
+struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
 
 static uint32_t num_route_entries;
+static uint32_t num_ldepth_route_entries;
 #define NUM_ROUTE_ENTRIES num_route_entries
+#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
 
 enum {
 	IP_CLASS_A,
@@ -191,7 +208,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	uint32_t ip_head_mask;
 	uint32_t rule_num;
 	uint32_t k;
-	struct route_rule *ptr_rule;
+	struct route_rule *ptr_rule, *ptr_ldepth_rule;
 
 	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
 		fixed_bit_num = IP_HEAD_BIT_NUM_A;
@@ -236,10 +253,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	 */
 	start = lrand48() & mask;
 	ptr_rule = &large_route_table[num_route_entries];
+	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
 	for (k = 0; k < rule_num; k++) {
 		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
 			| ip_head_mask;
 		ptr_rule->depth = depth;
+		/* If the depth of the route is more than 24, store it
+		 * in another table as well.
+		 */
+		if (depth > 24) {
+			ptr_ldepth_rule->ip = ptr_rule->ip;
+			ptr_ldepth_rule->depth = ptr_rule->depth;
+			ptr_ldepth_rule++;
+			num_ldepth_route_entries++;
+		}
 		ptr_rule++;
 		start = (start + step) & mask;
 	}
@@ -273,6 +300,7 @@ static void generate_large_route_rule_table(void)
 	uint8_t  depth;
 
 	num_route_entries = 0;
+	num_ldepth_route_entries = 0;
 	memset(large_route_table, 0, sizeof(large_route_table));
 
 	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
@@ -316,10 +344,248 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
 	printf("\n");
 }
 
+/* Lcore ids of the enabled reader cores. */
+static uint16_t enabled_core_ids[RTE_MAX_LCORE];
+static unsigned int num_cores;
+
+/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
+static inline uint32_t
+alloc_thread_id(void)
+{
+	uint32_t tmp_thr_id;
+
+	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
+	if (tmp_thr_id >= RTE_MAX_LCORE)
+		printf("Invalid thread id %u\n", tmp_thr_id);
+
+	return tmp_thr_id;
+}
+
+/*
+ * Reader thread using rte_lpm data structure without RCU.
+ */
+static int
+test_lpm_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+	} while (!writer_done);
+
+	return 0;
+}
+
+/*
+ * Reader thread using rte_lpm data structure with RCU.
+ */
+static int
+test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t thread_id = alloc_thread_id();
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	/* Register this thread to report quiescent state */
+	rte_rcu_qsbr_thread_register(rv, thread_id);
+	rte_rcu_qsbr_thread_online(rv, thread_id);
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+		/* Update quiescent state */
+		rte_rcu_qsbr_quiescent(rv, thread_id);
+	} while (!writer_done);
+
+	rte_rcu_qsbr_thread_offline(rv, thread_id);
+	rte_rcu_qsbr_thread_unregister(rv, thread_id);
+
+	return 0;
+}
+
+/*
+ * Performance test:
+ * Single writer, Single QS variable, Single QSBR query,
+ * Non-blocking rcu_qsbr_check
+ */
+static int
+test_lpm_rcu_perf(void)
+{
+	struct rte_lpm_config config;
+	uint64_t begin, total_cycles;
+	size_t sz;
+	unsigned int i, j;
+	uint16_t core_id;
+	uint32_t next_hop_add = 0xAA;
+
+	if (rte_lcore_count() < 2) {
+		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n");
+		return TEST_SKIPPED;
+	}
+
+	num_cores = 0;
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		enabled_core_ids[num_cores] = core_id;
+		num_cores++;
+	}
+
+	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Init RCU variable */
+	sz = rte_rcu_qsbr_get_memsize(num_cores);
+	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
+						RTE_CACHE_LINE_SIZE);
+	rte_rcu_qsbr_init(rv, num_cores);
+
+	/* Assign the RCU variable to LPM */
+	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
+		printf("RCU variable assignment failed\n");
+		goto error;
+	}
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles /
+			(NUM_LDEPTH_ROUTE_ENTRIES * RCU_ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+	lpm = NULL;
+	rv = NULL;
+
+	/* Test without RCU integration */
+	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles /
+			(NUM_LDEPTH_ROUTE_ENTRIES * RCU_ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			printf("Warning: lcore %u not finished.\n",
+				enabled_core_ids[i]);
+
+	rte_lpm_free(lpm);
+
+	return 0;
+
+error:
+	writer_done = 1;
+	/* Wait until all readers have exited */
+	rte_eal_mp_wait_lcore();
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+
+	return -1;
+}
+
 static int
 test_lpm_perf(void)
 {
-	struct rte_lpm *lpm = NULL;
 	struct rte_lpm_config config;
 
 	config.max_rules = 2000000;
@@ -343,7 +609,7 @@ test_lpm_perf(void)
 	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
 	TEST_LPM_ASSERT(lpm != NULL);
 
-	/* Measue add. */
+	/* Measure add. */
 	begin = rte_rdtsc();
 
 	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
@@ -478,6 +744,8 @@ test_lpm_perf(void)
 	rte_lpm_delete_all(lpm);
 	rte_lpm_free(lpm);
 
+	test_lpm_rcu_perf();
+
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
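
The writer/reader synchronization in this test reduces to a volatile flag
plus the usual EAL launch/wait pair; a sketch using the names from the
patch:

	writer_done = 0;
	for (i = 0; i < num_cores; i++)
		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
					enabled_core_ids[i]);

	/* ... the timed add/delete loops run on the main lcore ... */

	writer_done = 1;	/* readers poll this flag and exit */
	for (i = 0; i < num_cores; i++)
		rte_eal_wait_lcore(enabled_core_ids[i]);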

* Re: [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
@ 2019-09-06 19:44     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:44 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, paulmck, nd

Adding Paul for feedback on design

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:45 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: [PATCH v2 1/6] doc/rcu: add RCU integration design details
> 
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Add a section to describe a design to integrate QSBR RCU library with other
> libraries in DPDK.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  doc/guides/prog_guide/rcu_lib.rst | 52 +++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rcu_lib.rst
> b/doc/guides/prog_guide/rcu_lib.rst
> index 8fe5b1f73..211948530 100644
> --- a/doc/guides/prog_guide/rcu_lib.rst
> +++ b/doc/guides/prog_guide/rcu_lib.rst
> @@ -186,3 +186,55 @@ However, when
> ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid  in debugging
> issues. One can mark the access to shared data structures on the  reader side
> using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if  all the locks are
> unlocked.
> +
> +Integrating QSBR RCU with other libraries
> +-----------------------------------------
> +
> +Lock-free algorithms place additional burden on the application to
> +reclaim memory. Integrating memory reclamation mechanisms in the
> +libraries helps remove some of the burden. Though the QSBR method provides
> +flexibility to achieve performance, it presents challenges when integrating
> with libraries.
> +
> +The memory reclamation process using QSBR can be split into 4 parts:
> +
> +#. Initialization
> +#. Quiescent State Reporting
> +#. Reclaiming Resources
> +#. Shutdown
> +
> +The design proposed here assigns different parts of this process to client
> libraries and applications. The term 'client library' refers to data structure
> libraries such at rte_hash, rte_lpm etc. in DPDK or similar libraries outside of
> DPDK. The term 'application' refers to the packet processing application that
> makes use of DPDK such as L3 Forwarding example application, OVS, VPP etc..
> +
> +The application has to handle 'Initialization' and 'Quiescent State
> +Reporting'. So,
> +
> +* the application has to create the RCU variable and register the reader
> threads to report their quiescent state.
> +* the application has to register the same RCU variable with the client library.
> +* reader threads in the application have to report the quiescent state. This
> allows for the application to control the length of the critical section/how
> frequently the application wants to report the quiescent state.
> +
> +The client library will handle 'Reclaiming Resources' part of the
> +process. The client libraries will make use of the writer thread
> +context to execute the memory reclamation algorithm. So,
> +
> +* client library should provide an API to register a RCU variable that it will use.
> +* client library should trigger the readers to report quiescent state status
> upon deleting the resources by calling ``rte_rcu_qsbr_start``.
> +
> +* client library should store the token and deleted resources for later use to
> free them after the readers have reported their quiescent state. Since the
> readers will report the quiescent state status in the order of deletion, the
> library must store the tokens/resources in the order in which the resources
> were deleted. A FIFO data structure would achieve the desired results. The
> length of the FIFO would depend on the rate of deletion and the rate at which
> the readers report their quiescent state. In the worst case the length of FIFO
> would be equal to the maximum number of resources the data structure
> supports. However, in most cases, the length will be much smaller. But, the
> client library should not take the length of FIFO as an input from the
> application. Instead, it should implement a data structure which should be able
> to grow/shrink dynamically. Overhead introduced by such a data structure on
> delete operations should be considered as well.
> +
> +* client library should query the quiescent state and free the resources. It
> should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the
> quiescent state. This allows the application to do useful work while the readers
> report their quiescent state. If there are tokens/resources present in the FIFO
> already, the delete API should peek the head of the FIFO and check the
> quiescent state status. If the status is success, the token/resource should be
> dequeued and the resource should be freed. This process can be repeated till
> the quiescent state status for a token returns failure indicating that
> subsequent tokens will also fail quiescent state status query. The same process
> can be incorporated while adding new entries in the data structure if the client
> library runs out of resources.
> +
> +The 'Shutdown' process needs to be shared between the application and
> +the client library.
> +
> +* the application should make sure that the reader threads are not using the
> shared data structure, unregister the reader threads from the QSBR variable
> before calling the client library's shutdown function.
> +
> +* client library should check the quiescent state status of all the tokens that
> may be present in the FIFO and free the resources. It should make use of non-
> blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the
> tokens do not pass the quiescent state check, the client library should print an
> error and stop the memory reclamation process.
> +
> +Integrating the resource reclamation with client libraries removes the
> +burden from the application and makes it easy to use lock-free algorithms.
> +
> +This design has several advantages over currently known methods.
> +
> +#. Application does not need a dedicated thread to reclaim resources.
> Memory
> +   reclamation happens as part of the writer thread with little impact on
> +   performance.
> +#. The client library has better control over the resources. For ex: the client
> +   library can attempt to reclaim when it has run out of resources.
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
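
The delete-side algorithm the document above describes, reduced to a sketch.
`fifo_push`, `fifo_peek`, `fifo_pop` and `resource_free` are hypothetical
client-library helpers, not DPDK APIs; only the rte_rcu_qsbr_* calls are
real:

	/* On delete: remove the entry from the data structure, then store
	 * the token returned by rte_rcu_qsbr_start with the resource.
	 */
	token = rte_rcu_qsbr_start(v);
	fifo_push(fifo, token, resource);

	/* Later (on the next delete, or when out of resources): free the
	 * resource for every token that passes the non-blocking quiescent
	 * state check, oldest first.
	 */
	while (fifo_peek(fifo, &token, &resource) == 0 &&
			rte_rcu_qsbr_check(v, token, false) == 1) {
		fifo_pop(fifo);
		resource_free(resource);	/* no reader can still hold
						 * a reference */
	}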

* Re: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-09-06 19:44     ` Honnappa Nagarahalli
  2019-09-18 16:15     ` Medvedkin, Vladimir
  1 sibling, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:44 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, Ruifeng Wang (Arm Technology China),
	paulmck, nd

Adding Paul for feedback

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:46 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Ruifeng Wang (Arm
> Technology China) <Ruifeng.Wang@arm.com>
> Subject: [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
> 
> Currently, the tbl8 group is freed even though the readers might be using the
> tbl8 group entries. The freed tbl8 group can be reallocated quickly. This results
> in incorrect lookup results.
> 
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of integrating RCU
> library into other libraries.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_lpm/Makefile            |   3 +-
>  lib/librte_lpm/meson.build         |   2 +
>  lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++++++++--
>  lib/librte_lpm/rte_lpm.h           |  22 +++
>  lib/librte_lpm/rte_lpm_version.map |   6 +
>  lib/meson.build                    |   3 +-
>  6 files changed, 244 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_lpm.a
> 
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> 
>  EXPORT_MAP := rte_lpm_version.map
> 
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>  # Copyright(c) 2017 Intel Corporation
> 
>  version = 2
> +allow_experimental_apis = true
>  sources = files('rte_lpm.c', 'rte_lpm6.c')
>  headers = files('rte_lpm.h', 'rte_lpm6.h')
>  # since header files have different names, we can install all vector headers
>  # without worrying about which architecture we actually need
>  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>  deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..9764b8de6 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #include <string.h>
> @@ -22,6 +23,7 @@
>  #include <rte_rwlock.h>
>  #include <rte_spinlock.h>
>  #include <rte_tailq.h>
> +#include <rte_ring.h>
> 
>  #include "rte_lpm.h"
> 
> @@ -39,6 +41,11 @@ enum valid_flag {
>  	VALID
>  };
> 
> +struct __rte_lpm_qs_item {
> +	uint64_t token;	/**< QSBR token.*/
> +	uint32_t index;	/**< tbl8 group index.*/
> +};
> +
>  /* Macro to enable/disable run-time checks. */
>  #if defined(RTE_LIBRTE_LPM_DEBUG)
>  #include <rte_debug.h>
> @@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> 
>  	rte_mcfg_tailq_write_unlock();
> 
> +	rte_ring_free(lpm->qs_fifo);
>  	rte_free(lpm->tbl8);
>  	rte_free(lpm->rules_tbl);
>  	rte_free(lpm);
> @@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>  MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>  		rte_lpm_free_v1604);
> 
> +/* Add an item into FIFO.
> + * return: 0 - success
> + */
> +static int
> +__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
> +{
> +	if (rte_ring_free_count(fifo) < 2) {
> +		RTE_LOG(ERR, LPM, "QS FIFO full\n");
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token);
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index);
> +
> +	return 0;
> +}
> +
> +/* Remove item from FIFO.
> + * Used after the data has been observed by rte_ring_peek.
> + */
> +static void
> +__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
> +{
> +	void *obj_token = NULL;
> +	void *obj_index = NULL;
> +
> +	(void)rte_ring_sc_dequeue(fifo, &obj_token);
> +	(void)rte_ring_sc_dequeue(fifo, &obj_index);
> +
> +	if (item) {
> +		item->token = (uint64_t)((uintptr_t)obj_token);
> +		item->index = (uint32_t)((uintptr_t)obj_index);
> +	}
> +}
> +
> +/* Max number of tbl8 groups to reclaim at one time. */
> +#define RCU_QSBR_RECLAIM_SIZE	8
> +
> +/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
> + * reclaim will be triggered by tbl8_free.
> + */
> +#define RCU_QSBR_RECLAIM_LEVEL	3
> +
> +/* Reclaim some tbl8 groups based on quiescent state check.
> + * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max.
> + * Params: lpm   - lpm object handle
> + *         index - (output) one of the successfully reclaimed tbl8 groups
> + * return: 0 - success, 1 - no group reclaimed.
> + */
> +static uint32_t
> +__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
> +{
> +	struct __rte_lpm_qs_item qs_item;
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
> +	void *obj_token;
> +	uint32_t cnt = 0;
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclamation triggered.\n");
> +	/* Check reader threads quiescent state and
> +	 * reclaim as much tbl8 groups as possible.
> +	 */
> +	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
> +		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
> +		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
> +					false) == 1)) {
> +		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
> +
> +		tbl8_entry = &lpm->tbl8[qs_item.index *
> +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +		memset(&tbl8_entry[0], 0,
> +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
> +				sizeof(tbl8_entry[0]));
> +		cnt++;
> +	}
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
> +	if (cnt) {
> +		if (index)
> +			*index = qs_item.index;
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +/* Trigger tbl8 group reclaim when necessary.
> + * Reclaim happens when RCU QSBR queue usage
> + * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
> + */
> +static void
> +__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
> +{
> +	if (lpm->qsv == NULL)
> +		return;
> +
> +	if (rte_ring_count(lpm->qs_fifo) <
> +		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
> +		return;
> +
> +	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	uint32_t qs_fifo_size;
> +	char rcu_ring_name[RTE_RING_NAMESIZE];
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->qsv) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * number_tbl8s. Will store 'token' and 'index'.
> +	 */
> +	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
> +
> +	/* Init QSBR reclaiming FIFO. */
> +	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
> +	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (lpm->qs_fifo == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
> +		rte_errno = ENOMEM;
> +		return 1;
> +	}
> +	lpm->qsv = v;
> +
> +	return 0;
> +}
> +
>  /*
>   * Adds a rule to the rule table.
>   *
> @@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t
> ip_masked, uint8_t depth)
>  	return -EINVAL;
>  }
> 
> +static int32_t
> +tbl8_alloc_reclaimed(struct rte_lpm *lpm)
> +{
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
> +	uint32_t index;
> +
> +	if (lpm->qsv != NULL) {
> +		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
> +			/* Set the last reclaimed tbl8 group as VALID. */
> +			struct rte_lpm_tbl_entry new_tbl8_entry = {
> +				.next_hop = 0,
> +				.valid = INVALID,
> +				.depth = 0,
> +				.valid_group = VALID,
> +			};
> +
> +			tbl8_entry = &lpm->tbl8[index *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +			__atomic_store(tbl8_entry, &new_tbl8_entry,
> +					__ATOMIC_RELAXED);
> +
> +			/* Return group index for reclaimed tbl8 group. */
> +			return index;
> +		}
> +	}
> +
> +	return -ENOSPC;
> +}
> +
>  /*
>   * Find, clean and allocate a tbl8.
>   */
> @@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  }
> 
>  static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
>  	struct rte_lpm_tbl_entry *tbl8_entry;
> 
>  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx *
> RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>  		/* If a free tbl8 group is found clean it and set as VALID. */
>  		if (!tbl8_entry->valid_group) {
>  			struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -
> 708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> number_tbl8s)
>  		}
>  	}
> 
> -	/* If there are no tbl8 groups free then return error. */
> -	return -ENOSPC;
> +	/* If there are no tbl8 groups free then check reclaim queue. */
> +	return tbl8_alloc_reclaimed(lpm);
>  }
> 
>  static void
> @@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>  }
> 
>  static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>  {
> -	/* Set tbl8 group invalid*/
> +	struct __rte_lpm_qs_item qs_item;
>  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> 
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->qsv != NULL) {
> +		/* Push into QSBR FIFO. */
> +		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
> +		qs_item.index =
> +			tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
> +			/* This should never happen as FIFO size is big enough
> +			 * to hold all tbl8 groups.
> +			 */
> +			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
> +
> +		/* Speculatively reclaim tbl8 groups.
> +		 * Help spread the reclaim workload across multiple calls.
> +		 */
> +		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
> +	} else {
> +		/* Set tbl8 group invalid. */
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>  }
> 
>  static __rte_noinline int32_t
> @@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked, uint8_t depth,
> 
>  	if (!lpm->tbl24[tbl24_index].valid) {
>  		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		/* Check tbl8 allocation was successful. */
>  		if (tbl8_group_index < 0) {
> @@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked, uint8_t depth,
>  	} /* If valid entry but not extended calculate the index into Table8. */
>  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>  		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		if (tbl8_group_index < 0) {
>  			return tbl8_group_index;
> @@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked,
>  		 */
>  		lpm->tbl24[tbl24_index].valid = 0;
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	} else if (tbl8_recycle_index > -1) {
>  		/* Update tbl24 entry. */
>  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>  				__ATOMIC_RELAXED);
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	}
>  #undef group_idx
>  	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..5079fb262 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>  #include <rte_common.h>
>  #include <rte_vect.h>
>  #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -186,6 +188,8 @@ struct rte_lpm {
>  			__rte_cache_aligned; /**< LPM tbl24 table. */
>  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group. */
> +	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
>  };
> 
>  /**
> @@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>  void
>  rte_lpm_free_v1604(struct rte_lpm *lpm);
> 
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   The LPM object to attach the RCU QSBR variable to
> + * @param v
> + *   The RCU QSBR variable to use for reclaiming tbl8 groups
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>  /**
>   * Add a rule to the LPM table.
>   *
> diff --git a/lib/librte_lpm/rte_lpm_version.map
> b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>  	rte_lpm6_lookup_bulk_func;
> 
>  } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..3a96f005d 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,6 +11,7 @@
>  libraries = [
>  	'kvargs', # eal depends on kvargs
>  	'eal', # everything depends on eal
> +	'rcu', # hash and lpm depend on this
>  	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>  	'cmdline',
>  	'metrics', # bitrate/latency stats depends on this
> @@ -22,7 +23,7 @@ libraries = [
>  	'gro', 'gso', 'ip_frag', 'jobstats',
>  	'kni', 'latencystats', 'lpm', 'member',
>  	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>  	# ipsec lib depends on net, crypto and security
>  	'ipsec',
>  	# add pkt framework libs which use other libs from above
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
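
A worked example of the FIFO sizing and reclaim trigger in
rte_lpm_rcu_qsbr_add above, assuming number_tbl8s = 256 (each queued item
occupies two ring slots, one for the token and one for the index):

	qs_fifo_size = rte_align32pow2((2 * 256) + 1);	/* = 1024 slots */
	/* With the default ring flags the usable capacity is 1024 - 1 =
	 * 1023 slots, i.e. 511 (token, index) pairs -- more than the 256
	 * tbl8 groups that can ever be queued.  Reclaim is attempted once
	 * usage exceeds capacity >> RCU_QSBR_RECLAIM_LEVEL, i.e.
	 * 1023 >> 3 = 127 slots, roughly 63 queued groups.
	 */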

* Re: [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration Ruifeng Wang
@ 2019-09-06 19:45     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:45 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, Ruifeng Wang (Arm Technology China),
	paulmck, nd

Adding Paul for feedback

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:46 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Ruifeng Wang (Arm
> Technology China) <Ruifeng.Wang@arm.com>
> Subject: [PATCH v2 4/6] app/test: add test case for LPM RCU integration
> 
> Add positive and negative tests for API rte_lpm_rcu_qsbr_add.
> Also test LPM library behavior when RCU QSBR is enabled.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  app/test/test_lpm.c | 153 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 152 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
> index e969fe051..cfd372395 100644
> --- a/app/test/test_lpm.c
> +++ b/app/test/test_lpm.c
> @@ -8,6 +8,7 @@
> 
>  #include <rte_ip.h>
>  #include <rte_lpm.h>
> +#include <rte_malloc.h>
> 
>  #include "test.h"
>  #include "test_xmmt_ops.h"
> @@ -40,6 +41,8 @@ static int32_t test15(void);
>  static int32_t test16(void);
>  static int32_t test17(void);
>  static int32_t test18(void);
> +static int32_t test19(void);
> +static int32_t test20(void);
> 
>  rte_lpm_test tests[] = {
>  /* Test Cases */
> @@ -61,7 +64,9 @@ rte_lpm_test tests[] = {
>  	test15,
>  	test16,
>  	test17,
> -	test18
> +	test18,
> +	test19,
> +	test20
>  };
> 
>  #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
> @@ -1266,6 +1271,152 @@ test18(void)
>  	return PASS;
>  }
> 
> +/*
> + * rte_lpm_rcu_qsbr_add positive and negative tests.
> + *  - Add RCU QSBR variable to LPM
> + *  - Add another RCU QSBR variable to LPM
> + *  - Check LPM attached RCU QSBR variable and FIFO queue
> + */
> +int32_t
> +test19(void)
> +{
> +	struct rte_lpm *lpm = NULL;
> +	struct rte_lpm_config config;
> +	size_t sz;
> +	struct rte_rcu_qsbr *qsv;
> +	struct rte_rcu_qsbr *qsv2;
> +	int32_t status;
> +
> +	config.max_rules = MAX_RULES;
> +	config.number_tbl8s = NUMBER_TBL8S;
> +	config.flags = 0;
> +
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Create RCU QSBR variable */
> +	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
> +	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
> +					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
> +	TEST_LPM_ASSERT(qsv != NULL);
> +
> +	status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	/* Attach RCU QSBR to LPM table */
> +	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	/* Create and attach another RCU QSBR to LPM table */
> +	qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
> +					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
> +	TEST_LPM_ASSERT(qsv2 != NULL);
> +
> +	status = rte_lpm_rcu_qsbr_add(lpm, qsv2);
> +	TEST_LPM_ASSERT(status != 0);
> +
> +	TEST_LPM_ASSERT(lpm->qsv == qsv);
> +	TEST_LPM_ASSERT(lpm->qs_fifo != NULL);
> +
> +	rte_lpm_free(lpm);
> +	rte_free(qsv);
> +	rte_free(qsv2);
> +
> +	return PASS;
> +}
> +
> +/*
> + * rte_lpm_rcu_qsbr_add functional test.
> + *  - Create LPM which supports 1 tbl8 group at max
> + *  - Add RCU QSBR variable to LPM
> + *  - Add a rule with depth=28 (> 24)
> + *  - Register a reader thread (not a real thread)
> + *  - Reader looks up the existing rule
> + *  - Writer deletes the rule
> + *  - Reader looks up the rule
> + *  - Writer re-adds the rule (no tbl8 group available)
> + *  - Reader reports quiescent state and unregisters
> + *  - Writer re-adds the rule
> + *  - Reader looks up the rule
> + */
> +int32_t
> +test20(void)
> +{
> +	struct rte_lpm *lpm = NULL;
> +	struct rte_lpm_config config;
> +	size_t sz;
> +	struct rte_rcu_qsbr *qsv;
> +	int32_t status;
> +	uint32_t ip, next_hop, next_hop_return;
> +	uint8_t depth;
> +
> +	config.max_rules = MAX_RULES;
> +	config.number_tbl8s = 1;
> +	config.flags = 0;
> +
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Create RCU QSBR variable */
> +	sz = rte_rcu_qsbr_get_memsize(1);
> +	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
> +					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
> +	TEST_LPM_ASSERT(qsv != NULL);
> +
> +	status = rte_rcu_qsbr_init(qsv, 1);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	/* Attach RCU QSBR to LPM table */
> +	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	ip = RTE_IPV4(192, 18, 100, 100);
> +	depth = 28;
> +	next_hop = 1;
> +	status = rte_lpm_add(lpm, ip, depth, next_hop);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group);
> +
> +	/* Register pseudo reader */
> +	status = rte_rcu_qsbr_thread_register(qsv, 0);
> +	TEST_LPM_ASSERT(status == 0);
> +	rte_rcu_qsbr_thread_online(qsv, 0);
> +
> +	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(next_hop_return == next_hop);
> +
> +	/* Writer update */
> +	status = rte_lpm_delete(lpm, ip, depth);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid);
> +
> +	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
> +	TEST_LPM_ASSERT(status != 0);
> +
> +	status = rte_lpm_add(lpm, ip, depth, next_hop);
> +	TEST_LPM_ASSERT(status != 0);
> +
> +	/* Reader quiescent */
> +	rte_rcu_qsbr_quiescent(qsv, 0);
> +
> +	status = rte_lpm_add(lpm, ip, depth, next_hop);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	rte_rcu_qsbr_thread_offline(qsv, 0);
> +	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(next_hop_return == next_hop);
> +
> +	rte_lpm_free(lpm);
> +	rte_free(qsv);
> +
> +	return PASS;
> +}
> +
>  /*
>   * Do all unit tests.
>   */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests Ruifeng Wang
@ 2019-09-06 19:46     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:46 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, paulmck, nd

Adding Paul for feedback

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:46 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: [PATCH v2 6/6] test/lpm: add RCU integration performance tests
> 
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Add performance tests for RCU integration. The performance difference with
> and without RCU integration is very small (~1% to ~2%) on both Arm and x86
> platforms.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_lpm_perf.c | 274 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 271 insertions(+), 3 deletions(-)
> 
> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
> index a2578fe90..475e5d488 100644
> --- a/app/test/test_lpm_perf.c
> +++ b/app/test/test_lpm_perf.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #include <stdio.h>
> @@ -10,12 +11,23 @@
>  #include <rte_cycles.h>
>  #include <rte_random.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_malloc.h>
>  #include <rte_ip.h>
>  #include <rte_lpm.h>
> +#include <rte_rcu_qsbr.h>
> 
>  #include "test.h"
>  #include "test_xmmt_ops.h"
> 
> +struct rte_lpm *lpm;
> +static struct rte_rcu_qsbr *rv;
> +static volatile uint8_t writer_done;
> +static volatile uint32_t thr_id;
> +/* Report quiescent state every 8192 lookups. Larger critical
> + * sections in the reader will result in the writer polling multiple times.
> + */
> +#define QSBR_REPORTING_INTERVAL 8192
> +
>  #define TEST_LPM_ASSERT(cond) do {                                            \
>  	if (!(cond)) {                                                        \
>  		printf("Error at line %d: \n", __LINE__);                     \
> @@ -24,6 +36,7 @@
>  } while(0)
> 
>  #define ITERATIONS (1 << 10)
> +#define RCU_ITERATIONS 10
>  #define BATCH_SIZE (1 << 12)
>  #define BULK_SIZE 32
> 
> @@ -35,9 +48,13 @@ struct route_rule {
>  };
> 
>  struct route_rule large_route_table[MAX_RULE_NUM];
> +/* Route table for routes with depth > 24 */
> +struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
> 
>  static uint32_t num_route_entries;
> +static uint32_t num_ldepth_route_entries;
>  #define NUM_ROUTE_ENTRIES num_route_entries
> +#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
> 
>  enum {
>  	IP_CLASS_A,
> @@ -191,7 +208,7 @@ static void generate_random_rule_prefix(uint32_t
> ip_class, uint8_t depth)
>  	uint32_t ip_head_mask;
>  	uint32_t rule_num;
>  	uint32_t k;
> -	struct route_rule *ptr_rule;
> +	struct route_rule *ptr_rule, *ptr_ldepth_rule;
> 
>  	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
>  		fixed_bit_num = IP_HEAD_BIT_NUM_A;
> @@ -236,10 +253,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
>  	 */
>  	start = lrand48() & mask;
>  	ptr_rule = &large_route_table[num_route_entries];
> +	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
>  	for (k = 0; k < rule_num; k++) {
>  		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
>  			| ip_head_mask;
>  		ptr_rule->depth = depth;
> +		/* If the depth of the route is more than 24, store it
> +		 * in another table as well.
> +		 */
> +		if (depth > 24) {
> +			ptr_ldepth_rule->ip = ptr_rule->ip;
> +			ptr_ldepth_rule->depth = ptr_rule->depth;
> +			ptr_ldepth_rule++;
> +			num_ldepth_route_entries++;
> +		}
>  		ptr_rule++;
>  		start = (start + step) & mask;
>  	}
> @@ -273,6 +300,7 @@ static void generate_large_route_rule_table(void)
>  	uint8_t  depth;
> 
>  	num_route_entries = 0;
> +	num_ldepth_route_entries = 0;
>  	memset(large_route_table, 0, sizeof(large_route_table));
> 
>  	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
> @@ -316,10 +344,248 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
>  	printf("\n");
>  }
> 
> +/* Check condition and return an error if true. */
> +static uint16_t enabled_core_ids[RTE_MAX_LCORE];
> +static unsigned int num_cores;
> +
> +/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
> +static inline uint32_t
> +alloc_thread_id(void)
> +{
> +	uint32_t tmp_thr_id;
> +
> +	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
> +	if (tmp_thr_id >= RTE_MAX_LCORE)
> +		printf("Invalid thread id %u\n", tmp_thr_id);
> +
> +	return tmp_thr_id;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure without RCU.
> + */
> +static int
> +test_lpm_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +	} while (!writer_done);
> +
> +	return 0;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure with RCU.
> + */
> +static int
> +test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t thread_id = alloc_thread_id();
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	/* Register this thread to report quiescent state */
> +	rte_rcu_qsbr_thread_register(rv, thread_id);
> +	rte_rcu_qsbr_thread_online(rv, thread_id);
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +		/* Update quiescent state */
> +		rte_rcu_qsbr_quiescent(rv, thread_id);
> +	} while (!writer_done);
> +
> +	rte_rcu_qsbr_thread_offline(rv, thread_id);
> +	rte_rcu_qsbr_thread_unregister(rv, thread_id);
> +
> +	return 0;
> +}
> +
> +/*
> + * Perf test:
> + * Single writer, Single QS variable, Single QSBR query,
> + * Non-blocking rcu_qsbr_check
> + */
> +static int
> +test_lpm_rcu_perf(void)
> +{
> +	struct rte_lpm_config config;
> +	uint64_t begin, total_cycles;
> +	size_t sz;
> +	unsigned int i, j;
> +	uint16_t core_id;
> +	uint32_t next_hop_add = 0xAA;
> +
> +	if (rte_lcore_count() < 2) {
> +		printf("Not enough cores for lpm_rcu_perf_autotest,
> expecting at least 2\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_cores = 0;
> +	RTE_LCORE_FOREACH_SLAVE(core_id) {
> +		enabled_core_ids[num_cores] = core_id;
> +		num_cores++;
> +	}
> +
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Init RCU variable */
> +	sz = rte_rcu_qsbr_get_memsize(num_cores);
> +	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
> +						RTE_CACHE_LINE_SIZE);
> +	rte_rcu_qsbr_init(rv, num_cores);
> +
> +	/* Assign the RCU variable to LPM */
> +	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
> +		printf("RCU variable assignment failed\n");
> +		goto error;
> +	}
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS *
> NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES *
> ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +	lpm = NULL;
> +	rv = NULL;
> +
> +	/* Test without RCU integration */
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS *
> NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES *
> ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			printf("Warning: lcore %u not finished.\n",
> +				enabled_core_ids[i]);
> +
> +	rte_lpm_free(lpm);
> +
> +	return 0;
> +
> +error:
> +	writer_done = 1;
> +	/* Wait until all readers have exited */
> +	rte_eal_mp_wait_lcore();
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +
> +	return -1;
> +}
> +
>  static int
>  test_lpm_perf(void)
>  {
> -	struct rte_lpm *lpm = NULL;
>  	struct rte_lpm_config config;
> 
>  	config.max_rules = 2000000;
> @@ -343,7 +609,7 @@ test_lpm_perf(void)
>  	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
>  	TEST_LPM_ASSERT(lpm != NULL);
> 
> -	/* Measue add. */
> +	/* Measure add. */
>  	begin = rte_rdtsc();
> 
>  	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
> @@ -478,6 +744,8 @@ test_lpm_perf(void)
>  	rte_lpm_delete_all(lpm);
>  	rte_lpm_free(lpm);
> 
> +	test_lpm_rcu_perf();
> +
>  	return 0;
>  }
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
  2019-09-06 19:44     ` Honnappa Nagarahalli
@ 2019-09-18 16:15     ` Medvedkin, Vladimir
  2019-09-19  6:17       ` Ruifeng Wang (Arm Technology China)
  1 sibling, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-09-18 16:15 UTC (permalink / raw)
  To: Ruifeng Wang, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd

Hi Ruifeng,

Thanks for this patch series, see comments below.

On 06/09/2019 10:45, Ruifeng Wang wrote:
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
>
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>   lib/librte_lpm/Makefile            |   3 +-
>   lib/librte_lpm/meson.build         |   2 +
>   lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++++++++--
>   lib/librte_lpm/rte_lpm.h           |  22 +++
>   lib/librte_lpm/rte_lpm_version.map |   6 +
>   lib/meson.build                    |   3 +-
>   6 files changed, 244 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>   # library name
>   LIB = librte_lpm.a
>   
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>   CFLAGS += -O3
>   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
>   
>   EXPORT_MAP := rte_lpm_version.map
>   
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>   # Copyright(c) 2017 Intel Corporation
>   
>   version = 2
> +allow_experimental_apis = true
>   sources = files('rte_lpm.c', 'rte_lpm6.c')
>   headers = files('rte_lpm.h', 'rte_lpm6.h')
>   # since header files have different names, we can install all vector headers
>   # without worrying about which architecture we actually need
>   headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>   deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..9764b8de6 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #include <string.h>
> @@ -22,6 +23,7 @@
>   #include <rte_rwlock.h>
>   #include <rte_spinlock.h>
>   #include <rte_tailq.h>
> +#include <rte_ring.h>
>   
>   #include "rte_lpm.h"
>   
> @@ -39,6 +41,11 @@ enum valid_flag {
>   	VALID
>   };
>   
> +struct __rte_lpm_qs_item {
> +	uint64_t token;	/**< QSBR token.*/
> +	uint32_t index;	/**< tbl8 group index.*/
> +};
> +
>   /* Macro to enable/disable run-time checks. */
>   #if defined(RTE_LIBRTE_LPM_DEBUG)
>   #include <rte_debug.h>
> @@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
>   
>   	rte_mcfg_tailq_write_unlock();
>   
> +	rte_ring_free(lpm->qs_fifo);
>   	rte_free(lpm->tbl8);
>   	rte_free(lpm->rules_tbl);
>   	rte_free(lpm);
> @@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>   MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>   		rte_lpm_free_v1604);
>   
> +/* Add an item into FIFO.
> + * return: 0 - success
> + */
> +static int
> +__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
> +{
> +	if (rte_ring_free_count(fifo) < 2) {
> +		RTE_LOG(ERR, LPM, "QS FIFO full\n");
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token);
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index);
> +
> +	return 0;
> +}
> +
> +/* Remove item from FIFO.
> + * Used when data observed by rte_ring_peek.
> + */
> +static void
> +__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
Is it necessary to pass the pointer for struct __rte_lpm_qs_item?
According to the code, only item.index is used after this call.
> +{
> +	void *obj_token = NULL;
> +	void *obj_index = NULL;
> +
> +	(void)rte_ring_sc_dequeue(fifo, &obj_token);
I think it is not necessary to cast here.
> +	(void)rte_ring_sc_dequeue(fifo, &obj_index);
> +
> +	if (item) {
I think it is redundant; it is never NULL here.
> +		item->token = (uint64_t)((uintptr_t)obj_token);
> +		item->index = (uint32_t)((uintptr_t)obj_index);
> +	}
> +}
> +
> +/* Max number of tbl8 groups to reclaim at one time. */
> +#define RCU_QSBR_RECLAIM_SIZE	8
> +
> +/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
> + * reclaim will be triggered by tbl8_free.
> + */
> +#define RCU_QSBR_RECLAIM_LEVEL	3
> +
> +/* Reclaim some tbl8 groups based on quiescent state check.
> + * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max.
> + * Params: lpm   - lpm object handle
> + *         index - (output) one of the successfully reclaimed tbl8 groups
> + * return: 0 - success, 1 - no group reclaimed.
> + */
> +static uint32_t
> +__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
> +{
> +	struct __rte_lpm_qs_item qs_item;
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
It is not necessary to init it with NULL.
> +	void *obj_token;
> +	uint32_t cnt = 0;
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimation triggered.\n");
> +	/* Check reader threads quiescent state and
> +	 * reclaim as much tbl8 groups as possible.
> +	 */
> +	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
> +		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
> +		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
> +					false) == 1)) {
> +		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
> +
> +		tbl8_entry = &lpm->tbl8[qs_item.index *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +		memset(&tbl8_entry[0], 0,
> +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
> +				sizeof(tbl8_entry[0]));
> +		cnt++;
> +	}
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
> +	if (cnt) {
> +		if (index)
> +			*index = qs_item.index;
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +/* Trigger tbl8 group reclaim when necessary.
> + * Reclaim happens when RCU QSBR queue usage
> + * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
> + */
> +static void
> +__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
> +{
> +	if (lpm->qsv == NULL)
> +		return;
This check is redundant.
> +
> +	if (rte_ring_count(lpm->qs_fifo) <
> +		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
> +		return;
> +
> +	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	uint32_t qs_fifo_size;
> +	char rcu_ring_name[RTE_RING_NAMESIZE];
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->qsv) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * number_tbl8s. Will store 'token' and 'index'.
> +	 */
> +	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
> +
> +	/* Init QSBR reclaiming FIFO. */
> +	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
> +	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (lpm->qs_fifo == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
> +		rte_errno = ENOMEM;
rte_ring_create() sets rte_errno on error, I don't think we need to 
rewrite it here.
> +		return 1;
> +	}
> +	lpm->qsv = v;
> +
> +	return 0;
> +}
> +
>   /*
>    * Adds a rule to the rule table.
>    *
> @@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>   	return -EINVAL;
>   }
>   
> +static int32_t
> +tbl8_alloc_reclaimed(struct rte_lpm *lpm)
> +{
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
> +	uint32_t index;
> +
> +	if (lpm->qsv != NULL) {
> +		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
> +			/* Set the last reclaimed tbl8 group as VALID. */
> +			struct rte_lpm_tbl_entry new_tbl8_entry = {
> +				.next_hop = 0,
> +				.valid = INVALID,
> +				.depth = 0,
> +				.valid_group = VALID,
> +			};
> +
> +			tbl8_entry = &lpm->tbl8[index *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +			__atomic_store(tbl8_entry, &new_tbl8_entry,
> +					__ATOMIC_RELAXED);
> +
> +			/* Return group index for reclaimed tbl8 group. */
> +			return index;
> +		}
> +	}
> +
> +	return -ENOSPC;
> +}
> +
>   /*
>    * Find, clean and allocate a tbl8.
>    */
> @@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>   }
>   
>   static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
>   {
>   	uint32_t group_idx; /* tbl8 group index. */
>   	struct rte_lpm_tbl_entry *tbl8_entry;
>   
>   	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>   		/* If a free tbl8 group is found clean it and set as VALID. */
>   		if (!tbl8_entry->valid_group) {
>   			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>   		}
>   	}
>   
> -	/* If there are no tbl8 groups free then return error. */
> -	return -ENOSPC;
> +	/* If there are no tbl8 groups free then check reclaim queue. */
> +	return tbl8_alloc_reclaimed(lpm);
>   }
>   
>   static void
> @@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>   }
>   
>   static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>   {
> -	/* Set tbl8 group invalid*/
> +	struct __rte_lpm_qs_item qs_item;
>   	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
>   
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->qsv != NULL) {
> +		/* Push into QSBR FIFO. */
> +		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
> +		qs_item.index =
> +			tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
> +			/* This should never happen as FIFO size is big enough
> +			 * to hold all tbl8 groups.
> +			 */
> +			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
> +
> +		/* Speculatively reclaim tbl8 groups.
> +		 * Help spread the reclaim work load across multiple calls.
> +		 */
> +		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
> +	} else {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>   }
>   
>   static __rte_noinline int32_t
> @@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   
>   	if (!lpm->tbl24[tbl24_index].valid) {
>   		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		/* Check tbl8 allocation was successful. */
>   		if (tbl8_group_index < 0) {
> @@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   	} /* If valid entry but not extended calculate the index into Table8. */
>   	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>   		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		if (tbl8_group_index < 0) {
>   			return tbl8_group_index;
> @@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		 */
>   		lpm->tbl24[tbl24_index].valid = 0;
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	} else if (tbl8_recycle_index > -1) {
>   		/* Update tbl24 entry. */
>   		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>   				__ATOMIC_RELAXED);
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	}
>   #undef group_idx
>   	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..5079fb262 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>   #include <rte_common.h>
>   #include <rte_vect.h>
>   #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -186,6 +188,8 @@ struct rte_lpm {
>   			__rte_cache_aligned; /**< LPM tbl24 table. */
>   	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>   	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group.*/
> +	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
>   };
>   
>   /**
> @@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>   void
>   rte_lpm_free_v1604(struct rte_lpm *lpm);
>   
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param v
> + *   RCU QSBR variable
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>   /**
>    * Add a rule to the LPM table.
>    *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>   	rte_lpm6_lookup_bulk_func;
>   
>   } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..3a96f005d 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,6 +11,7 @@
>   libraries = [
>   	'kvargs', # eal depends on kvargs
>   	'eal', # everything depends on eal
> +	'rcu', # hash and lpm depends on this
>   	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>   	'cmdline',
>   	'metrics', # bitrate/latency stats depends on this
> @@ -22,7 +23,7 @@ libraries = [
>   	'gro', 'gso', 'ip_frag', 'jobstats',
>   	'kni', 'latencystats', 'lpm', 'member',
>   	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>   	# ipsec lib depends on net, crypto and security
>   	'ipsec',
>   	# add pkt framework libs which use other libs from above

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time Ruifeng Wang
@ 2019-09-18 16:17     ` Medvedkin, Vladimir
  2019-09-19  6:22       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-09-18 16:17 UTC (permalink / raw)
  To: Ruifeng Wang, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, stable

Hi Ruifeng,

Thanks for this bug fix.

I think it should be sent separately from this RCU related patch series.

On 06/09/2019 10:45, Ruifeng Wang wrote:
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>
> total_time needs to be reset to measure the cycles for delete API.
>
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   app/test/test_lpm_perf.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
> index 77eea66ad..a2578fe90 100644
> --- a/app/test/test_lpm_perf.c
> +++ b/app/test/test_lpm_perf.c
> @@ -460,7 +460,7 @@ test_lpm_perf(void)
>   			(double)total_time / ((double)ITERATIONS * BATCH_SIZE),
>   			(count * 100.0) / (double)(ITERATIONS * BATCH_SIZE));
>   
> -	/* Delete */
> +	/* Measure Delete */
>   	status = 0;
>   	begin = rte_rdtsc();
>   
> @@ -470,7 +470,7 @@ test_lpm_perf(void)
>   				large_route_table[i].depth);
>   	}
>   
> -	total_time += rte_rdtsc() - begin;
> +	total_time = rte_rdtsc() - begin;
>   
>   	printf("Average LPM Delete: %g cycles\n",
>   			(double)total_time / NUM_ROUTE_ENTRIES);

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-18 16:15     ` Medvedkin, Vladimir
@ 2019-09-19  6:17       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-09-19  6:17 UTC (permalink / raw)
  To: Medvedkin, Vladimir, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, Dharmik Thakkar, nd, nd

Hi Vladimir,

Thanks for your review and comments.
All the comments will be addressed in the next version.

/Ruifeng

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Thursday, September 19, 2019 00:16
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
> bruce.richardson@intel.com; olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
> 
> Hi Ruifeng,
> 
> Thanks for this patch series, see comments below.
> 
> [full quote of the review and patch trimmed; see Vladimir's message above]


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time
  2019-09-18 16:17     ` Medvedkin, Vladimir
@ 2019-09-19  6:22       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-09-19  6:22 UTC (permalink / raw)
  To: Medvedkin, Vladimir, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, Dharmik Thakkar, nd, stable, nd

Hi Vladimir,

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Thursday, September 19, 2019 00:18
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
> bruce.richardson@intel.com; olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; stable@dpdk.org
> Subject: Re: [PATCH v2 5/6] test/lpm: reset total time
> 
> Hi Ruifeng,
> 
> Thanks for this bug fix.
> 
> I think it should be sent separately from this RCU related patch series.
Agree. It will be sent out separately.
> 
> [quoted patch trimmed; see the message above]


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (5 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests Ruifeng Wang
@ 2019-10-01  6:29   ` Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
                       ` (5 more replies)
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
                     ` (7 subsequent siblings)
  14 siblings, 6 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

This is not a new patch. This patch set has been separated from the
LPM changes because the size of the changes in the RCU library has
grown due to community feedback. These APIs will help reduce the
changes in the LPM and hash libraries that are being integrated with
the RCU library.

This adds four new APIs to the RCU library: create a defer queue,
enqueue deleted resources, reclaim resources, and delete the defer
queue.

The rationale for the APIs is documented in 3/3.

The patches integrating RCU into the LPM and hash libraries will
depend on this patch set.

v3
1) Separated from the original series (https://patches.dpdk.org/cover/58811/)
2) Added reclamation APIs and test cases (Stephen, Yipeng)

Honnappa Nagarahalli (1):
  lib/rcu: add resource reclamation APIs

Ruifeng Wang (2):
  lib/ring: add peek API
  doc/rcu: add RCU integration design details

 app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/rcu_lib.rst  |  59 ++++++
 lib/librte_rcu/meson.build         |   2 +
 lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
 lib/librte_rcu/rte_rcu_version.map |   4 +
 lib/librte_ring/rte_ring.h         |  30 +++
 lib/meson.build                    |   6 +-
 9 files changed, 789 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
@ 2019-10-01  6:29     ` Honnappa Nagarahalli
  2019-10-02 18:42       ` Ananyev, Konstantin
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

From: Ruifeng Wang <ruifeng.wang@arm.com>

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
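
A rough usage sketch (not part of the patch): the helper and variable
names below are hypothetical, but the shape matches how the RCU
reclaim path in this series is expected to use the API - look at the
head of the ring, and consume an entry only if the condition it
encodes (here, a QSBR token whose grace period has expired) is met.

#include <stdint.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Drain QSBR tokens whose grace period has expired. 'r' holds tokens
 * enqueued by a single writer; 'v' is the QSBR variable.
 */
static inline void
drain_expired(struct rte_ring *r, struct rte_rcu_qsbr *v)
{
	void *obj;

	/* Inspect the next object without dequeuing it. */
	while (rte_ring_peek(r, &obj) == 0) {
		/* Consume the token only if all readers have gone
		 * through a quiescent state since it was generated.
		 */
		if (rte_rcu_qsbr_check(v, (uint64_t)(uintptr_t)obj,
					false) != 1)
			break;
		(void)rte_ring_sc_dequeue(r, &obj);
		/* ... release the resource tracked by this token ... */
	}
}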
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
@ 2019-10-01  6:29     ` Honnappa Nagarahalli
  2019-10-02 17:39       ` Ananyev, Konstantin
                         ` (3 more replies)
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 3/3] doc/rcu: add RCU integration design details Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  5 siblings, 4 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

Add resource reclamation APIs to make it simple for applications
and libraries to integrate the rte_rcu library.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
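
A rough usage sketch (not part of the patch): the rte_rcu_qsbr_dq_*
calls and the parameter fields below mirror what the unit tests in
this patch exercise; the surrounding function, the element contents
and the queue sizes are hypothetical.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Free function invoked once a grace period has elapsed; 'e' points
 * to the deferred element and 'p' is assumed to be a user context.
 */
static void
free_resource(void *p, void *e)
{
	(void)p;
	printf("reclaiming element at %p\n", e);
}

static int
defer_queue_example(struct rte_rcu_qsbr *v)
{
	char dq_name[RTE_RING_NAMESIZE];
	struct rte_rcu_qsbr_dq_parameters params;
	struct rte_rcu_qsbr_dq *dq;
	uint64_t e[2] = {200, 0};	/* 16-byte element to defer-free */

	memset(&params, 0, sizeof(params));
	snprintf(dq_name, sizeof(dq_name), "EXAMPLE_RCU_DQ");
	params.name = dq_name;
	params.f = free_resource;
	params.v = v;
	params.size = 32;		/* defer queue depth */
	params.esize = sizeof(e);	/* element size in bytes */
	dq = rte_rcu_qsbr_dq_create(&params);
	if (dq == NULL)
		return -1;

	/* Writer path: after removing the element from the data
	 * structure, enqueue it; free_resource() is called for it
	 * only after readers have quiesced.
	 */
	(void)rte_rcu_qsbr_dq_enqueue(dq, e);

	/* Optionally force reclamation, e.g. before tear-down. */
	(void)rte_rcu_qsbr_dq_reclaim(dq);

	return rte_rcu_qsbr_dq_delete(dq);
}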
 app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
 lib/librte_rcu/meson.build         |   2 +
 lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
 lib/librte_rcu/rte_rcu_version.map |   4 +
 lib/meson.build                    |   6 +-
 7 files changed, 700 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h

diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
index d1b9e46a2..3a6815243 100644
--- a/app/test/test_rcu_qsbr.c
+++ b/app/test/test_rcu_qsbr.c
@@ -1,8 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright (c) 2018 Arm Limited
+ * Copyright (c) 2019 Arm Limited
  */
 
 #include <stdio.h>
+#include <string.h>
 #include <rte_pause.h>
 #include <rte_rcu_qsbr.h>
 #include <rte_hash.h>
@@ -33,6 +34,7 @@ static uint32_t *keys;
 #define COUNTER_VALUE 4096
 static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
 static uint8_t writer_done;
+static uint8_t cb_failed;
 
 static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
 struct rte_hash *h[RTE_MAX_LCORE];
@@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
 	return 0;
 }
 
+static void
+rte_rcu_qsbr_test_free_resource(void *p, void *e)
+{
+	if (p != NULL && e != NULL) {
+		printf("%s: Test failed\n", __func__);
+		cb_failed = 1;
+	}
+}
+
+/*
+ * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
+ * elements that can be freed later. This queue is referred to as 'defer queue'.
+ */
+static int
+test_rcu_qsbr_dq_create(void)
+{
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+
+	printf("\nTest rte_rcu_qsbr_dq_create()\n");
+
+	/* Pass invalid parameters */
+	dq = rte_rcu_qsbr_dq_create(NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	params.f = rte_rcu_qsbr_test_free_resource;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+	params.v = t[0];
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	params.size = 1;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	params.esize = 3;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	/* Pass all valid parameters */
+	params.esize = 16;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+	rte_rcu_qsbr_dq_delete(dq);
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
+ * to be freed later after at least one grace period is over.
+ */
+static int
+test_rcu_qsbr_dq_enqueue(void)
+{
+	int ret;
+	uint64_t r;
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+
+	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
+
+	/* Create a queue with simple parameters */
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	params.f = rte_rcu_qsbr_test_free_resource;
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+	params.v = t[0];
+	params.size = 1;
+	params.esize = 16;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+
+	/* Pass invalid parameters */
+	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
+
+	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
+
+	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
+
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid params");
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
+ */
+static int
+test_rcu_qsbr_dq_reclaim(void)
+{
+	int ret;
+
+	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
+
+	/* Pass invalid parameters */
+	ret = rte_rcu_qsbr_dq_reclaim(NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid params");
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_delete: Delete a defer queue.
+ */
+static int
+test_rcu_qsbr_dq_delete(void)
+{
+	int ret;
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+
+	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
+
+	/* Pass invalid parameters */
+	ret = rte_rcu_qsbr_dq_delete(NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid params");
+
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	params.f = rte_rcu_qsbr_test_free_resource;
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+	params.v = t[0];
+	params.size = 1;
+	params.esize = 16;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_xxx: functional test exercising the defer queue
+ * enqueue, reclaim and delete APIs together.
+ */
+static int
+test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
+{
+	int i, j, ret;
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+	uint64_t *e;
+	uint64_t sc = 200;
+	int max_entries;
+
+	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests\n");
+	printf("Size = %d, esize = %d\n", size, esize);
+
+	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
+	if (e == NULL)
+		return 0;
+	cb_failed = 0;
+
+	/* Initialize the RCU variable. No threads are registered */
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+
+	/* Create a queue with simple parameters */
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	params.f = rte_rcu_qsbr_test_free_resource;
+	params.v = t[0];
+	params.size = size;
+	params.esize = esize;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+
+	/* Given the size and esize, calculate the maximum number of entries
+	 * that can be stored on the defer queue (look at the logic used
+	 * in capacity calculation of rte_ring).
+	 */
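+	/* For example, with size = 1 and esize = 8: each element needs
+	 * 8/8 + 1 = 2 ring slots (one data word plus the token). The ring
+	 * is created with rte_align32pow2(2 * 1 + 1) = 4 slots and its
+	 * usable capacity is 4 - 1 = 3 slots, so max_entries = 3 / 2 = 1.
+	 */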
+	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
+	max_entries = (max_entries - 1)/(esize/8 + 1);
+
+	/* Enqueue a few counters starting with the value 'sc' */
+	/* The queue size will be rounded up to 2. The enqueue API also
+	 * reclaims if the queue size is above a certain limit. Since there
+	 * are no threads registered, reclamation succeeds. Hence, it should
+	 * be possible to enqueue more than the provided queue size.
+	 */
+	for (i = 0; i < 10; i++) {
+		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
+			"dq enqueue functional");
+		for (j = 0; j < esize/8; j++)
+			e[j] = sc++;
+	}
+
+	/* Register a thread on the RCU QSBR variable. Reclamation will not
+	 * succeed. It should not be possible to enqueue more than the size
+	 * number of resources.
+	 */
+	rte_rcu_qsbr_thread_register(t[0], 1);
+	rte_rcu_qsbr_thread_online(t[0], 1);
+
+	for (i = 0; i < max_entries; i++) {
+		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
+			"dq enqueue functional");
+		for (j = 0; j < esize/8; j++)
+			e[j] = sc++;
+	}
+
+	/* Enqueue fails as queue is full */
+	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
+
+	/* Delete should fail as there are elements in the defer queue
+	 * which cannot be reclaimed.
+	 */
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid params");
+
+	/* Report quiescent state, enqueue should succeed */
+	rte_rcu_qsbr_quiescent(t[0], 1);
+	for (i = 0; i < max_entries; i++) {
+		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
+			"dq enqueue functional");
+		for (j = 0; j < esize/8; j++)
+			e[j] = sc++;
+	}
+
+	/* Queue is full */
+	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
+
+	/* Report quiescent state, delete should succeed */
+	rte_rcu_qsbr_quiescent(t[0], 1);
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
+
+	/* Validate that the callback function did not report any error */
+	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
+
+	rte_free(e);
+	return 0;
+}
+
 /*
  * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
  */
@@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
 	if (test_rcu_qsbr_thread_offline() < 0)
 		goto test_fail;
 
+	if (test_rcu_qsbr_dq_create() < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_reclaim() < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_delete() < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_enqueue() < 0)
+		goto test_fail;
+
 	printf("\nFunctional tests\n");
 
 	if (test_rcu_qsbr_sw_sv_3qs() < 0)
@@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
 	if (test_rcu_qsbr_mw_mv_mqs() < 0)
 		goto test_fail;
 
+	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
+		goto test_fail;
+
 	free_rcu();
 
 	printf("\n");
diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
index 62920ba02..e280b29c1 100644
--- a/lib/librte_rcu/meson.build
+++ b/lib/librte_rcu/meson.build
@@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
 if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
 	ext_deps += cc.find_library('atomic')
 endif
+
+deps += ['ring']
diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
index ce7f93dd3..76814f50b 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.c
+++ b/lib/librte_rcu/rte_rcu_qsbr.c
@@ -21,6 +21,7 @@
 #include <rte_errno.h>
 
 #include "rte_rcu_qsbr.h"
+#include "rte_rcu_qsbr_pvt.h"
 
 /* Get the memory size of QSBR variable */
 size_t
@@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 	return 0;
 }
 
+/* Create a queue used to store the data structure elements that can
+ * be freed later. This queue is referred to as 'defer queue'.
+ */
+struct rte_rcu_qsbr_dq *
+rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
+{
+	struct rte_rcu_qsbr_dq *dq;
+	uint32_t qs_fifo_size;
+
+	if (params == NULL || params->f == NULL ||
+		params->v == NULL || params->name == NULL ||
+		params->size == 0 || params->esize == 0 ||
+		(params->esize % 8 != 0)) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return NULL;
+	}
+
+	dq = rte_zmalloc(NULL,
+		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
+		RTE_CACHE_LINE_SIZE);
+	if (dq == NULL) {
+		rte_errno = ENOMEM;
+
+		return NULL;
+	}
+
+	/* Round up the FIFO size to the next power of two that can hold
+	 * 'size' elements: each element needs (esize/8 + 1) 64b slots
+	 * (data words plus the token), and one extra slot is added as the
+	 * usable capacity of a ring is one less than its size.
+	 */
+	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
+					* params->size) + 1);
+	dq->r = rte_ring_create(params->name, qs_fifo_size,
+					SOCKET_ID_ANY, 0);
+	if (dq->r == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): defer queue create failed\n", __func__);
+		rte_free(dq);
+		return NULL;
+	}
+
+	dq->v = params->v;
+	dq->size = params->size;
+	dq->esize = params->esize;
+	dq->f = params->f;
+	dq->p = params->p;
+
+	return dq;
+}
+
+/* Enqueue one resource to the defer queue to free after the grace
+ * period is over.
+ */
+int
+rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
+{
+	uint64_t token;
+	uint64_t *tmp;
+	uint32_t i;
+	uint32_t cur_size, free_size;
+
+	if (dq == NULL || e == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return 1;
+	}
+
+	/* Start the grace period */
+	token = rte_rcu_qsbr_start(dq->v);
+
+	/* Reclaim resources if the queue is more than 1/8th full. This
+	 * keeps the queue from growing too large and allows time for the
+	 * reader threads to report their quiescent state.
+	 */
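+	/* For example, with size = 1024, reclamation is attempted once
+	 * more than 1024 >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT = 128
+	 * entries are pending on the queue.
+	 */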
+	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
+	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
+		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
+			"%s(): Triggering reclamation\n", __func__);
+		rte_rcu_qsbr_dq_reclaim(dq);
+	}
+
+	/* Check if there is space for at least 1 resource */
+	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
+	if (!free_size) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Defer queue is full\n", __func__);
+		rte_errno = ENOSPC;
+		return 1;
+	}
+
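+	/* Each resource occupies a sequence of 64b ring slots:
+	 * | token | e[0] | ... | e[esize/8 - 1] |
+	 * The reclaim path dequeues the slots in the same order.
+	 */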
+	/* Enqueue the resource */
+	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
+
+	/* The size of the resource to enqueue needs to be a multiple
+	 * of 64b due to the limitation of the rte_ring implementation.
+	 */
+	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
+		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
+
+	return 0;
+}
+
+/* Reclaim resources from the defer queue. */
+int
+rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
+{
+	uint32_t max_cnt;
+	uint32_t cnt;
+	void *token;
+	uint64_t *tmp;
+	uint32_t i;
+
+	if (dq == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return 1;
+	}
+
+	/* Anything to reclaim? */
+	if (rte_ring_count(dq->r) == 0)
+		return 0;
+
+	/* Reclaim at most 1/16th of the total number of entries. */
+	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
+	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
+	cnt = 0;
+
+	/* Check reader threads quiescent state and reclaim resources */
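+	/* Tokens are enqueued in increasing order, so if the grace period
+	 * of the token at the head of the FIFO is not over yet, it is not
+	 * over for any of the newer tokens either and the loop can stop.
+	 */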
+	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
+		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
+			== 1)) {
+		(void)rte_ring_sc_dequeue(dq->r, &token);
+		/* The size of the resource to dequeue needs to be a
+		 * multiple of 64b due to the limitation of the rte_ring
+		 * implementation.
+		 */
+		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
+			i++, tmp++)
+			(void)rte_ring_sc_dequeue(dq->r,
+					(void *)(uintptr_t)tmp);
+		dq->f(dq->p, dq->e);
+
+		cnt++;
+	}
+
+	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
+		"%s(): Reclaimed %u resources\n", __func__, cnt);
+
+	if (cnt == 0) {
+		/* No resources were reclaimed */
+		rte_errno = EAGAIN;
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Delete a defer queue. */
+int
+rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
+{
+	if (dq == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return 1;
+	}
+
+	/* Reclaim all the resources */
+	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
+		/* Error number is already set by the reclaim API */
+		return 1;
+
+	rte_ring_free(dq->r);
+	rte_free(dq);
+
+	return 0;
+}
+
 int rte_rcu_log_type;
 
 RTE_INIT(rte_rcu_register)
diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index c80f15c00..185d4b50a 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -34,6 +34,7 @@ extern "C" {
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_atomic.h>
+#include <rte_ring.h>
 
 extern int rte_rcu_log_type;
 
@@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
 	 */
 } __rte_cache_aligned;
 
+/**
+ * Callback function invoked to free the resources.
+ *
+ * @param p
+ *   Pointer provided while creating the defer queue
+ * @param e
+ *   Pointer to the resource data stored on the defer queue
+ *
+ * @return
+ *   None
+ */
+typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
+
+#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
+
+/**
+ *  Trigger automatic reclamation once the defer queue is more than 1/8th full.
+ */
+#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
+
+/**
+ *  Reclaim at most 1/16th of the total number of resources.
+ */
+#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
+
+/**
+ * Parameters used when creating the defer queue.
+ */
+struct rte_rcu_qsbr_dq_parameters {
+	const char *name;
+	/**< Name of the queue. */
+	uint32_t size;
+	/**< Number of entries in the queue. Typically, this will be
+	 *   the same as the maximum number of entries supported in the
+	 *   lock-free data structure.
+	 *   Data structures with an unbounded number of entries are not
+	 *   currently supported.
+	 */
+	uint32_t esize;
+	/**< Size (in bytes) of each element in the defer queue.
+	 *   This has to be a multiple of 8B as the rte_ring APIs
+	 *   support 8B element sizes only.
+	 */
+	rte_rcu_qsbr_free_resource f;
+	/**< Function to call to free the resource. */
+	void *p;
+	/**< Pointer passed to the free function. Typically, this is the
+	 *   pointer to the data structure to which the resource to free
+	 *   belongs. This can be NULL.
+	 */
+	struct rte_rcu_qsbr *v;
+	/**< RCU QSBR variable to use for this defer queue */
+};
+
+/* RTE defer queue structure.
+ * This structure holds the defer queue. The defer queue is used to
+ * hold the deleted entries from the data structure that are not
+ * yet freed.
+ */
+struct rte_rcu_qsbr_dq;
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
@@ -648,6 +710,113 @@ __rte_experimental
 int
 rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a queue used to store the data structure elements that can
+ * be freed later. This queue is referred to as 'defer queue'.
+ *
+ * @param params
+ *   Parameters to create a defer queue.
+ * @return
+ *   On success - Valid pointer to defer queue
+ *   On error - NULL
+ *   Possible rte_errno codes are:
+ *   - EINVAL - NULL parameters are passed
+ *   - ENOMEM - Not enough memory
+ */
+__rte_experimental
+struct rte_rcu_qsbr_dq *
+rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue one resource to the defer queue and start the grace period.
+ * The resource will be freed later after at least one grace period
+ * is over.
+ *
+ * If the defer queue is full, it will attempt to reclaim resources.
+ * It will also reclaim resources at regular intervals to keep
+ * the defer queue from growing too large.
+ *
+ * This API is not multi-thread safe. It is expected that the caller
+ * provides multi-thread safety by locking a mutex or some other means.
+ *
+ * A lock-free multi-thread writer algorithm could achieve multi-thread
+ * safety by creating and using one defer queue per thread.
+ *
+ * @param dq
+ *   Defer queue to allocate an entry from.
+ * @param e
+ *   Pointer to resource data to copy to the defer queue. The size of
+ *   the data to copy is equal to the element size provided when the
+ *   defer queue was created.
+ * @return
+ *   On success - 0
+ *   On error - 1 with rte_errno set to
+ *   - EINVAL - NULL parameters are passed
+ *   - ENOSPC - Defer queue is full. This condition cannot happen
+ *		if the defer queue size is equal to (or larger than)
+ *		the number of elements in the data structure.
+ */
+__rte_experimental
+int
+rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Reclaim resources from the defer queue.
+ *
+ * This API is not multi-thread safe. It is expected that the caller
+ * provides multi-thread safety by locking a mutex or some other means.
+ *
+ * A lock-free multi-thread writer algorithm could achieve multi-thread
+ * safety by creating and using one defer queue per thread.
+ *
+ * @param dq
+ *   Defer queue to reclaim an entry from.
+ * @return
+ *   On successful reclamation of at least 1 resource - 0
+ *   On error - 1 with rte_errno set to
+ *   - EINVAL - NULL parameters are passed
+ *   - EAGAIN - None of the resources have completed at least 1 grace period,
+ *		try again.
+ */
+__rte_experimental
+int
+rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Delete a defer queue.
+ *
+ * It tries to reclaim all the resources on the defer queue.
+ * If any of the resources have not completed the grace period,
+ * the reclamation stops and the API returns immediately. The rest
+ * of the resources are not reclaimed and the defer queue is not
+ * freed.
+ *
+ * @param dq
+ *   Defer queue to delete.
+ * @return
+ *   On success - 0
+ *   On error - 1
+ *   Possible rte_errno codes are:
+ *   - EINVAL - NULL parameters are passed
+ *   - EAGAIN - Some of the resources have not completed at least 1 grace
+ *		period, try again.
+ */
+__rte_experimental
+int
+rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
new file mode 100644
index 000000000..2122bc36a
--- /dev/null
+++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_RCU_QSBR_PVT_H_
+#define _RTE_RCU_QSBR_PVT_H_
+
+/**
+ * This file is private to the RCU library. It should not be included
+ * by the user of this library.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_rcu_qsbr.h"
+
+/* RTE defer queue structure.
+ * This structure holds the defer queue. The defer queue is used to
+ * hold the deleted entries from the data structure that are not
+ * yet freed.
+ */
+struct rte_rcu_qsbr_dq {
+	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
+	struct rte_ring *r;     /**< RCU QSBR defer queue. */
+	uint32_t size;
+	/**< Number of elements in the defer queue */
+	uint32_t esize;
+	/**< Size (in bytes) of data stored on the defer queue */
+	rte_rcu_qsbr_free_resource f;
+	/**< Function to call to free the resource. */
+	void *p;
+	/**< Pointer passed to the free function. Typically, this is the
+	 *   pointer to the data structure to which the resource to free
+	 *   belongs.
+	 */
+	char e[0];
+	/**< Temporary storage to copy the defer queue element. */
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RCU_QSBR_PVT_H_ */
diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
index f8b9ef2ab..dfac88a37 100644
--- a/lib/librte_rcu/rte_rcu_version.map
+++ b/lib/librte_rcu/rte_rcu_version.map
@@ -8,6 +8,10 @@ EXPERIMENTAL {
 	rte_rcu_qsbr_synchronize;
 	rte_rcu_qsbr_thread_register;
 	rte_rcu_qsbr_thread_unregister;
+	rte_rcu_qsbr_dq_create;
+	rte_rcu_qsbr_dq_enqueue;
+	rte_rcu_qsbr_dq_reclaim;
+	rte_rcu_qsbr_dq_delete;
 
 	local: *;
 };
diff --git a/lib/meson.build b/lib/meson.build
index e5ff83893..0e1be8407 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,7 +11,9 @@
 libraries = [
 	'kvargs', # eal depends on kvargs
 	'eal', # everything depends on eal
-	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
+	'ring',
+	'rcu', # rcu depends on ring
+	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
 	'cmdline',
 	'metrics', # bitrate/latency stats depends on this
 	'hash',    # efd depends on this
@@ -22,7 +24,7 @@ libraries = [
 	'gro', 'gso', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member',
 	'power', 'pdump', 'rawdev',
-	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
+	'reorder', 'sched', 'security', 'stack', 'vhost',
 	# ipsec lib depends on net, crypto and security
 	'ipsec',
 	# add pkt framework libs which use other libs from above
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] doc/rcu: add RCU integration design details
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
@ 2019-10-01  6:29     ` Honnappa Nagarahalli
  2020-03-29 20:57     ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Thomas Monjalon
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

From: Ruifeng Wang <ruifeng.wang@arm.com>

Add a section to describe a design to integrate QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 59 +++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..423ab283e 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,62 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Resource reclamation framework for DPDK
+---------------------------------------
+
+Lock-free algorithms place an additional burden of resource reclamation on
+the application. When a writer deletes an entry from a data structure, the writer:
+
+#. Has to start the grace period
+#. Has to store a reference to the deleted resources in a FIFO
+#. Should check if the readers have completed a grace period and free the resources. This can also be done when the writer runs out of free resources.
+
+There are several APIs provided to help with this process. The writer
+can create a FIFO to store the references to deleted resources using ``rte_rcu_qsbr_dq_create()``.
+The resources can be enqueued to this FIFO using ``rte_rcu_qsbr_dq_enqueue()``.
+If the FIFO is full, ``rte_rcu_qsbr_dq_enqueue`` will reclaim the resources
+before enqueuing. It will also reclaim resources on a regular basis to keep
+the FIFO from growing too large. If the writer runs out of resources, the
+writer can call the ``rte_rcu_qsbr_dq_reclaim`` API to reclaim resources.
+``rte_rcu_qsbr_dq_delete`` is provided to reclaim any remaining resources
+and free the FIFO while shutting down.
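+
+A minimal sketch of this writer-side flow (error handling omitted;
+``MAX_ENTRIES``, ``free_cb``, ``v`` and ``resource`` are placeholders
+the application would provide) could look like:
+
+.. code-block:: c
+
+   struct rte_rcu_qsbr_dq_parameters params = {0};
+   struct rte_rcu_qsbr_dq *dq;
+
+   params.name = "defer_queue";
+   params.size = MAX_ENTRIES;       /* matches the data structure size */
+   params.esize = sizeof(uint64_t); /* must be a multiple of 8B */
+   params.f = free_cb;              /* callback that frees one resource */
+   params.v = v;                    /* RCU QSBR variable */
+   dq = rte_rcu_qsbr_dq_create(&params);
+
+   /* On every delete, push the resource and start a grace period */
+   rte_rcu_qsbr_dq_enqueue(dq, &resource);
+
+   /* When out of resources, reclaim explicitly */
+   rte_rcu_qsbr_dq_reclaim(dq);
+
+   /* On shutdown, reclaim everything and free the FIFO */
+   rte_rcu_qsbr_dq_delete(dq);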
+
+However, if this resource reclamation process were integrated into lock-free
+data structure libraries, it would hide this complexity from the application
+and make it easier for the application to adopt lock-free algorithms.
+The following paragraphs discuss how the reclamation process can be
+integrated in DPDK libraries.
+
+In any DPDK application, the resource reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here assigns different parts of this process to client
+libraries and applications. The term 'client library' refers to lock-free
+data structure libraries such as rte_hash, rte_lpm, etc. in DPDK or similar
+libraries outside of DPDK. The term 'application' refers to the packet
+processing application that makes use of DPDK, such as the L3 Forwarding
+example application, OVS, VPP, etc.
+
+The application has to handle 'Initialization' and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the client library.
+* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section and how frequently it wants to report the quiescent state. A sketch of this application-side setup is shown below.
+
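+A minimal sketch of this application-side setup, assuming a single reader
+thread with id 0; the ``xxx_rcu_qsbr_add()`` registration API, the ``xxx_obj``
+handle and the ``done`` flag are hypothetical placeholders:
+
+.. code-block:: c
+
+   struct rte_rcu_qsbr *v;
+   size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
+
+   v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
+   rte_rcu_qsbr_init(v, RTE_MAX_LCORE);
+
+   /* Register the same RCU variable with the client library */
+   xxx_rcu_qsbr_add(xxx_obj, v);
+
+   /* Each reader thread registers itself and reports its state */
+   rte_rcu_qsbr_thread_register(v, 0);
+   rte_rcu_qsbr_thread_online(v, 0);
+   while (!done) {
+           /* ... lookups on the shared data structure ... */
+           rte_rcu_qsbr_quiescent(v, 0);
+   }
+   rte_rcu_qsbr_thread_offline(v, 0);
+   rte_rcu_qsbr_thread_unregister(v, 0);
+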
+The client library will handle the 'Reclaiming Resources' part of the process.
+The client libraries will make use of the writer thread context to execute the
+memory reclamation algorithm. So,
+
+* client library should provide an API to register a RCU variable that it will use. It should call ``rte_rcu_qsbr_dq_create()`` to create the FIFO to store the references to deleted entries.
+* client library should use ``rte_rcu_qsbr_dq_enqueue`` to enqueue the deleted resources on the FIFO and start the grace period.
+* if the library runs out of resources while adding entries, it should call ``rte_rcu_qsbr_dq_reclaim`` to reclaim the resources and try the resource allocation again.
+
+The 'Shutdown' process needs to be shared between the application and the
+client library.
+
+* the application should make sure that the reader threads are not using the shared data structure and unregister the reader threads from the QSBR variable before calling the client library's shutdown function.
+
+* client library should call ``rte_rcu_qsbr_dq_delete`` to reclaim any remaining resources and free the FIFO.
+
+Integrating the resource reclamation with client libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. The application does not need a dedicated thread to reclaim resources.
+   Memory reclamation happens as part of the writer thread with little
+   impact on performance.
+#. The client library has better control over the resources. For example,
+   the client library can attempt to reclaim resources when it has run out
+   of them.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (6 preceding siblings ...)
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
@ 2019-10-01 18:28   ` Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
                       ` (2 more replies)
  2020-06-08  5:16   ` [dpdk-dev] [PATCH v4 0/3] RCU integration with LPM library Ruifeng Wang
                     ` (6 subsequent siblings)
  14 siblings, 3 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Honnappa Nagarahalli

This patch set is dependent on https://patches.dpdk.org/cover/60270/

This patchset integrates RCU QSBR support with LPM library.

Please refer to RCU documentation in the above mentioned patch series.
This patch set follows the suggested design of integrating RCU
library with other libraries in DPDK.

RCU is used to safely free tbl8 groups that can be recycled.
tbl8 groups will not be reclaimed or reused until readers have
stopped referencing them.

This is implemented as an optional feature to ensure the existing
applications are not affected. A new API, rte_lpm_rcu_qsbr_add, is
introduced for the application to register a RCU variable that the
LPM library will use. This gives the user the means to enable
this feature.
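
As a rough usage sketch (error handling simplified; 'lpm' is assumed
to be an already created LPM table):

    struct rte_rcu_qsbr *v;
    size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);

    v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
    rte_rcu_qsbr_init(v, RTE_MAX_LCORE);

    /* Enable RCU based tbl8 reclamation for this LPM table */
    if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
        printf("RCU integration could not be enabled\n");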

v3:
1) Integration with new RCU defer queue APIs (much smaller and simpler
   code in LPM library itself)
2) Separated the 'test/lpm: reset total time' patch from this series
3) Added multi-writer performance test. The performance difference
   between with and without RCU varies and is not small for
   multi-writer. However, this is due to the tbl8 group allocation
   algorithm in LPM, which is a linear search algorithm (given that
   the test case uses a large number of tbl8 groups). We should look
   to change this algorithm to O(1) in the future.
4) Incorporated applicable feedback from Vladimir

Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  app/test: add test case for LPM RCU integration

 app/test/test_lpm.c                | 152 ++++++++-
 app/test/test_lpm_perf.c           | 487 ++++++++++++++++++++++++++++-
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 102 +++++-
 lib/librte_lpm/rte_lpm.h           |  21 ++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 7 files changed, 757 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
@ 2019-10-01 18:28     ` Honnappa Nagarahalli
  2019-10-04 16:05       ` Medvedkin, Vladimir
  2019-10-07  9:21       ` Ananyev, Konstantin
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 2/3] app/test: add test case for LPM RCU integration Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests Honnappa Nagarahalli
  2 siblings, 2 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Ruifeng Wang

From: Ruifeng Wang <ruifeng.wang@arm.com>

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  21 ++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 5 files changed, 122 insertions(+), 12 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..ca58d4b35 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <string.h>
@@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	if (lpm->dq)
+		rte_rcu_qsbr_dq_delete(lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
@@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
 		rte_lpm_free_v1604);
 
+struct __rte_lpm_rcu_dq_entry {
+	uint32_t tbl8_group_index;
+	uint32_t pad;
+};
+
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	struct __rte_lpm_rcu_dq_entry *e =
+			(struct __rte_lpm_rcu_dq_entry *)data;
+	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
+
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->dq) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Init QSBR defer queue. */
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
+	params.name = rcu_dq_name;
+	params.size = lpm->number_tbl8s;
+	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
+	params.f = __lpm_rcu_qsbr_free_resource;
+	params.p = lpm->tbl8;
+	params.v = v;
+	lpm->dq = rte_rcu_qsbr_dq_create(&params);
+	if (lpm->dq == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
+		return 1;
+	}
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+__tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc_v1604(struct rte_lpm *lpm)
+{
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = __tbl8_alloc_v1604(lpm);
+	if ((group_idx < 0) && (lpm->dq != NULL)) {
+		/* If there are no tbl8 groups try to reclaim some. */
+		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
+			group_idx = __tbl8_alloc_v1604(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
 tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 {
@@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 }
 
 static void
-tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	struct __rte_lpm_rcu_dq_entry e;
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (lpm->dq != NULL) {
+		e.tbl8_group_index = tbl8_group_start;
+		e.pad = 0;
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
+	} else {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	}
 }
 
 static __rte_noinline int32_t
@@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 906ec4483..49c12a68d 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -21,6 +22,7 @@
 #include <rte_common.h>
 #include <rte_vect.h>
 #include <rte_compat.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -186,6 +188,7 @@ struct rte_lpm {
 			__rte_cache_aligned; /**< LPM tbl24 table. */
 	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
+	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
 };
 
 /**
@@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
 void
 rte_lpm_free_v1604(struct rte_lpm *lpm);
 
+/**
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   The LPM object to add the RCU QSBR variable to
+ * @param v
+ *   RCU QSBR variable
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 90beac853..b353aabd2 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -44,3 +44,9 @@ DPDK_17.05 {
 	rte_lpm6_lookup_bulk_func;
 
 } DPDK_16.04;
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] app/test: add test case for LPM RCU integration
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
@ 2019-10-01 18:28     ` Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests Honnappa Nagarahalli
  2 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Ruifeng Wang

From: Ruifeng Wang <ruifeng.wang@arm.com>

Add positive and negative tests for API rte_lpm_rcu_qsbr_add.
Also test LPM library behavior when RCU QSBR is enabled.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_lpm.c | 152 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 151 insertions(+), 1 deletion(-)

diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
index e969fe051..6882cae6a 100644
--- a/app/test/test_lpm.c
+++ b/app/test/test_lpm.c
@@ -8,6 +8,7 @@
 
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_malloc.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
@@ -40,6 +41,8 @@ static int32_t test15(void);
 static int32_t test16(void);
 static int32_t test17(void);
 static int32_t test18(void);
+static int32_t test19(void);
+static int32_t test20(void);
 
 rte_lpm_test tests[] = {
 /* Test Cases */
@@ -61,7 +64,9 @@ rte_lpm_test tests[] = {
 	test15,
 	test16,
 	test17,
-	test18
+	test18,
+	test19,
+	test20
 };
 
 #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
@@ -1266,6 +1271,151 @@ test18(void)
 	return PASS;
 }
 
+/*
+ * rte_lpm_rcu_qsbr_add positive and negative tests.
+ *  - Add RCU QSBR variable to LPM
+ *  - Add another RCU QSBR variable to LPM
+ *  - Check LPM attached RCU QSBR variable and FIFO queue
+ */
+int32_t
+test19(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	struct rte_rcu_qsbr *qsv2;
+	int32_t status;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Create and attach another RCU QSBR to LPM table */
+	qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv2 != NULL);
+
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv2);
+	TEST_LPM_ASSERT(status != 0);
+
+	TEST_LPM_ASSERT(lpm->dq != NULL);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+	rte_free(qsv2);
+
+	return PASS;
+}
+
+/*
+ * rte_lpm_rcu_qsbr_add functional test.
+ *  - Create LPM which supports 1 tbl8 group at max
+ *  - Add RCU QSBR variable to LPM
+ *  - Add a rule with depth=28 (> 24)
+ *  - Register a reader thread (not a real thread)
+ *  - Reader lookup existing rule
+ *  - Writer delete the rule
+ *  - Reader lookup the rule
+ *  - Writer re-add the rule (no available tbl8 group)
+ *  - Reader report quiescent state and unregister
+ *  - Writer re-add the rule
+ *  - Reader lookup the rule
+ */
+int32_t
+test20(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	int32_t status;
+	uint32_t ip, next_hop, next_hop_return;
+	uint8_t depth;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = 1;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(1);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, 1);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	ip = RTE_IPV4(192, 18, 100, 100);
+	depth = 28;
+	next_hop = 1;
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group);
+
+	/* Register pseudo reader */
+	status = rte_rcu_qsbr_thread_register(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+	rte_rcu_qsbr_thread_online(qsv, 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	/* Writer update */
+	status = rte_lpm_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status != 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status != 0);
+
+	/* Reader quiescent */
+	rte_rcu_qsbr_quiescent(qsv, 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+
+	rte_rcu_qsbr_thread_offline(qsv, 0);
+	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+
+	return PASS;
+}
+
 /*
  * Do all unit tests.
  */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 2/3] app/test: add test case for LPM RCU integration Honnappa Nagarahalli
@ 2019-10-01 18:28     ` Honnappa Nagarahalli
  2019-10-02 13:02       ` Aaron Conole
  2 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Honnappa Nagarahalli

Add performance tests for RCU integration. The performance
difference with and without RCU integration is very small
(~1% to ~2%) on both Arm and x86 platforms.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_lpm_perf.c | 487 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 484 insertions(+), 3 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index 77eea66ad..a9f02d983 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <stdio.h>
@@ -10,12 +11,28 @@
 #include <rte_cycles.h>
 #include <rte_random.h>
 #include <rte_branch_prediction.h>
+#include <rte_malloc.h>
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_rcu_qsbr.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
 
+struct rte_lpm *lpm;
+static struct rte_rcu_qsbr *rv;
+static volatile uint8_t writer_done;
+static volatile uint32_t thr_id;
+static rte_atomic64_t gwrite_cycles;
+static rte_atomic64_t gwrites;
+/* LPM APIs are not thread safe, use mutex to provide thread safety */
+static pthread_mutex_t lpm_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/* Report quiescent state interval every 1024 lookups. Larger critical
+ * sections in reader will result in writer polling multiple times.
+ */
+#define QSBR_REPORTING_INTERVAL 1024
+
 #define TEST_LPM_ASSERT(cond) do {                                            \
 	if (!(cond)) {                                                        \
 		printf("Error at line %d: \n", __LINE__);                     \
@@ -24,6 +41,7 @@
 } while(0)
 
 #define ITERATIONS (1 << 10)
+#define RCU_ITERATIONS 10
 #define BATCH_SIZE (1 << 12)
 #define BULK_SIZE 32
 
@@ -35,9 +53,13 @@ struct route_rule {
 };
 
 struct route_rule large_route_table[MAX_RULE_NUM];
+/* Route table for routes with depth > 24 */
+struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
 
 static uint32_t num_route_entries;
+static uint32_t num_ldepth_route_entries;
 #define NUM_ROUTE_ENTRIES num_route_entries
+#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
 
 enum {
 	IP_CLASS_A,
@@ -191,7 +213,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	uint32_t ip_head_mask;
 	uint32_t rule_num;
 	uint32_t k;
-	struct route_rule *ptr_rule;
+	struct route_rule *ptr_rule, *ptr_ldepth_rule;
 
 	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
 		fixed_bit_num = IP_HEAD_BIT_NUM_A;
@@ -236,10 +258,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	 */
 	start = lrand48() & mask;
 	ptr_rule = &large_route_table[num_route_entries];
+	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
 	for (k = 0; k < rule_num; k++) {
 		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
 			| ip_head_mask;
 		ptr_rule->depth = depth;
+		/* If the depth of the route is more than 24, store it
+		 * in another table as well.
+		 */
+		if (depth > 24) {
+			ptr_ldepth_rule->ip = ptr_rule->ip;
+			ptr_ldepth_rule->depth = ptr_rule->depth;
+			ptr_ldepth_rule++;
+			num_ldepth_route_entries++;
+		}
 		ptr_rule++;
 		start = (start + step) & mask;
 	}
@@ -273,6 +305,7 @@ static void generate_large_route_rule_table(void)
 	uint8_t  depth;
 
 	num_route_entries = 0;
+	num_ldepth_route_entries = 0;
 	memset(large_route_table, 0, sizeof(large_route_table));
 
 	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
@@ -316,10 +349,454 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
 	printf("\n");
 }
 
+/* Enabled worker core tracking for the perf tests */
+static uint16_t enabled_core_ids[RTE_MAX_LCORE];
+static unsigned int num_cores;
+
+/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
+static inline uint32_t
+alloc_thread_id(void)
+{
+	uint32_t tmp_thr_id;
+
+	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
+	if (tmp_thr_id >= RTE_MAX_LCORE)
+		printf("Invalid thread id %u\n", tmp_thr_id);
+
+	return tmp_thr_id;
+}
+
+/*
+ * Reader thread using rte_lpm data structure without RCU.
+ */
+static int
+test_lpm_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+	} while (!writer_done);
+
+	return 0;
+}
+
+/*
+ * Reader thread using rte_lpm data structure with RCU.
+ */
+static int
+test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t thread_id = alloc_thread_id();
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	/* Register this thread to report quiescent state */
+	rte_rcu_qsbr_thread_register(rv, thread_id);
+	rte_rcu_qsbr_thread_online(rv, thread_id);
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+		/* Update quiescent state */
+		rte_rcu_qsbr_quiescent(rv, thread_id);
+	} while (!writer_done);
+
+	rte_rcu_qsbr_thread_offline(rv, thread_id);
+	rte_rcu_qsbr_thread_unregister(rv, thread_id);
+
+	return 0;
+}
+
+/*
+ * Writer thread using rte_lpm data structure with RCU.
+ */
+static int
+test_lpm_rcu_qsbr_writer(__attribute__((unused)) void *arg)
+{
+	unsigned int i, j, si, ei;
+	uint64_t begin, total_cycles;
+	uint8_t core_id = (uint8_t)((uintptr_t)arg);
+	uint32_t next_hop_add = 0xAA;
+
+	/* 2 writer threads are used */
+	if (core_id % 2 == 0) {
+		si = 0;
+		ei = NUM_LDEPTH_ROUTE_ENTRIES / 2;
+	} else {
+		si = NUM_LDEPTH_ROUTE_ENTRIES / 2;
+		ei = NUM_LDEPTH_ROUTE_ENTRIES;
+	}
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = si; j < ei; j++) {
+			pthread_mutex_lock(&lpm_mutex);
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+			}
+			pthread_mutex_unlock(&lpm_mutex);
+		}
+
+		/* Delete all the entries */
+		for (j = si; j < ei; j++) {
+			pthread_mutex_lock(&lpm_mutex);
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+			}
+			pthread_mutex_unlock(&lpm_mutex);
+		}
+	}
+
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	rte_atomic64_add(&gwrite_cycles, total_cycles);
+	/* Each iteration does (ei - si) adds and (ei - si) deletes */
+	rte_atomic64_add(&gwrites,
+			2 * (ei - si) * RCU_ITERATIONS);
+
+	return 0;
+}
+
+/*
+ * Performance test:
+ * 2 writers, rest are readers
+ */
+static int
+test_lpm_rcu_perf_multi_writer(void)
+{
+	struct rte_lpm_config config;
+	size_t sz;
+	unsigned int i;
+	uint16_t core_id;
+
+	if (rte_lcore_count() < 3) {
+		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 3\n");
+		return TEST_SKIPPED;
+	}
+
+	num_cores = 0;
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		enabled_core_ids[num_cores] = core_id;
+		num_cores++;
+	}
+
+	printf("\nPerf test: 2 writers, %d readers, RCU integration enabled\n",
+		num_cores - 2);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Init RCU variable */
+	sz = rte_rcu_qsbr_get_memsize(num_cores);
+	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
+						RTE_CACHE_LINE_SIZE);
+	rte_rcu_qsbr_init(rv, num_cores);
+
+	/* Assign the RCU variable to LPM */
+	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
+		printf("RCU variable assignment failed\n");
+		goto error;
+	}
+
+	writer_done = 0;
+	rte_atomic64_init(&gwrite_cycles);
+	rte_atomic64_init(&gwrites);
+	rte_atomic64_clear(&gwrite_cycles);
+	rte_atomic64_clear(&gwrites);
+
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 2; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Launch writer threads */
+	for (i = 0; i < 2; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
+					(void *)(uintptr_t)i,
+					enabled_core_ids[i]);
+
+	/* Wait for writer threads */
+	for (i = 0; i < 2; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %lu cycles\n",
+		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
+		);
+
+	/* Wait and check return value from reader threads */
+	writer_done = 1;
+	for (i = 2; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+	lpm = NULL;
+	rv = NULL;
+
+	/* Test without RCU integration */
+	printf("\nPerf test: 2 writers, %d readers, RCU integration disabled\n",
+		num_cores - 2);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	writer_done = 0;
+	rte_atomic64_init(&gwrite_cycles);
+	rte_atomic64_init(&gwrites);
+	rte_atomic64_clear(&gwrite_cycles);
+	rte_atomic64_clear(&gwrites);
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 2; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Launch writer threads */
+	for (i = 0; i < 2; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
+					(void *)(uintptr_t)i,
+					enabled_core_ids[i]);
+
+	/* Wait for writer threads */
+	for (i = 0; i < 2; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %lu cycles\n",
+		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
+		);
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 2; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+
+	return 0;
+
+error:
+	writer_done = 1;
+	/* Wait until all readers have exited */
+	rte_eal_mp_wait_lcore();
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+
+	return -1;
+}
+
+/*
+ * Perf test:
+ * Single writer, rest are readers
+ */
+static int
+test_lpm_rcu_perf(void)
+{
+	struct rte_lpm_config config;
+	uint64_t begin, total_cycles;
+	size_t sz;
+	unsigned int i, j;
+	uint16_t core_id;
+	uint32_t next_hop_add = 0xAA;
+
+	if (rte_lcore_count() < 2) {
+		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n");
+		return TEST_SKIPPED;
+	}
+
+	num_cores = 0;
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		enabled_core_ids[num_cores] = core_id;
+		num_cores++;
+	}
+
+	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Init RCU variable */
+	sz = rte_rcu_qsbr_get_memsize(num_cores);
+	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
+						RTE_CACHE_LINE_SIZE);
+	rte_rcu_qsbr_init(rv, num_cores);
+
+	/* Assign the RCU variable to LPM */
+	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
+		printf("RCU variable assignment failed\n");
+		goto error;
+	}
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+	lpm = NULL;
+	rv = NULL;
+
+	/* Test without RCU integration */
+	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			printf("Warning: lcore %u not finished.\n",
+				enabled_core_ids[i]);
+
+	rte_lpm_free(lpm);
+
+	return 0;
+
+error:
+	writer_done = 1;
+	/* Wait until all readers have exited */
+	rte_eal_mp_wait_lcore();
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+
+	return -1;
+}
+
 static int
 test_lpm_perf(void)
 {
-	struct rte_lpm *lpm = NULL;
 	struct rte_lpm_config config;
 
 	config.max_rules = 2000000;
@@ -343,7 +820,7 @@ test_lpm_perf(void)
 	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
 	TEST_LPM_ASSERT(lpm != NULL);
 
-	/* Measue add. */
+	/* Measure add. */
 	begin = rte_rdtsc();
 
 	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
@@ -478,6 +955,10 @@ test_lpm_perf(void)
 	rte_lpm_delete_all(lpm);
 	rte_lpm_free(lpm);
 
+	test_lpm_rcu_perf();
+
+	test_lpm_rcu_perf_multi_writer();
+
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests Honnappa Nagarahalli
@ 2019-10-02 13:02       ` Aaron Conole
  2019-10-03  9:09         ` Bruce Richardson
  0 siblings, 1 reply; 137+ messages in thread
From: Aaron Conole @ 2019-10-02 13:02 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	konstantin.ananyev, stephen, paulmck, Gavin.Hu, Dharmik.Thakkar,
	Ruifeng.Wang, nd

Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> writes:

> Add performance tests for RCU integration. The performance
> difference with and without RCU integration is very small
> (~1% to ~2%) on both Arm and x86 platforms.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---

I see the following:

  lib/meson.build:89:5: ERROR: Problem encountered: Missing dependency rcu
  for library rte_lpm

Maybe there's something wrong with the environment?  This isn't the
first time I've seen a dependency detection problem with meson.
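
For reference, the declaration meson is checking for would be something
along these lines (an illustrative sketch only, not the actual file
contents):

  # lib/librte_lpm/meson.build
  deps += ['rcu']

i.e. rte_lpm has to list 'rcu' in its deps, and 'rcu' has to appear
before 'lpm' in the lib/meson.build build order.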

>  app/test/test_lpm_perf.c | 487 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 484 insertions(+), 3 deletions(-)
>
> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
> index 77eea66ad..a9f02d983 100644
> --- a/app/test/test_lpm_perf.c
> +++ b/app/test/test_lpm_perf.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
>  
>  #include <stdio.h>
> @@ -10,12 +11,28 @@
>  #include <rte_cycles.h>
>  #include <rte_random.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_malloc.h>
>  #include <rte_ip.h>
>  #include <rte_lpm.h>
> +#include <rte_rcu_qsbr.h>
>  
>  #include "test.h"
>  #include "test_xmmt_ops.h"
>  
> +struct rte_lpm *lpm;
> +static struct rte_rcu_qsbr *rv;
> +static volatile uint8_t writer_done;
> +static volatile uint32_t thr_id;
> +static rte_atomic64_t gwrite_cycles;
> +static rte_atomic64_t gwrites;
> +/* LPM APIs are not thread safe, use mutex to provide thread safety */
> +static pthread_mutex_t lpm_mutex = PTHREAD_MUTEX_INITIALIZER;
> +
> +/* Report quiescent state every 1024 lookups. Larger critical
> + * sections in reader will result in writer polling multiple times.
> + */
> +#define QSBR_REPORTING_INTERVAL 1024
> +
>  #define TEST_LPM_ASSERT(cond) do {                                            \
>  	if (!(cond)) {                                                        \
>  		printf("Error at line %d: \n", __LINE__);                     \
> @@ -24,6 +41,7 @@
>  } while(0)
>  
>  #define ITERATIONS (1 << 10)
> +#define RCU_ITERATIONS 10
>  #define BATCH_SIZE (1 << 12)
>  #define BULK_SIZE 32
>  
> @@ -35,9 +53,13 @@ struct route_rule {
>  };
>  
>  struct route_rule large_route_table[MAX_RULE_NUM];
> +/* Route table for routes with depth > 24 */
> +struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
>  
>  static uint32_t num_route_entries;
> +static uint32_t num_ldepth_route_entries;
>  #define NUM_ROUTE_ENTRIES num_route_entries
> +#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
>  
>  enum {
>  	IP_CLASS_A,
> @@ -191,7 +213,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
>  	uint32_t ip_head_mask;
>  	uint32_t rule_num;
>  	uint32_t k;
> -	struct route_rule *ptr_rule;
> +	struct route_rule *ptr_rule, *ptr_ldepth_rule;
>  
>  	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
>  		fixed_bit_num = IP_HEAD_BIT_NUM_A;
> @@ -236,10 +258,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
>  	 */
>  	start = lrand48() & mask;
>  	ptr_rule = &large_route_table[num_route_entries];
> +	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
>  	for (k = 0; k < rule_num; k++) {
>  		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
>  			| ip_head_mask;
>  		ptr_rule->depth = depth;
> +		/* If the depth of the route is more than 24, store it
> +		 * in another table as well.
> +		 */
> +		if (depth > 24) {
> +			ptr_ldepth_rule->ip = ptr_rule->ip;
> +			ptr_ldepth_rule->depth = ptr_rule->depth;
> +			ptr_ldepth_rule++;
> +			num_ldepth_route_entries++;
> +		}
>  		ptr_rule++;
>  		start = (start + step) & mask;
>  	}
> @@ -273,6 +305,7 @@ static void generate_large_route_rule_table(void)
>  	uint8_t  depth;
>  
>  	num_route_entries = 0;
> +	num_ldepth_route_entries = 0;
>  	memset(large_route_table, 0, sizeof(large_route_table));
>  
>  	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
> @@ -316,10 +349,454 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
>  	printf("\n");
>  }
>  
> +/* Check condition and return an error if true. */
> +static uint16_t enabled_core_ids[RTE_MAX_LCORE];
> +static unsigned int num_cores;
> +
> +/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
> +static inline uint32_t
> +alloc_thread_id(void)
> +{
> +	uint32_t tmp_thr_id;
> +
> +	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
> +	if (tmp_thr_id >= RTE_MAX_LCORE)
> +		printf("Invalid thread id %u\n", tmp_thr_id);
> +
> +	return tmp_thr_id;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure without RCU.
> + */
> +static int
> +test_lpm_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +	} while (!writer_done);
> +
> +	return 0;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure with RCU.
> + */
> +static int
> +test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t thread_id = alloc_thread_id();
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	/* Register this thread to report quiescent state */
> +	rte_rcu_qsbr_thread_register(rv, thread_id);
> +	rte_rcu_qsbr_thread_online(rv, thread_id);
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +		/* Update quiescent state */
> +		rte_rcu_qsbr_quiescent(rv, thread_id);
> +	} while (!writer_done);
> +
> +	rte_rcu_qsbr_thread_offline(rv, thread_id);
> +	rte_rcu_qsbr_thread_unregister(rv, thread_id);
> +
> +	return 0;
> +}
> +
> +/*
> + * Writer thread using rte_lpm data structure with RCU.
> + */
> +static int
> +test_lpm_rcu_qsbr_writer(__attribute__((unused)) void *arg)
> +{
> +	unsigned int i, j, si, ei;
> +	uint64_t begin, total_cycles;
> +	uint8_t core_id = (uint8_t)((uintptr_t)arg);
> +	uint32_t next_hop_add = 0xAA;
> +
> +	/* 2 writer threads are used */
> +	if (core_id % 2 == 0) {
> +		si = 0;
> +		ei = NUM_LDEPTH_ROUTE_ENTRIES / 2;
> +	} else {
> +		si = NUM_LDEPTH_ROUTE_ENTRIES / 2;
> +		ei = NUM_LDEPTH_ROUTE_ENTRIES;
> +	}
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = si; j < ei; j++) {
> +			pthread_mutex_lock(&lpm_mutex);
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d, route# %d\n",
> +					i, j);
> +			}
> +			pthread_mutex_unlock(&lpm_mutex);
> +		}
> +
> +		/* Delete all the entries */
> +		for (j = si; j < ei; j++) {
> +			pthread_mutex_lock(&lpm_mutex);
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d, route# %d\n",
> +					i, j);
> +			}
> +			pthread_mutex_unlock(&lpm_mutex);
> +		}
> +	}
> +
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	rte_atomic64_add(&gwrite_cycles, total_cycles);
> +	rte_atomic64_add(&gwrites,
> +			2 * NUM_LDEPTH_ROUTE_ENTRIES * RCU_ITERATIONS);
> +
> +	return 0;
> +}
> +
> +/*
> + * Perf test:
> + * 2 writers, rest are readers
> + */
> +static int
> +test_lpm_rcu_perf_multi_writer(void)
> +{
> +	struct rte_lpm_config config;
> +	size_t sz;
> +	unsigned int i;
> +	uint16_t core_id;
> +
> +	if (rte_lcore_count() < 3) {
> +		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 3\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_cores = 0;
> +	RTE_LCORE_FOREACH_SLAVE(core_id) {
> +		enabled_core_ids[num_cores] = core_id;
> +		num_cores++;
> +	}
> +
> +	printf("\nPerf test: 2 writers, %d readers, RCU integration enabled\n",
> +		num_cores - 2);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Init RCU variable */
> +	sz = rte_rcu_qsbr_get_memsize(num_cores);
> +	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
> +						RTE_CACHE_LINE_SIZE);
> +	rte_rcu_qsbr_init(rv, num_cores);
> +
> +	/* Assign the RCU variable to LPM */
> +	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
> +		printf("RCU variable assignment failed\n");
> +		goto error;
> +	}
> +
> +	writer_done = 0;
> +	rte_atomic64_init(&gwrite_cycles);
> +	rte_atomic64_init(&gwrites);
> +	rte_atomic64_clear(&gwrite_cycles);
> +	rte_atomic64_clear(&gwrites);
> +
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 2; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Launch writer threads */
> +	for (i = 0; i < 2; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
> +					(void *)(uintptr_t)i,
> +					enabled_core_ids[i]);
> +
> +	/* Wait for writer threads */
> +	for (i = 0; i < 2; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	printf("Total LPM Adds: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %lu cycles\n",
> +		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
> +		);
> +
> +	/* Wait and check return value from reader threads */
> +	writer_done = 1;
> +	for (i = 2; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +	lpm = NULL;
> +	rv = NULL;
> +
> +	/* Test without RCU integration */
> +	printf("\nPerf test: 2 writers, %d readers, RCU integration disabled\n",
> +		num_cores - 2);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	writer_done = 0;
> +	rte_atomic64_init(&gwrite_cycles);
> +	rte_atomic64_init(&gwrites);
> +	rte_atomic64_clear(&gwrite_cycles);
> +	rte_atomic64_clear(&gwrites);
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 2; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Launch writer threads */
> +	for (i = 0; i < 2; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
> +					(void *)(uintptr_t)i,
> +					enabled_core_ids[i]);
> +
> +	/* Wait for writer threads */
> +	for (i = 0; i < 2; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	printf("Total LPM Adds: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %lu cycles\n",
> +		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
> +		);
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 2; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +
> +	return 0;
> +
> +error:
> +	writer_done = 1;
> +	/* Wait until all readers have exited */
> +	rte_eal_mp_wait_lcore();
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +
> +	return -1;
> +}
> +
> +/*
> + * Perf test:
> + * Single writer, rest are readers
> + */
> +static int
> +test_lpm_rcu_perf(void)
> +{
> +	struct rte_lpm_config config;
> +	uint64_t begin, total_cycles;
> +	size_t sz;
> +	unsigned int i, j;
> +	uint16_t core_id;
> +	uint32_t next_hop_add = 0xAA;
> +
> +	if (rte_lcore_count() < 2) {
> +		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_cores = 0;
> +	RTE_LCORE_FOREACH_SLAVE(core_id) {
> +		enabled_core_ids[num_cores] = core_id;
> +		num_cores++;
> +	}
> +
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Init RCU variable */
> +	sz = rte_rcu_qsbr_get_memsize(num_cores);
> +	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
> +						RTE_CACHE_LINE_SIZE);
> +	rte_rcu_qsbr_init(rv, num_cores);
> +
> +	/* Assign the RCU variable to LPM */
> +	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
> +		printf("RCU variable assignment failed\n");
> +		goto error;
> +	}
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +	lpm = NULL;
> +	rv = NULL;
> +
> +	/* Test without RCU integration */
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			printf("Warning: lcore %u not finished.\n",
> +				enabled_core_ids[i]);
> +
> +	rte_lpm_free(lpm);
> +
> +	return 0;
> +
> +error:
> +	writer_done = 1;
> +	/* Wait until all readers have exited */
> +	rte_eal_mp_wait_lcore();
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +
> +	return -1;
> +}
> +
>  static int
>  test_lpm_perf(void)
>  {
> -	struct rte_lpm *lpm = NULL;
>  	struct rte_lpm_config config;
>  
>  	config.max_rules = 2000000;
> @@ -343,7 +820,7 @@ test_lpm_perf(void)
>  	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
>  	TEST_LPM_ASSERT(lpm != NULL);
>  
> -	/* Measue add. */
> +	/* Measure add. */
>  	begin = rte_rdtsc();
>  
>  	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
> @@ -478,6 +955,10 @@ test_lpm_perf(void)
>  	rte_lpm_delete_all(lpm);
>  	rte_lpm_free(lpm);
>  
> +	test_lpm_rcu_perf();
> +
> +	test_lpm_rcu_perf_multi_writer();
> +
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
@ 2019-10-02 17:39       ` Ananyev, Konstantin
  2019-10-03  6:29         ` Honnappa Nagarahalli
  2019-10-02 18:50       ` Ananyev, Konstantin
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02 17:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir, ruifeng.wang,
	dharmik.thakkar, dev, nd

Hi Honnappa,

 
> Add resource reclamation APIs to make it simple for applications
> and libraries to integrate rte_rcu library.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>  lib/librte_rcu/meson.build         |   2 +
>  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>  lib/librte_rcu/rte_rcu_version.map |   4 +
>  lib/meson.build                    |   6 +-
>  7 files changed, 700 insertions(+), 3 deletions(-)
>  create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> 
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> index ce7f93dd3..76814f50b 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> @@ -21,6 +21,7 @@
>  #include <rte_errno.h>
> 
>  #include "rte_rcu_qsbr.h"
> +#include "rte_rcu_qsbr_pvt.h"
> 
>  /* Get the memory size of QSBR variable */
>  size_t
> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
>  	return 0;
>  }
> 
> +/* Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + */
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> +{
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint32_t qs_fifo_size;
> +
> +	if (params == NULL || params->f == NULL ||
> +		params->v == NULL || params->name == NULL ||
> +		params->size == 0 || params->esize == 0 ||
> +		(params->esize % 8 != 0)) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return NULL;
> +	}
> +
> +	dq = rte_zmalloc(NULL,
> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> +		RTE_CACHE_LINE_SIZE);
> +	if (dq == NULL) {
> +		rte_errno = ENOMEM;
> +
> +		return NULL;
> +	}
> +
> +	/* round up qs_fifo_size to the next power of two that is not less
> +	 * than the required size.
> +	 */
> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> +					* params->size) + 1);
> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);

If it is not going to be MT safe, then why not create the ring with
(RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
Though I think it could be changed to allow MT-safe multiple
enqueue/single dequeue, see below.
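
I.e. (sketch):

	dq->r = rte_ring_create(params->name, qs_fifo_size,
				SOCKET_ID_ANY,
				RING_F_SP_ENQ | RING_F_SC_DEQ);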

> +	if (dq->r == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): defer queue create failed\n", __func__);
> +		rte_free(dq);
> +		return NULL;
> +	}
> +
> +	dq->v = params->v;
> +	dq->size = params->size;
> +	dq->esize = params->esize;
> +	dq->f = params->f;
> +	dq->p = params->p;
> +
> +	return dq;
> +}
> +
> +/* Enqueue one resource to the defer queue to free after the grace
> + * period is over.
> + */
> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> +{
> +	uint64_t token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +	uint32_t cur_size, free_size;
> +
> +	if (dq == NULL || e == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;

Why not just return -EINVAL straight away?
I think there is not much point in setting rte_errno in that function at all;
the return value should do.

> +	}
> +
> +	/* Start the grace period */
> +	token = rte_rcu_qsbr_start(dq->v);
> +
> +	/* Reclaim resources if the queue is 1/8th full. This keeps
> +	 * the queue from growing too large and allows time for reader
> +	 * threads to report their quiescent state.
> +	 */
> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);

It would probably be a bit easier if you just store (elt size + token size) / 8 in dq->esize.
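
I.e. once at create time (sketch):

	/* one 8B token + the element itself, in 8-byte ring slots */
	dq->esize = params->esize / 8 + 1;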

> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {

Why make this threshold value hard-coded?
Why not either put it into a create parameter, or just return a special value
to indicate that the threshold is reached?
Or even return the number of filled/free entries on success, so the caller can
decide whether to reclaim based on that information on his own?
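
E.g. (sketch; the field name here is made up):

struct rte_rcu_qsbr_dq_parameters {
	/* ... existing fields unchanged ... */
	uint32_t trigger_reclaim_limit;
	/**< Hypothetical: queue fill level (in elements) at which
	 *   enqueue starts triggering reclamation.
	 */
};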

> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +			"%s(): Triggering reclamation\n", __func__);
> +		rte_rcu_qsbr_dq_reclaim(dq);
> +	}
> +
> +	/* Check if there is space for at least 1 resource */
> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> +	if (!free_size) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Defer queue is full\n", __func__);
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	/* Enqueue the resource */
> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> +
> +	/* The resource to enqueue needs to be a multiple of 64b
> +	 * due to the limitation of the rte_ring implementation.
> +	 */
> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);


That whole construction above looks a bit clumsy and error-prone...
I suppose just:

const uint32_t nb_elt = dq->esize/8 + 1;
uint32_t free, n;
...
n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
if (n == 0)
  return -ENOSPC;
return free;

That way I think you can have MT-safe version of that function.

> +
> +	return 0;
> +}
> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;

Same story as above - I think rte_errno is excessive in this function.
The return value alone should be enough.


> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;

Not sure you need that, see below.

> +
> +	/* Reclaim at most 1/16th of the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;

Again, why not make max_cnt configurable as a create() parameter?
Or even a parameter of this function?

> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> +			== 1)) {


> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);

Again, no need for such constructs with multiple dequeues, I believe.
Just:

const uint32_t nb_elt = dq->esize/8 + 1;
uint32_t n;
uintptr_t elt[nb_elt];
...
n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
if (n != 0) {dq->f(dq->p, elt);}

Seems enough.
Again in that case you can have enqueue/reclaim running in
different threads simultaneously, plus you don't need dq->e at all. 

> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;

I'd suggest returning cnt on success.

> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;

How do you know that you have reclaimed everything?
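
E.g. something like this at the end of dq_delete() would make it
explicit (sketch):

	/* Drain, then refuse to delete if anything is still pending. */
	(void)rte_rcu_qsbr_dq_reclaim(dq);
	if (rte_ring_count(dq->r) != 0) {
		rte_errno = EAGAIN;
		return 1;
	}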

> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>  int rte_rcu_log_type;
> 
>  RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> +#include <rte_ring.h>
> 
>  extern int rte_rcu_log_type;
> 
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>  	 */
>  } __rte_cache_aligned;
> 
> +/**
> + * Call back function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);

Style thing - usually in DPDK we have typedef newtype_t ...
Though I am not sure you need a new typedef at all - just
a function pointer inside the struct seems enough.

> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> +
> +/**
> + *  Trigger automatic reclamation once the defer queue is 1/8th full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at the max 1/16th the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4


As I said above, I don't think these thresholds need to be hardcoded.
In any case, there seems to be little point in putting them in the public header file.

> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with unbounded number of entries is not
> +	 *   supported currently.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;

Style nit again - I like short names myself, but that seems a bit extreme... :)
Might be at least:
void (*reclaim)(void *, void *);
void * reclaim_data;
?

> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;

Does it need to be inside that struct?
Might be better:
rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct rte_rcu_qsbr_dq_parameters *params);

Another alternative: make both reclaim() and enqueue() take v as a parameter.
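
I.e. something like (a sketch of the proposed signatures):

int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr *v,
		struct rte_rcu_qsbr_dq *dq, void *e);
int rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr *v,
		struct rte_rcu_qsbr_dq *dq);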

> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>  int
>  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to keep
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than) the
> + *		number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h

Again, a style suggestion: as it is not a public header, don't use the rte_ prefix for naming.
From my perspective it makes it easier for the reader to realize what is a public header and what is not.

> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */

Do you really need 'e' at all?
Can't it be just a temporary stack variable?

> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>  	rte_rcu_qsbr_synchronize;
>  	rte_rcu_qsbr_thread_register;
>  	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
> 
>  	local: *;
>  };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>  libraries = [
>  	'kvargs', # eal depends on kvargs
>  	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>  	'cmdline',
>  	'metrics', # bitrate/latency stats depends on this
>  	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>  	'gro', 'gso', 'ip_frag', 'jobstats',
>  	'kni', 'latencystats', 'lpm', 'member',
>  	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>  	# ipsec lib depends on net, crypto and security
>  	'ipsec',
>  	# add pkt framework libs which use other libs from above
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
@ 2019-10-02 18:42       ` Ananyev, Konstantin
  2019-10-03 19:49         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02 18:42 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir, ruifeng.wang,
	dharmik.thakkar, dev, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> Sent: Tuesday, October 1, 2019 7:29 AM
> To: honnappa.nagarahalli@arm.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org;
> paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; ruifeng.wang@arm.com;
> dharmik.thakkar@arm.com; dev@dpdk.org; nd@arm.com
> Subject: [PATCH v3 1/3] lib/ring: add peek API
> 
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> The peek API allows fetching the next available object in the ring
> without dequeuing it. This helps in scenarios where dequeuing of
> objects depend on their value.
> 
> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 2a9f768a1..d3d0d5e18 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  				r->cons.single, available);
>  }
> 
> +/**
> + * Peek one object from a ring.
> + *
> + * The peek API allows fetching the next available object in the ring
> + * without dequeuing it. This API is not multi-thread safe with respect
> + * to other consumer threads.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, object available
> + *   - -ENOENT: Not enough entries in the ring.
> + */
> +__rte_experimental
> +static __rte_always_inline int
> +rte_ring_peek(struct rte_ring *r, void **obj_p)

As it is not MT safe, I think we need _sc_ in the name,
to follow the naming convention of the other rte_ring functions
(rte_ring_sc_peek() or so).

As a better alternative, what do you think about introducing
serialized versions of the DPDK rte_ring dequeue functions?
Something like this:

/* same as original ring dequeue, but:
  * 1) move cons.head only if cons.head == cons.tail
  * 2) don't update cons.tail
  */
unsigned int
rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
                unsigned int *available);

/* sets both cons.head and cons.tail to cons.head + num */
void rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);

/* resets cons.head to the cons.tail value */
void rte_ring_serial_dequeue_abort(struct rte_ring *r);
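
For illustration, a rough sketch of what those semantics could look
like (this is not existing rte_ring code; it reuses DEQUEUE_PTRS and
the head/tail layout from the current implementation, and a real
MT-safe version would need a CAS on cons.head plus the usual memory
barriers):

unsigned int
rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
		unsigned int n, unsigned int *available)
{
	uint32_t head = r->cons.head;
	uint32_t avail = (r->prod.tail - head) & r->mask;

	/* Refuse if a previous serialized dequeue is still in
	 * progress (head ran ahead of tail) or not enough entries.
	 */
	if (head != r->cons.tail || avail < n) {
		if (available != NULL)
			*available = avail;
		return 0;
	}

	DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
	r->cons.head = head + n;	/* cons.tail stays put */
	if (available != NULL)
		*available = avail - n;
	return n;
}

void
rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num)
{
	/* commit: let cons.tail catch up with cons.head */
	r->cons.tail = r->cons.tail + num;
}

void
rte_ring_serial_dequeue_abort(struct rte_ring *r)
{
	/* roll back the speculative dequeue */
	r->cons.head = r->cons.tail;
}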

Then your dq_reclaim cycle function will look like that:

const uint32_t nb_elt = dq->esize/8 + 1;
uint32_t avl, n;
uintptr_t elt[nb_elt];
...

do {

  /* read next elem from the queue */
  n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
  if (n == 0)
      break;

  /* wrong period, keep elem in the queue */
  if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
      rte_ring_serial_dequeue_abort(dq->r);
      break;
  }

  /* can reclaim, remove elem from the queue */
  rte_ring_serial_dequeue_finish(dq->r, nb_elt);

  /* call reclaim function */
  dq->f(dq->p, elt);

} while (avl >= nb_elt);

That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
As long as actual reclamation callback itself is MT safe of course.

> +{
> +	uint32_t prod_tail = r->prod.tail;
> +	uint32_t cons_head = r->cons.head;
> +	uint32_t count = (prod_tail - cons_head) & r->mask;
> +	unsigned int n = 1;
> +	if (count) {
> +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> +		return 0;
> +	}
> +	return -ENOENT;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
  2019-10-02 17:39       ` Ananyev, Konstantin
@ 2019-10-02 18:50       ` Ananyev, Konstantin
  2019-10-03  6:42         ` Honnappa Nagarahalli
  2019-10-04 19:01       ` Medvedkin, Vladimir
  2019-10-07 13:11       ` Medvedkin, Vladimir
  3 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02 18:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir, ruifeng.wang,
	dharmik.thakkar, dev, nd


> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;
> +
> +	/* Reclaim at most 1/16th of the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)

One more thing I forgot to ask - how is this construct supposed to work on 32-bit machines?
peek() will return a 32-bit value, while qsbr_check() operates with 64-bit tokens...
As I understand it, in that case you need to peek() 2 elems.
Might work, but I still think it is better to introduce a serialized version of ring_dequeue().
See my other mail about rte_ring_peek().
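
To illustrate (sketch, 32-bit build only; this also assumes the
enqueue side stored the token as two 32-bit halves, which the current
code does not do - the cast simply truncates it):

/* Each ring slot holds a void *, i.e. 4 bytes on a 32-bit build,
 * so one 64-bit token occupies two slots and a single peek() can
 * only ever see half of it.
 */
void *slot[2];
uint64_t token;

if (rte_ring_dequeue_bulk(dq->r, slot, 2, NULL) != 0)
	token = ((uint64_t)(uintptr_t)slot[1] << 32) |
		(uint32_t)(uintptr_t)slot[0];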


> +			== 1)) {
> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);
> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;
> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>  int rte_rcu_log_type;
> 
>  RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> +#include <rte_ring.h>
> 
>  extern int rte_rcu_log_type;
> 
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>  	 */
>  } __rte_cache_aligned;
> 
> +/**
> + * Call back function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> +
> +/**
> + *  Trigger automatic reclamation once the defer queue is 1/8th full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at most 1/16th of the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with unbounded number of entries is not
> +	 *   supported currently.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;
> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>  int
>  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to keep
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than) the
> + *		number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>  	rte_rcu_qsbr_synchronize;
>  	rte_rcu_qsbr_thread_register;
>  	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
> 
>  	local: *;
>  };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>  libraries = [
>  	'kvargs', # eal depends on kvargs
>  	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>  	'cmdline',
>  	'metrics', # bitrate/latency stats depends on this
>  	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>  	'gro', 'gso', 'ip_frag', 'jobstats',
>  	'kni', 'latencystats', 'lpm', 'member',
>  	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>  	# ipsec lib depends on net, crypto and security
>  	'ipsec',
>  	# add pkt framework libs which use other libs from above
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-02 17:39       ` Ananyev, Konstantin
@ 2019-10-03  6:29         ` Honnappa Nagarahalli
  2019-10-03 12:26           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03  6:29 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, Honnappa Nagarahalli, dev, nd, nd

> 
> Hi Honnappa,
Thanks Konstantin for the feedback.

> 
> 
> > Add resource reclamation APIs to make it simple for applications and
> > libraries to integrate rte_rcu library.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> >  lib/librte_rcu/meson.build         |   2 +
> >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> >  lib/librte_rcu/rte_rcu_version.map |   4 +
> >  lib/meson.build                    |   6 +-
> >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > @@ -21,6 +21,7 @@
> >  #include <rte_errno.h>
> >
> >  #include "rte_rcu_qsbr.h"
> > +#include "rte_rcu_qsbr_pvt.h"
> >
> >  /* Get the memory size of QSBR variable */  size_t @@ -267,6 +268,190
> > @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> >  	return 0;
> >  }
> >
> > +/* Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + */
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params) {
> > +	struct rte_rcu_qsbr_dq *dq;
> > +	uint32_t qs_fifo_size;
> > +
> > +	if (params == NULL || params->f == NULL ||
> > +		params->v == NULL || params->name == NULL ||
> > +		params->size == 0 || params->esize == 0 ||
> > +		(params->esize % 8 != 0)) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	dq = rte_zmalloc(NULL,
> > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > +		RTE_CACHE_LINE_SIZE);
> > +	if (dq == NULL) {
> > +		rte_errno = ENOMEM;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	/* round up qs_fifo_size to next power of two that is not less than
> > +	 * max_size.
> > +	 */
> > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > +					* params->size) + 1);
> > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > +					SOCKET_ID_ANY, 0);
> 
> If it is not going to be MT-safe, then why not create the ring with the
> (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
Agree.

> Though I think it could be changed to allow MT-safe multiple enqueue/single
> dequeue, see below.
The MT-safety issue is due to the reclaim code, which has the following sequence:

rte_ring_peek
rte_rcu_qsbr_check
rte_ring_dequeue

This entire sequence needs to be atomic, as the entry cannot be dequeued without knowing that the grace period for that entry is over. Note that, due to optimizations in the rte_rcu_qsbr_check API, this sequence should not take long in most cases. I do not have ideas on how to make this sequence lock-free.

If the writer is on the control plane, most use cases will use mutex locks for synchronization if they are multi-threaded. That lock should be enough to provide thread safety for these APIs.

If the writer is multi-threaded and lock-free, then one should use a per-thread defer queue.
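As a minimal sketch of the mutex-locked control-plane model described above (the wrapper function, lock, and entry names are hypothetical application objects, not part of this patch):

#include <pthread.h>
#include <rte_rcu_qsbr.h>

static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Any writer thread deleting an entry goes through this wrapper,
 * which serializes the non-MT-safe defer queue APIs.
 */
static int
app_delete_entry(struct rte_rcu_qsbr_dq *dq, void *entry)
{
	int ret;

	pthread_mutex_lock(&writer_lock);
	/* Enqueue starts the grace period; the registered callback
	 * frees 'entry' later, once readers report quiescence.
	 */
	ret = rte_rcu_qsbr_dq_enqueue(dq, entry);
	pthread_mutex_unlock(&writer_lock);

	return ret;
}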

> 
> > +	if (dq->r == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): defer queue create failed\n", __func__);
> > +		rte_free(dq);
> > +		return NULL;
> > +	}
> > +
> > +	dq->v = params->v;
> > +	dq->size = params->size;
> > +	dq->esize = params->esize;
> > +	dq->f = params->f;
> > +	dq->p = params->p;
> > +
> > +	return dq;
> > +}
> > +
> > +/* Enqueue one resource to the defer queue to free after the grace
> > + * period is over.
> > + */
> > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > +	uint64_t token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +	uint32_t cur_size, free_size;
> > +
> > +	if (dq == NULL || e == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> 
> Why not just return -EINVAL straight away?
> I think there is not much point in setting rte_errno in that function at all;
> just the return value should do.
I am trying to keep these consistent with the existing APIs: they return 0 or 1 and set rte_errno.
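A caller-side sketch of that 0/1 + rte_errno convention (the wrapper name and retry policy are illustrative assumptions, not part of the patch):

#include <errno.h>
#include <rte_errno.h>
#include <rte_rcu_qsbr.h>

static int
app_retire_resource(struct rte_rcu_qsbr_dq *dq, void *e)
{
	if (rte_rcu_qsbr_dq_enqueue(dq, e) == 0)
		return 0;

	if (rte_errno == ENOSPC) {
		/* Queue full: no grace period has completed yet;
		 * the caller may retry after readers make progress.
		 */
		return -EAGAIN;
	}

	/* EINVAL: NULL parameters were passed. */
	return -rte_errno;
}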

> 
> > +	}
> > +
> > +	/* Start the grace period */
> > +	token = rte_rcu_qsbr_start(dq->v);
> > +
> > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > +	 * the queue from growing too large and allows time for reader
> > +	 * threads to report their quiescent state.
> > +	 */
> > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> 
> It would probably be a bit easier if you just stored
> (elt size + token size) / 8 in dq->esize.
Agree

> 
> > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> 
> Why make this threshold value hard-coded?
> Why not either put it into a create parameter, or just return a special
> return value to indicate that the threshold is reached?
My thinking was to keep the programming interface easy to use. The more the parameters, the more painful it is for the user. IMO, the constants chosen should be good enough for most cases. More advanced users could modify the constants. However, we could make these part of the parameters, but make them optional for the user. For ex: if they set them to 0, default values can be used.

> Or even return the number of filled/free entries on success, so the caller
> can decide whether to reclaim or not based on that information?
This means more code on the user side. I think adding these to parameters seems like a better option.
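A sketch of how such optional parameters could look, with 0 selecting the library default; the two commented-out field names are hypothetical and not part of the patch under review ('v', 'lpm' and 'free_fn' are assumed application objects):

#include <rte_rcu_qsbr.h>

static struct rte_rcu_qsbr_dq *
app_create_dq(struct rte_rcu_qsbr *v, void *lpm,
		rte_rcu_qsbr_free_resource free_fn)
{
	struct rte_rcu_qsbr_dq_parameters params = {
		.name = "lpm_dq",
		.size = 1024,	/* max entries in the data structure */
		.esize = 8,	/* multiple of 8B, per the API contract */
		.f = free_fn,
		.p = lpm,
		.v = v,
		/* Hypothetical optional knobs, 0 = library default:
		 * .trigger_reclaim_limit = 0,
		 * .max_reclaim_size = 0,
		 */
	};

	return rte_rcu_qsbr_dq_create(&params);
}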

> 
> > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +			"%s(): Triggering reclamation\n", __func__);
> > +		rte_rcu_qsbr_dq_reclaim(dq);
> > +	}
> > +
> > +	/* Check if there is space for at least 1 resource */
> > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > +	if (!free_size) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Defer queue is full\n", __func__);
> > +		rte_errno = ENOSPC;
> > +		return 1;
> > +	}
> > +
> > +	/* Enqueue the resource */
> > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > +
> > +	/* The resource to enqueue needs to be a multiple of 64b
> > +	 * due to the limitation of the rte_ring implementation.
> > +	 */
> > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> 
> 
> That whole construction above looks a bit clumsy and error prone...
> I suppose just:
> 
> const uint32_t nb_elt = dq->elt_size/8 + 1;
> uint32_t free, n;
> ...
> n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
> if (n == 0)
Yes, bulk enqueue can be used. But note that once the flexible element size ring patch is done, this code will use that.

>   return -ENOSPC;
> return free;
> 
> That way I think you can have MT-safe version of that function.
Please see the description of the MT-safety issue above.
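Writing the bulk-enqueue suggestion out as a library-internal sketch (a 64-bit build is assumed so the 64-bit token fits one ring slot; the struct fields are as in the patch):

#include <errno.h>
#include <string.h>
#include <rte_ring.h>
#include "rte_rcu_qsbr.h"
#include "rte_rcu_qsbr_pvt.h"	/* private struct layout */

static int
dq_enqueue_bulk(struct rte_rcu_qsbr_dq *dq, void *e)
{
	const uint32_t nb_elt = dq->esize / 8 + 1;	/* token + payload */
	void *buf[nb_elt];

	/* Start the grace period and keep the token next to its payload. */
	buf[0] = (void *)(uintptr_t)rte_rcu_qsbr_start(dq->v);
	memcpy(&buf[1], e, dq->esize);

	/* All-or-nothing enqueue keeps the record contiguous, which is
	 * what makes a multi-writer mode possible.
	 */
	if (rte_ring_enqueue_bulk(dq->r, buf, nb_elt, NULL) == 0)
		return -ENOSPC;

	return 0;
}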

> 
> > +
> > +	return 0;
> > +}
> > +
> > +/* Reclaim resources from the defer queue. */ int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > +	uint32_t max_cnt;
> > +	uint32_t cnt;
> > +	void *token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> 
> Same story as above - I think rte_errno is excessive in this function.
> Just return value should be enough.
> 
> 
> > +	}
> > +
> > +	/* Anything to reclaim? */
> > +	if (rte_ring_count(dq->r) == 0)
> > +		return 0;
> 
> Not sure you need that, see below.
> 
> > +
> > +	/* Reclaim at the max 1/16th the total number of entries. */
> > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> 
> Again, why not make max_cnt configurable as a create() parameter?
I think making this an optional parameter when creating the defer queue is a better option.

> Or even a parameter for that function?
> 
> > +	cnt = 0;
> > +
> > +	/* Check reader threads quiescent state and reclaim resources */
> > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > +			== 1)) {
> 
> 
> > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > +		/* The resource to dequeue needs to be a multiple of 64b
> > +		 * due to the limitation of the rte_ring implementation.
> > +		 */
> > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > +			i++, tmp++)
> > +			(void)rte_ring_sc_dequeue(dq->r,
> > +					(void *)(uintptr_t)tmp);
> 
> Again, no need for such constructs with multiple dequeuer I believe.
> Just:
> 
> const uint32_t nb_elt = dq->elt_size/8 + 1;
> uint32_t n;
> uintptr_t elt[nb_elt];
> ...
> n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
> if (n != 0)
> 	dq->f(dq->p, elt);
Agree on bulk API use.

> 
> Seems enough.
> Again in that case you can have enqueue/reclaim running in different threads
> simultaneously, plus you don't need dq->e at all.
Will check on dq->e.
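The matching bulk-dequeue side as a library-internal sketch (64-bit build assumed; the caller still serializes consumers; rte_ring_peek() is the API proposed in this series):

#include <rte_ring.h>
#include "rte_rcu_qsbr.h"
#include "rte_rcu_qsbr_pvt.h"

static uint32_t
dq_reclaim_one(struct rte_rcu_qsbr_dq *dq)
{
	const uint32_t nb_elt = dq->esize / 8 + 1;
	void *elt[nb_elt];
	void *token;

	/* Dequeue only after the head token's grace period is over. */
	if (rte_ring_peek(dq->r, &token) != 0 ||
	    rte_rcu_qsbr_check(dq->v, (uint64_t)(uintptr_t)token, false) != 1)
		return 0;

	if (rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL) == 0)
		return 0;

	/* The payload follows the token, so no per-queue scratch
	 * buffer (dq->e) is needed.
	 */
	dq->f(dq->p, &elt[1]);
	return 1;
}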

> 
> > +		dq->f(dq->p, dq->e);
> > +
> > +		cnt++;
> > +	}
> > +
> > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > +
> > +	if (cnt == 0) {
> > +		/* No resources were reclaimed */
> > +		rte_errno = EAGAIN;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> 
> I'd suggest returning cnt on success.
I am trying to keep the APIs simple. I do not see much use for 'cnt' as a return value to the user; it exposes details which I think are internal to the library.

> 
> > +}
> > +
> > +/* Delete a defer queue. */
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Reclaim all the resources */
> > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > +		/* Error number is already set by the reclaim API */
> > +		return 1;
> 
> How do you know that you have reclaimed everything?
Good point, will come back with a different solution; one possible shape is sketched below.
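One possible shape for such a fix, as a sketch under the same assumptions as above: drain the ring completely before freeing, and fail without freeing if reclamation stalls on an unfinished grace period.

#include <rte_malloc.h>
#include <rte_ring.h>
#include "rte_rcu_qsbr.h"
#include "rte_rcu_qsbr_pvt.h"

static int
dq_delete_drain(struct rte_rcu_qsbr_dq *dq)
{
	while (rte_ring_count(dq->r) != 0) {
		/* Reclaim returns 1 with rte_errno = EAGAIN when no
		 * grace period has completed; entries then remain on
		 * the queue and it must not be freed.
		 */
		if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
			return 1;
	}

	rte_ring_free(dq->r);
	rte_free(dq);
	return 0;
}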

> 
> > +
> > +	rte_ring_free(dq->r);
> > +	rte_free(dq);
> > +
> > +	return 0;
> > +}
> > +
> >  int rte_rcu_log_type;
> >
> >  RTE_INIT(rte_rcu_register)
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > @@ -34,6 +34,7 @@ extern "C" {
> >  #include <rte_lcore.h>
> >  #include <rte_debug.h>
> >  #include <rte_atomic.h>
> > +#include <rte_ring.h>
> >
> >  extern int rte_rcu_log_type;
> >
> > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >  	 */
> >  } __rte_cache_aligned;
> >
> > +/**
> > + * Call back function called to free the resources.
> > + *
> > + * @param p
> > + *   Pointer provided while creating the defer queue
> > + * @param e
> > + *   Pointer to the resource data stored on the defer queue
> > + *
> > + * @return
> > + *   None
> > + */
> > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> 
> Style thing - usually in DPDK we have typedef newtype_t ...
> Though I am not sure you need a new typedef at all - just a function pointer
> inside the struct seems enough.
Other libraries (for ex: rte_hash) use this approach. I think it is better to keep it out of the structure to allow for better commenting.

> 
> > +
> > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > +
> > +/**
> > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > + */
> > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > +
> > +/**
> > + *  Reclaim at the max 1/16th the total number of resources.
> > + */
> > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> 
> 
> As I said above, I don't think these thresholds need to be hardcoded.
> In any case, there seems not much point to put them in the public header file.
> 
> > +
> > +/**
> > + * Parameters used when creating the defer queue.
> > + */
> > +struct rte_rcu_qsbr_dq_parameters {
> > +	const char *name;
> > +	/**< Name of the queue. */
> > +	uint32_t size;
> > +	/**< Number of entries in queue. Typically, this will be
> > +	 *   the same as the maximum number of entries supported in the
> > +	 *   lock free data structure.
> > +	 *   Data structures with an unbounded number of entries are not
> > +	 *   supported currently.
> > +	 */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of each element in the defer queue.
> > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > +	 *   support 8B element sizes only.
> > +	 */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> 
> Style nit again - I like short names myself, but that seems a bit extreme... :)
> Might be at least:
> void (*reclaim)(void *, void *);
Maybe 'free_fn'?

> void * reclaim_data;
> ?
This is the pointer to the data structure to free the resource into. For ex: in the LPM data structure, it will be a pointer to the LPM. 'reclaim_data' does not convey the meaning correctly.

> 
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs. This can be NULL.
> > +	 */
> > +	struct rte_rcu_qsbr *v;
> 
> Does it need to be inside that struct?
> Might be better:
> rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> rte_rcu_qsbr_dq_parameters *params);
The API takes a parameter structure as input anyway, so why add another argument to the function? The QSBR variable is just another parameter.

> 
> Another alternative: make both reclaim() and enqueue() to take v as a
> parameter.
But both of them need access to some of the parameters provided in the rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to the functions.

> 
> > +	/**< RCU QSBR variable to use for this defer queue */ };
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq;
> > +
> >  /**
> >   * @warning
> >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f,
> > struct rte_rcu_qsbr *v);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + *
> > + * @param params
> > + *   Parameters to create a defer queue.
> > + * @return
> > + *   On success - Valid pointer to defer queue
> > + *   On error - NULL
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOMEM - Not enough memory
> > + */
> > +__rte_experimental
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enqueue one resource to the defer queue and start the grace period.
> > + * The resource will be freed later after at least one grace period
> > + * is over.
> > + *
> > + * If the defer queue is full, it will attempt to reclaim resources.
> > + * It will also reclaim resources at regular intervals to prevent
> > + * the defer queue from growing too big.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to allocate an entry from.
> > + * @param e
> > + *   Pointer to resource data to copy to the defer queue. The size of
> > + *   the data to copy is equal to the element size provided when the
> > + *   defer queue was created.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOSPC - Defer queue is full. This condition cannot happen
> > + *		if the defer queue size is equal to (or larger than) the
> > + *		number of elements in the data structure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Reclaim resources from the defer queue.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to reclaim an entry from.
> > + * @return
> > + *   On successful reclamation of at least 1 resource - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - None of the resources have completed at least 1 grace
> period,
> > + *		try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Delete a defer queue.
> > + *
> > + * It tries to reclaim all the resources on the defer queue.
> > + * If any of the resources have not completed the grace period
> > + * the reclamation stops and returns immediately. The rest of
> > + * the resources are not reclaimed and the defer queue is not
> > + * freed.
> > + *
> > + * @param dq
> > + *   Defer queue to delete.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > + *		period, try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > new file mode 100644
> > index 000000000..2122bc36a
> > --- /dev/null
> > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> 
> Again a style suggestion: as it is not a public header - don't use the rte_
> prefix for naming.
> From my perspective it is easier for the reader to realize what is a public
> header and what is not.
Looks like the guidelines are not defined very well. I see one private file with the rte_ prefix, and I see Stephen not using the rte_ prefix. I do not have any preference, but a consistent approach is required.

> 
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > +#define _RTE_RCU_QSBR_PVT_H_
> > +
> > +/**
> > + * This file is private to the RCU library. It should not be included
> > + * by the user of this library.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_rcu_qsbr.h"
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq {
> > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > +	uint32_t size;
> > +	/**< Number of elements in the defer queue */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of data stored on the defer queue */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs.
> > +	 */
> > +	char e[0];
> > +	/**< Temporary storage to copy the defer queue element. */
> 
> Do you really need 'e' at all?
> Can't it be just temporary stack variable?
Ok, will check.

> 
> > +};
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > b/lib/librte_rcu/rte_rcu_version.map
> > index f8b9ef2ab..dfac88a37 100644
> > --- a/lib/librte_rcu/rte_rcu_version.map
> > +++ b/lib/librte_rcu/rte_rcu_version.map
> > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >  	rte_rcu_qsbr_synchronize;
> >  	rte_rcu_qsbr_thread_register;
> >  	rte_rcu_qsbr_thread_unregister;
> > +	rte_rcu_qsbr_dq_create;
> > +	rte_rcu_qsbr_dq_enqueue;
> > +	rte_rcu_qsbr_dq_reclaim;
> > +	rte_rcu_qsbr_dq_delete;
> >
> >  	local: *;
> >  };
> > diff --git a/lib/meson.build b/lib/meson.build index
> > e5ff83893..0e1be8407 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -11,7 +11,9 @@
> >  libraries = [
> >  	'kvargs', # eal depends on kvargs
> >  	'eal', # everything depends on eal
> > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > +	'ring',
> > +	'rcu', # rcu depends on ring
> > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >  	'cmdline',
> >  	'metrics', # bitrate/latency stats depends on this
> >  	'hash',    # efd depends on this
> > @@ -22,7 +24,7 @@ libraries = [
> >  	'gro', 'gso', 'ip_frag', 'jobstats',
> >  	'kni', 'latencystats', 'lpm', 'member',
> >  	'power', 'pdump', 'rawdev',
> > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >  	# ipsec lib depends on net, crypto and security
> >  	'ipsec',
> >  	# add pkt framework libs which use other libs from above
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-02 18:50       ` Ananyev, Konstantin
@ 2019-10-03  6:42         ` Honnappa Nagarahalli
  2019-10-03 11:52           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03  6:42 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, Honnappa Nagarahalli, dev, nd, nd

> 
> > +
> > +/* Reclaim resources from the defer queue. */ int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > +	uint32_t max_cnt;
> > +	uint32_t cnt;
> > +	void *token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Anything to reclaim? */
> > +	if (rte_ring_count(dq->r) == 0)
> > +		return 0;
> > +
> > +	/* Reclaim at the max 1/16th the total number of entries. */
> > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > +	cnt = 0;
> > +
> > +	/* Check reader threads quiescent state and reclaim resources */
> > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> 
> One more thing I forgot to ask - how is this construct supposed to work on
> 32-bit machines?
> peek() will return a 32-bit value, while qsbr_check() operates with 64-bit
> tokens...
> As I understand, in that case you need to peek() 2 elems.
Yes, that is the intention. Ring APIs with the desired element size will help address 32-bit machines.
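To illustrate the 32-bit case: with 4-byte ring slots, the 64-bit token spans two consecutive elements, so both halves must be obtained together (a two-element peek or a serialized dequeue). A hypothetical helper, assuming the low word is enqueued first:

#include <stdint.h>

static inline uint64_t
dq_token_from_slots(uintptr_t lo, uintptr_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}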

> Might work, but I still think it is better to introduce a serialized version
> of ring_dequeue(). See my other mail about rte_ring_peek().
> 
> 
> > +			== 1)) {
> > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > +		/* The resource to dequeue needs to be a multiple of 64b
> > +		 * due to the limitation of the rte_ring implementation.
> > +		 */
> > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > +			i++, tmp++)
> > +			(void)rte_ring_sc_dequeue(dq->r,
> > +					(void *)(uintptr_t)tmp);
> > +		dq->f(dq->p, dq->e);
> > +
> > +		cnt++;
> > +	}
> > +
> > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > +
> > +	if (cnt == 0) {
> > +		/* No resources were reclaimed */
> > +		rte_errno = EAGAIN;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/* Delete a defer queue. */
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Reclaim all the resources */
> > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > +		/* Error number is already set by the reclaim API */
> > +		return 1;
> > +
> > +	rte_ring_free(dq->r);
> > +	rte_free(dq);
> > +
> > +	return 0;
> > +}
> > +
> >  int rte_rcu_log_type;
> >
> >  RTE_INIT(rte_rcu_register)
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > @@ -34,6 +34,7 @@ extern "C" {
> >  #include <rte_lcore.h>
> >  #include <rte_debug.h>
> >  #include <rte_atomic.h>
> > +#include <rte_ring.h>
> >
> >  extern int rte_rcu_log_type;
> >
> > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >  	 */
> >  } __rte_cache_aligned;
> >
> > +/**
> > + * Call back function called to free the resources.
> > + *
> > + * @param p
> > + *   Pointer provided while creating the defer queue
> > + * @param e
> > + *   Pointer to the resource data stored on the defer queue
> > + *
> > + * @return
> > + *   None
> > + */
> > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > +
> > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > +
> > +/**
> > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > + */
> > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > +
> > +/**
> > + *  Reclaim at the max 1/16th the total number of resources.
> > + */
> > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > +
> > +/**
> > + * Parameters used when creating the defer queue.
> > + */
> > +struct rte_rcu_qsbr_dq_parameters {
> > +	const char *name;
> > +	/**< Name of the queue. */
> > +	uint32_t size;
> > +	/**< Number of entries in queue. Typically, this will be
> > +	 *   the same as the maximum number of entries supported in the
> > +	 *   lock free data structure.
> > +	 *   Data structures with an unbounded number of entries are not
> > +	 *   supported currently.
> > +	 */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of each element in the defer queue.
> > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > +	 *   support 8B element sizes only.
> > +	 */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs. This can be NULL.
> > +	 */
> > +	struct rte_rcu_qsbr *v;
> > +	/**< RCU QSBR variable to use for this defer queue */ };
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq;
> > +
> >  /**
> >   * @warning
> >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f,
> > struct rte_rcu_qsbr *v);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + *
> > + * @param params
> > + *   Parameters to create a defer queue.
> > + * @return
> > + *   On success - Valid pointer to defer queue
> > + *   On error - NULL
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOMEM - Not enough memory
> > + */
> > +__rte_experimental
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enqueue one resource to the defer queue and start the grace period.
> > + * The resource will be freed later after at least one grace period
> > + * is over.
> > + *
> > + * If the defer queue is full, it will attempt to reclaim resources.
> > + * It will also reclaim resources at regular intervals to prevent
> > + * the defer queue from growing too big.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to allocate an entry from.
> > + * @param e
> > + *   Pointer to resource data to copy to the defer queue. The size of
> > + *   the data to copy is equal to the element size provided when the
> > + *   defer queue was created.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOSPC - Defer queue is full. This condition cannot happen
> > + *		if the defer queue size is equal to (or larger than) the
> > + *		number of elements in the data structure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Reclaim resources from the defer queue.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to reclaim an entry from.
> > + * @return
> > + *   On successful reclamation of at least 1 resource - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - None of the resources have completed at least 1 grace
> period,
> > + *		try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Delete a defer queue.
> > + *
> > + * It tries to reclaim all the resources on the defer queue.
> > + * If any of the resources have not completed the grace period
> > + * the reclamation stops and returns immediately. The rest of
> > + * the resources are not reclaimed and the defer queue is not
> > + * freed.
> > + *
> > + * @param dq
> > + *   Defer queue to delete.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > + *		period, try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > new file mode 100644
> > index 000000000..2122bc36a
> > --- /dev/null
> > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > +#define _RTE_RCU_QSBR_PVT_H_
> > +
> > +/**
> > + * This file is private to the RCU library. It should not be included
> > + * by the user of this library.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_rcu_qsbr.h"
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq {
> > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > +	uint32_t size;
> > +	/**< Number of elements in the defer queue */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of data stored on the defer queue */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs.
> > +	 */
> > +	char e[0];
> > +	/**< Temporary storage to copy the defer queue element. */ };
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > b/lib/librte_rcu/rte_rcu_version.map
> > index f8b9ef2ab..dfac88a37 100644
> > --- a/lib/librte_rcu/rte_rcu_version.map
> > +++ b/lib/librte_rcu/rte_rcu_version.map
> > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >  	rte_rcu_qsbr_synchronize;
> >  	rte_rcu_qsbr_thread_register;
> >  	rte_rcu_qsbr_thread_unregister;
> > +	rte_rcu_qsbr_dq_create;
> > +	rte_rcu_qsbr_dq_enqueue;
> > +	rte_rcu_qsbr_dq_reclaim;
> > +	rte_rcu_qsbr_dq_delete;
> >
> >  	local: *;
> >  };
> > diff --git a/lib/meson.build b/lib/meson.build index
> > e5ff83893..0e1be8407 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -11,7 +11,9 @@
> >  libraries = [
> >  	'kvargs', # eal depends on kvargs
> >  	'eal', # everything depends on eal
> > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > +	'ring',
> > +	'rcu', # rcu depends on ring
> > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >  	'cmdline',
> >  	'metrics', # bitrate/latency stats depends on this
> >  	'hash',    # efd depends on this
> > @@ -22,7 +24,7 @@ libraries = [
> >  	'gro', 'gso', 'ip_frag', 'jobstats',
> >  	'kni', 'latencystats', 'lpm', 'member',
> >  	'power', 'pdump', 'rawdev',
> > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >  	# ipsec lib depends on net, crypto and security
> >  	'ipsec',
> >  	# add pkt framework libs which use other libs from above
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests
  2019-10-02 13:02       ` Aaron Conole
@ 2019-10-03  9:09         ` Bruce Richardson
  0 siblings, 0 replies; 137+ messages in thread
From: Bruce Richardson @ 2019-10-03  9:09 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Honnappa Nagarahalli, vladimir.medvedkin, olivier.matz, dev,
	konstantin.ananyev, stephen, paulmck, Gavin.Hu, Dharmik.Thakkar,
	Ruifeng.Wang, nd

On Wed, Oct 02, 2019 at 09:02:03AM -0400, Aaron Conole wrote:
> Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> writes:
> 
> > Add performance tests for RCU integration. The performance
> > difference with and without RCU integration is very small
> > (~1% to ~2%) on both Arm and x86 platforms.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> 
> I see the following:
> 
>   lib/meson.build:89:5: ERROR: Problem encountered: Missing dependency rcu
>   for library rte_lpm
> 
> Maybe there's something wrong with the environment?  This isn't the
> first time I've seen a dependency detection problem with meson.
> 
It is probably not a detection problem; more likely the rcu library is not
being built for some reason. If you apply patch [1], the meson run will
print out each library and the dependency object generated for it as each
is processed. That should help debug issues like this.

/Bruce

[1] http://patches.dpdk.org/patch/59470/

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-03  6:42         ` Honnappa Nagarahalli
@ 2019-10-03 11:52           ` Ananyev, Konstantin
  0 siblings, 0 replies; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-03 11:52 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Thursday, October 3, 2019 7:42 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; Ruifeng Wang (Arm Technology
> China) <Ruifeng.Wang@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; dev@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
> 
> >
> > > +
> > > +/* Reclaim resources from the defer queue. */ int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > > +	uint32_t max_cnt;
> > > +	uint32_t cnt;
> > > +	void *token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Anything to reclaim? */
> > > +	if (rte_ring_count(dq->r) == 0)
> > > +		return 0;
> > > +
> > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > > +	cnt = 0;
> > > +
> > > +	/* Check reader threads quiescent state and reclaim resources */
> > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> >
> > One more thing I forgot to ask - how is this construct supposed to work on
> > 32-bit machines?
> > peek() will return a 32-bit value, while qsbr_check() operates with 64-bit
> > tokens...
> > As I understand, in that case you need to peek() 2 elems.
> Yes, that is the intention. Ring APIs with the desired element size will help address 32-bit machines.

Or serialized dequeue :)

> 
> > Might work, but I still think it is better to introduce a serialized
> > version of ring_dequeue(). See my other mail about rte_ring_peek().
> >
> >
> > > +			== 1)) {
> > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > +		 * due to the limitation of the rte_ring implementation.
> > > +		 */
> > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > +			i++, tmp++)
> > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > +					(void *)(uintptr_t)tmp);
> > > +		dq->f(dq->p, dq->e);
> > > +
> > > +		cnt++;
> > > +	}
> > > +
> > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > +
> > > +	if (cnt == 0) {
> > > +		/* No resources were reclaimed */
> > > +		rte_errno = EAGAIN;
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Delete a defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Reclaim all the resources */
> > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > +		/* Error number is already set by the reclaim API */
> > > +		return 1;
> > > +
> > > +	rte_ring_free(dq->r);
> > > +	rte_free(dq);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  int rte_rcu_log_type;
> > >
> > >  RTE_INIT(rte_rcu_register)
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > @@ -34,6 +34,7 @@ extern "C" {
> > >  #include <rte_lcore.h>
> > >  #include <rte_debug.h>
> > >  #include <rte_atomic.h>
> > > +#include <rte_ring.h>
> > >
> > >  extern int rte_rcu_log_type;
> > >
> > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > >  	 */
> > >  } __rte_cache_aligned;
> > >
> > > +/**
> > > + * Call back function called to free the resources.
> > > + *
> > > + * @param p
> > > + *   Pointer provided while creating the defer queue
> > > + * @param e
> > > + *   Pointer to the resource data stored on the defer queue
> > > + *
> > > + * @return
> > > + *   None
> > > + */
> > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > > +
> > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > +
> > > +/**
> > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > + */
> > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > +
> > > +/**
> > > + *  Reclaim at the max 1/16th the total number of resources.
> > > + */
> > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > > +
> > > +/**
> > > + * Parameters used when creating the defer queue.
> > > + */
> > > +struct rte_rcu_qsbr_dq_parameters {
> > > +	const char *name;
> > > +	/**< Name of the queue. */
> > > +	uint32_t size;
> > > +	/**< Number of entries in queue. Typically, this will be
> > > +	 *   the same as the maximum number of entries supported in the
> > > +	 *   lock free data structure.
> > > +	 *   Data structures with an unbounded number of entries are not
> > > +	 *   supported currently.
> > > +	 */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of each element in the defer queue.
> > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > +	 *   support 8B element sizes only.
> > > +	 */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs. This can be NULL.
> > > +	 */
> > > +	struct rte_rcu_qsbr *v;
> > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq;
> > > +
> > >  /**
> > >   * @warning
> > >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f,
> > > struct rte_rcu_qsbr *v);
> > >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + *
> > > + * @param params
> > > + *   Parameters to create a defer queue.
> > > + * @return
> > > + *   On success - Valid pointer to defer queue
> > > + *   On error - NULL
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOMEM - Not enough memory
> > > + */
> > > +__rte_experimental
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > +*params);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Enqueue one resource to the defer queue and start the grace period.
> > > + * The resource will be freed later after at least one grace period
> > > + * is over.
> > > + *
> > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > + * It will also reclaim resources at regular intervals to prevent
> > > + * the defer queue from growing too big.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to allocate an entry from.
> > > + * @param e
> > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > + *   the data to copy is equal to the element size provided when the
> > > + *   defer queue was created.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOSPC - Defer queue is full. This condition cannot happen
> > > + *		if the defer queue size is equal to (or larger than) the
> > > + *		number of elements in the data structure.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Reclaim resources from the defer queue.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to reclaim an entry from.
> > > + * @return
> > > + *   On successful reclamation of at least 1 resource - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > period,
> > > + *		try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Delete a defer queue.
> > > + *
> > > + * It tries to reclaim all the resources on the defer queue.
> > > + * If any of the resources have not completed the grace period
> > > + * the reclamation stops and returns immediately. The rest of
> > > + * the resources are not reclaimed and the defer queue is not
> > > + * freed.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to delete.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > > + *		period, try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > new file mode 100644
> > > index 000000000..2122bc36a
> > > --- /dev/null
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > +#define _RTE_RCU_QSBR_PVT_H_
> > > +
> > > +/**
> > > + * This file is private to the RCU library. It should not be included
> > > + * by the user of this library.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include "rte_rcu_qsbr.h"
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq {
> > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > +	uint32_t size;
> > > +	/**< Number of elements in the defer queue */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs.
> > > +	 */
> > > +	char e[0];
> > > +	/**< Temporary storage to copy the defer queue element. */ };
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > b/lib/librte_rcu/rte_rcu_version.map
> > > index f8b9ef2ab..dfac88a37 100644
> > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > >  	rte_rcu_qsbr_synchronize;
> > >  	rte_rcu_qsbr_thread_register;
> > >  	rte_rcu_qsbr_thread_unregister;
> > > +	rte_rcu_qsbr_dq_create;
> > > +	rte_rcu_qsbr_dq_enqueue;
> > > +	rte_rcu_qsbr_dq_reclaim;
> > > +	rte_rcu_qsbr_dq_delete;
> > >
> > >  	local: *;
> > >  };
> > > diff --git a/lib/meson.build b/lib/meson.build index
> > > e5ff83893..0e1be8407 100644
> > > --- a/lib/meson.build
> > > +++ b/lib/meson.build
> > > @@ -11,7 +11,9 @@
> > >  libraries = [
> > >  	'kvargs', # eal depends on kvargs
> > >  	'eal', # everything depends on eal
> > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > +	'ring',
> > > +	'rcu', # rcu depends on ring
> > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > >  	'cmdline',
> > >  	'metrics', # bitrate/latency stats depends on this
> > >  	'hash',    # efd depends on this
> > > @@ -22,7 +24,7 @@ libraries = [
> > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > >  	'kni', 'latencystats', 'lpm', 'member',
> > >  	'power', 'pdump', 'rawdev',
> > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > >  	# ipsec lib depends on net, crypto and security
> > >  	'ipsec',
> > >  	# add pkt framework libs which use other libs from above
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-03  6:29         ` Honnappa Nagarahalli
@ 2019-10-03 12:26           ` Ananyev, Konstantin
  2019-10-04  6:07             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-03 12:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd

Hi Honnappa,

> > > Add resource reclamation APIs to make it simple for applications and
> > > libraries to integrate rte_rcu library.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> > >  lib/librte_rcu/meson.build         |   2 +
> > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > >  lib/meson.build                    |   6 +-
> > >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > >
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > @@ -21,6 +21,7 @@
> > >  #include <rte_errno.h>
> > >
> > >  #include "rte_rcu_qsbr.h"
> > > +#include "rte_rcu_qsbr_pvt.h"
> > >
> > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6 +268,190
> > > @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > >  	return 0;
> > >  }
> > >
> > > +/* Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + */
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> > > +{
> > > +	struct rte_rcu_qsbr_dq *dq;
> > > +	uint32_t qs_fifo_size;
> > > +
> > > +	if (params == NULL || params->f == NULL ||
> > > +		params->v == NULL || params->name == NULL ||
> > > +		params->size == 0 || params->esize == 0 ||
> > > +		(params->esize % 8 != 0)) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return NULL;
> > > +	}
> > > +
> > > +	dq = rte_zmalloc(NULL,
> > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > +		RTE_CACHE_LINE_SIZE);
> > > +	if (dq == NULL) {
> > > +		rte_errno = ENOMEM;
> > > +
> > > +		return NULL;
> > > +	}
> > > +
> > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > +	 * max_size.
> > > +	 */
> > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > +					* params->size) + 1);
> > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > +					SOCKET_ID_ANY, 0);
> >
> > If it is not going to be MT safe, then why not create the ring with the
> > (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> Agree.
> 
> > Though I think it could be changed to allow MT safe multiple enqueue/single
> > dequeue, see below.
> The MT safe issue is due to reclaim code. The reclaim code has the following sequence:
> 
> rte_ring_peek
> rte_rcu_qsbr_check
> rte_ring_dequeue
> 
> This entire sequence needs to be atomic as the entry cannot be dequeued without knowing that the grace period for that entry is over.
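
As an illustration of serializing this sequence with a writer-side lock, a minimal sketch could look as follows (a hypothetical helper, not part of the patch; it assumes the rte_ring_peek() proposed in this series):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Reclaim at most one entry; returns 0 if an entry was dequeued. */
static int
reclaim_one(pthread_mutex_t *writer_lock, struct rte_ring *r,
	struct rte_rcu_qsbr *v)
{
	void *token;
	int rc = -1;

	pthread_mutex_lock(writer_lock);
	/* peek, check and dequeue must not interleave with another writer */
	if (rte_ring_peek(r, &token) == 0 &&
	    rte_rcu_qsbr_check(v, (uint64_t)(uintptr_t)token, false) == 1)
		rc = rte_ring_sc_dequeue(r, &token);
	pthread_mutex_unlock(writer_lock);
	return rc;
}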

I understand that, though I believe it should at least be possible to support a mode with multiple enqueuers and a single dequeuer/reclaimer.
With serialized dequeue(), even multiple dequeuers should be possible.

> Note that due to optimizations in rte_rcu_qsbr_check API, this sequence should not be large in most cases. I do not have ideas on how to
> make this sequence lock-free.
> 
> If the writer is on the control plane, most use cases will use mutex locks for synchronization if they are multi-threaded. That lock should be
> enough to provide the thread safety for these APIs.

In that case, why do we need a ring at all?
People can quite easily create their own queue with a mutex and a TAILQ, as sketched below.
If performance is not an issue, they can even add a pthread_cond to it, and have the ability
for the consumer to sleep/wake up on an empty/full queue.
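
A minimal sketch of that do-it-yourself alternative (application-side code with hypothetical names, shown here for illustration only):

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct dq_elem {
	TAILQ_ENTRY(dq_elem) next;
	uint64_t token;    /* grace-period token, e.g. from rte_rcu_qsbr_start() */
	void *resource;    /* element to free once the grace period is over */
};

struct simple_dq {
	pthread_mutex_t lock;
	TAILQ_HEAD(, dq_elem) head;
};

static int
simple_dq_enqueue(struct simple_dq *dq, uint64_t token, void *resource)
{
	struct dq_elem *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->token = token;
	e->resource = resource;
	pthread_mutex_lock(&dq->lock);
	TAILQ_INSERT_TAIL(&dq->head, e, next);
	pthread_mutex_unlock(&dq->lock);
	return 0;
}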

> 
> If the writer is multi-threaded and lock-free, then one should use per thread defer queue.

If that's the only working model, then the question is why we need that API at all.
Just a simple array with a counter, or a linked list, should do for the majority of cases.

> 
> >
> > > +	if (dq->r == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): defer queue create failed\n", __func__);
> > > +		rte_free(dq);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	dq->v = params->v;
> > > +	dq->size = params->size;
> > > +	dq->esize = params->esize;
> > > +	dq->f = params->f;
> > > +	dq->p = params->p;
> > > +
> > > +	return dq;
> > > +}
> > > +
> > > +/* Enqueue one resource to the defer queue to free after the grace
> > > + * period is over.
> > > + */
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> > > +{
> > > +	uint64_t token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +	uint32_t cur_size, free_size;
> > > +
> > > +	if (dq == NULL || e == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> >
> > Why not just return -EINVAL straight away?
> > I think there is not much point in setting rte_errno in that function at all;
> > just a return value should do.
> I am trying to keep these consistent with the existing APIs. They return 0 or 1 and set the rte_errno.

A lot of public DPDK API functions do use the return value to return a status code
(0 or some positive number on success, negative errno values on failure);
I am not inventing anything new here.

> 
> >
> > > +	}
> > > +
> > > +	/* Start the grace period */
> > > +	token = rte_rcu_qsbr_start(dq->v);
> > > +
> > > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > > +	 * keep the queue from growing too large and allows time for reader
> > > +	 * threads to report their quiescent state.
> > > +	 */
> > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> >
> > Probably would be a bit easier if you just store in dq->esize (elt size + token
> > size) / 8.
> Agree
> 
> >
> > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> >
> > Why make this threshold value hard-coded?
> > Why not either put it into a create() parameter, or just return a special return
> > value to indicate that the threshold is reached?
> My thinking was to keep the programming interface easy to use. The more the parameters, the more painful it is for the user. IMO, the
> constants chosen should be good enough for most cases. More advanced users could modify the constants. However, we could make these
> as part of the parameters, but make them optional for the user. For ex: if they set them to 0, default values can be used.
> 
> > Or even return the number of filled/free entries on success, so the caller can decide
> > to reclaim or not based on that information on his own?
> This means more code on the user side. 

I personally think it really wouldn't be that big a problem for the user to pass an extra parameter to the function.
Again, what if the user doesn't want to reclaim() in the enqueue() thread at all?

> I think adding these to parameters seems like a better option.
> 
> >
> > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +			"%s(): Triggering reclamation\n", __func__);
> > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > +	}
> > > +
> > > +	/* Check if there is space for at least 1 resource */
> > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > +	if (!free_size) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Defer queue is full\n", __func__);
> > > +		rte_errno = ENOSPC;
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Enqueue the resource */
> > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > +
> > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > +	 * due to the limitation of the rte_ring implementation.
> > > +	 */
> > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> >
> >
> > That whole construction above looks a bit clumsy and error prone...
> > I suppose just:
> >
> > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > uint32_t free, n;
> > ...
> > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
> > if (n == 0)
> Yes, bulk enqueue can be used. But note that once the flexible element size ring patch is done, this code will use that.

Well, when it is in the mainline, this code can certainly be updated to use the new API
(if it provides some improvements).
But as I understand, right now it is not there, while bulk enqueue/dequeue are.
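
Pulling those fragments together, a consolidated sketch of the bulk-enqueue pattern (not the final patch; it assumes esize is a multiple of 8, 64-bit pointers, and the token stored in the first 8-byte slot):

#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <rte_ring.h>

static int
dq_enqueue_sketch(struct rte_ring *r, uint64_t token,
	const void *e, uint32_t esize)
{
	const uint32_t nb_elt = esize / 8 + 1;	/* data words plus token */
	uint64_t buf[nb_elt];
	unsigned int n, free_space;

	buf[0] = token;
	memcpy(&buf[1], e, esize);
	/* bulk enqueue is all-or-nothing: the token and the element can
	 * never be split by a full-queue condition
	 */
	n = rte_ring_enqueue_bulk(r, (void **)buf, nb_elt, &free_space);
	return (n == 0) ? -ENOSPC : 0;
}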

> 
> >   return -ENOSPC;
> > return free;
> >
> > That way I think you can have MT-safe version of that function.
> Please see the description of MT safe issue above.
> 
> >
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Reclaim resources from the defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> > > +{
> > > +	uint32_t max_cnt;
> > > +	uint32_t cnt;
> > > +	void *token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> >
> > Same story as above - I think rte_errno is excessive in this function.
> > Just return value should be enough.
> >
> >
> > > +	}
> > > +
> > > +	/* Anything to reclaim? */
> > > +	if (rte_ring_count(dq->r) == 0)
> > > +		return 0;
> >
> > Not sure you need that, see below.
> >
> > > +
> > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> >
> > Again why not to make max_cnt a configurable at create() parameter?
> I think making this an optional parameter for creating the defer queue is a better option.
> 
> > Or even a parameter for that function?
> >
> > > +	cnt = 0;
> > > +
> > > +	/* Check reader threads quiescent state and reclaim resources */
> > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > +			== 1)) {
> >
> >
> > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > +		 * due to the limitation of the rte_ring implementation.
> > > +		 */
> > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > +			i++, tmp++)
> > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > +					(void *)(uintptr_t)tmp);
> >
> > Again, no need for such constructs with multiple dequeuer I believe.
> > Just:
> >
> > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > uint32_t n;
> > uintptr_t elt[nb_elt];
> > ...
> > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
> > if (n != 0)
> > 	dq->f(dq->p, elt);
> Agree on bulk API use.
> 
> >
> > Seems enough.
> > Again in that case you can have enqueue/reclaim running in different threads
> > simultaneously, plus you don't need dq->e at all.
> Will check on dq->e
> 
> >
> > > +		dq->f(dq->p, dq->e);
> > > +
> > > +		cnt++;
> > > +	}
> > > +
> > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > +
> > > +	if (cnt == 0) {
> > > +		/* No resources were reclaimed */
> > > +		rte_errno = EAGAIN;
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> >
> > I'd suggest to return cnt on success.
> I am trying to keep the APIs simple. I do not see much use for 'cnt' as return value to the user. It exposes more details which I think are
> internal to the library.

Not sure what the hassle is in returning the number of completed reclamations.
If the user doesn't need that information, he simply wouldn't use it.
But it might be useful - he can decide whether to try another attempt
at reclaim() immediately or whether it is OK to do something else.
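
To illustrate the point, a caller pattern that a count-returning reclaim() would enable (dq_reclaim_cnt() is a hypothetical variant, not the proposed API):

int n = dq_reclaim_cnt(dq);	/* hypothetical: returns entries reclaimed */
if (n == 0) {
	/* nothing has finished its grace period yet: do other work
	 * instead of retrying reclaim() immediately
	 */
} else {
	/* at least n resources were freed: retry the failed allocation */
}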

> 
> >
> > > +}
> > > +
> > > +/* Delete a defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> > > +{
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Reclaim all the resources */
> > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > +		/* Error number is already set by the reclaim API */
> > > +		return 1;
> >
> > How do you know that you have reclaimed everything?
> Good point, will come back with a different solution.
> 
> >
> > > +
> > > +	rte_ring_free(dq->r);
> > > +	rte_free(dq);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  int rte_rcu_log_type;
> > >
> > >  RTE_INIT(rte_rcu_register)
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> > > index c80f15c00..185d4b50a 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > @@ -34,6 +34,7 @@ extern "C" {
> > >  #include <rte_lcore.h>
> > >  #include <rte_debug.h>
> > >  #include <rte_atomic.h>
> > > +#include <rte_ring.h>
> > >
> > >  extern int rte_rcu_log_type;
> > >
> > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > >  	 */
> > >  } __rte_cache_aligned;
> > >
> > > +/**
> > > + * Call back function called to free the resources.
> > > + *
> > > + * @param p
> > > + *   Pointer provided while creating the defer queue
> > > + * @param e
> > > + *   Pointer to the resource data stored on the defer queue
> > > + *
> > > + * @return
> > > + *   None
> > > + */
> > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> >
> > Stylish thing - usually in DPDK we have typedf newtype_t ...
> > Though I am not sure you need a new typedef at all - just a function pointer
> > inside the struct seems enough.
> Other libraries (for ex: rte_hash) use this approach. I think it is better to keep it out of the structure to allow for better commenting.

I am saying the majority of DPDK code uses the _t suffix for typedefs:
typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);

> 
> >
> > > +
> > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > +
> > > +/**
> > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > + */
> > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > +
> > > +/**
> > > + *  Reclaim at the max 1/16th the total number of resources.
> > > + */
> > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> >
> >
> > As I said above, I don't think these thresholds need to be hardcoded.
> > In any case, there seems not much point to put them in the public header file.
> >
> > > +
> > > +/**
> > > + * Parameters used when creating the defer queue.
> > > + */
> > > +struct rte_rcu_qsbr_dq_parameters {
> > > +	const char *name;
> > > +	/**< Name of the queue. */
> > > +	uint32_t size;
> > > +	/**< Number of entries in queue. Typically, this will be
> > > +	 *   the same as the maximum number of entries supported in the
> > > +	 *   lock free data structure.
> > > +	 *   Data structures with unbounded number of entries is not
> > > +	 *   supported currently.
> > > +	 */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of each element in the defer queue.
> > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > +	 *   support 8B element sizes only.
> > > +	 */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> >
> > Style nit again - I like short names myself, but that seems a bit extreme... :)
> > Might be at least:
> > void (*reclaim)(void *, void *);
> Maybe 'free_fn'?
> 
> > void * reclaim_data;
> > ?
> This is the pointer to the data structure to free the resource into. For ex: In LPM data structure, it will be pointer to LPM. 'reclaim_data'
> does not convey the meaning correctly.

Ok, please feel free to come up with your own names.
I just wanted to say that 'f' and 'p' are a bit extreme for a public API.

> 
> >
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs. This can be NULL.
> > > +	 */
> > > +	struct rte_rcu_qsbr *v;
> >
> > Does it need to be inside that struct?
> > Might be better:
> > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > rte_rcu_qsbr_dq_parameters *params);
> The API takes a parameter structure as input anyway, why add another argument to the function? The QSBR variable is also another
> parameter.
> 
> >
> > Another alternative: make both reclaim() and enqueue() to take v as a
> > parameter.
> But both of them need access to some of the parameters provided in rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to
> the functions.

Purely a style thing.
From my perspective it just provides better visibility of what is going on in the code:
for QSBR variable 'v', create a new defer queue.
But no strong opinion here.

> 
> >
> > > +	/**< RCU QSBR variable to use for this defer queue */
> > > +};
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq;
> > > +
> > >  /**
> > >   * @warning
> > >   * @b EXPERIMENTAL: this API may change without prior notice
> > > @@ -648,6 +710,113 @@ __rte_experimental
> > >  int
> > >  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> > >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + *
> > > + * @param params
> > > + *   Parameters to create a defer queue.
> > > + * @return
> > > + *   On success - Valid pointer to defer queue
> > > + *   On error - NULL
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOMEM - Not enough memory
> > > + */
> > > +__rte_experimental
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > +*params);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Enqueue one resource to the defer queue and start the grace period.
> > > + * The resource will be freed later after at least one grace period
> > > + * is over.
> > > + *
> > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > + * It will also reclaim resources at regular intervals to keep
> > > + * the defer queue from growing too big.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to allocate an entry from.
> > > + * @param e
> > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > + *   the data to copy is equal to the element size provided when the
> > > + *   defer queue was created.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > + *		if the defer queue size is equal (or larger) than the
> > > + *		number of elements in the data structure.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Reclaim resources from the defer queue.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to reclaim an entry from.
> > > + * @return
> > > + *   On successful reclamation of at least 1 resource - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > period,
> > > + *		try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Delete a defer queue.
> > > + *
> > > + * It tries to reclaim all the resources on the defer queue.
> > > + * If any of the resources have not completed the grace period
> > > + * the reclamation stops and returns immediately. The rest of
> > > + * the resources are not reclaimed and the defer queue is not
> > > + * freed.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to delete.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > > + *		period, try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > new file mode 100644
> > > index 000000000..2122bc36a
> > > --- /dev/null
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > Again a style suggestion: as it is not a public header - don't use the rte_ prefix for
> > naming.
> > From my perspective - easier for the reader to realize what is a public header
> > and what is not.
> Looks like the guidelines are not defined very well. I see one private file with rte_ prefix. I see Stephen not using rte_ prefix. I do not have
> any preference. But, a consistent approach is required.

That's just a suggestion.
For me (and I hope for others) it would be a bit easier.
When looking at the code for the first time I had to look at meson.build to check
whether it is a public header or not.
If the file doesn't have the 'rte_' prefix, I assume straight away that it is an internal one.
But, as you said, there are no exact guidelines here, so it is up to you to decide.

> 
> >
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > +#define _RTE_RCU_QSBR_PVT_H_
> > > +
> > > +/**
> > > + * This file is private to the RCU library. It should not be included
> > > + * by the user of this library.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include "rte_rcu_qsbr.h"
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq {
> > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > +	uint32_t size;
> > > +	/**< Number of elements in the defer queue */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs.
> > > +	 */
> > > +	char e[0];
> > > +	/**< Temporary storage to copy the defer queue element. */
> >
> > Do you really need 'e' at all?
> > Can't it be just temporary stack variable?
> Ok, will check.
> 
> >
> > > +};
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > b/lib/librte_rcu/rte_rcu_version.map
> > > index f8b9ef2ab..dfac88a37 100644
> > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > >  	rte_rcu_qsbr_synchronize;
> > >  	rte_rcu_qsbr_thread_register;
> > >  	rte_rcu_qsbr_thread_unregister;
> > > +	rte_rcu_qsbr_dq_create;
> > > +	rte_rcu_qsbr_dq_enqueue;
> > > +	rte_rcu_qsbr_dq_reclaim;
> > > +	rte_rcu_qsbr_dq_delete;
> > >
> > >  	local: *;
> > >  };
> > > diff --git a/lib/meson.build b/lib/meson.build
> > > index e5ff83893..0e1be8407 100644
> > > --- a/lib/meson.build
> > > +++ b/lib/meson.build
> > > @@ -11,7 +11,9 @@
> > >  libraries = [
> > >  	'kvargs', # eal depends on kvargs
> > >  	'eal', # everything depends on eal
> > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > +	'ring',
> > > +	'rcu', # rcu depends on ring
> > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > >  	'cmdline',
> > >  	'metrics', # bitrate/latency stats depends on this
> > >  	'hash',    # efd depends on this
> > > @@ -22,7 +24,7 @@ libraries = [
> > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > >  	'kni', 'latencystats', 'lpm', 'member',
> > >  	'power', 'pdump', 'rawdev',
> > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > >  	# ipsec lib depends on net, crypto and security
> > >  	'ipsec',
> > >  	# add pkt framework libs which use other libs from above
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-02 18:42       ` Ananyev, Konstantin
@ 2019-10-03 19:49         ` Honnappa Nagarahalli
  2019-10-07  9:01           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03 19:49 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd

> > Subject: [PATCH v3 1/3] lib/ring: add peek API
> >
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > The peek API allows fetching the next available object in the ring
> > without dequeuing it. This helps in scenarios where dequeuing of
> > objects depends on their value.
> >
> > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 2a9f768a1..d3d0d5e18 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> **obj_table,
> >  				r->cons.single, available);
> >  }
> >
> > +/**
> > + * Peek one object from a ring.
> > + *
> > + * The peek API allows fetching the next available object in the ring
> > + * without dequeuing it. This API is not multi-thread safe with
> > +respect
> > + * to other consumer threads.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, object available
> > + *   - -ENOENT: Not enough entries in the ring.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline int
> > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> 
> As it is not MT safe, I think we need _sc_ in the name, to follow the other
> rte_ring functions' naming conventions
> (rte_ring_sc_peek() or so).
Agree

> 
> As a better alternative, what do you think about introducing serialized
> versions of the DPDK rte_ring dequeue functions?
> Something like this:
> 
> /* same as original ring dequeue, but:
>   * 1) move cons.head only if cons.head == cons.tail
>   * 2) don't update cons.tail
>   */
> unsigned int
> rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> int n,
>                 unsigned int *available);
> 
> /* sets both cons.head and cons.tail to cons.head + num */
> void
> rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> 
> /* resets cons.head to the cons.tail value */
> void
> rte_ring_serial_dequeue_abort(struct rte_ring *r);
> 
> Then your dq_reclaim cycle function will look like that:
> 
> const uint32_t nb_elt = dq->elt_size/8 + 1;
> uint32_t avl, n;
> uintptr_t elt[nb_elt];
> ...
> 
> do {
> 
>   /* read next elem from the queue */
>   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
>   if (n == 0)
>       break;
> 
>   /* wrong period, keep elem in the queue */
>   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
>      rte_ring_serial_dequeue_abort(dq->r);
>      break;
>   }
> 
>   /* can reclaim, remove elem from the queue */
>   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> 
>   /* call reclaim function */
>   dq->f(dq->p, elt);
> 
> } while (avl >= nb_elt);
> 
> That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> As long as actual reclamation callback itself is MT safe of course.

I think it is a great idea. The other writers would still be polling, waiting for the current writer to update the tail or the head. This makes it a blocking solution.

We can make the other threads not poll, i.e., they will quit reclaiming if they see that other writers are dequeuing from the queue. The other way is to use per-thread queues, as sketched below.
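
A sketch of the per-thread queue approach, assuming one defer queue per writer lcore (queue creation is omitted; rte_rcu_qsbr_dq_enqueue() is the API proposed in this series):

#include <rte_lcore.h>
#include <rte_per_lcore.h>
#include <rte_rcu_qsbr.h>

/* One defer queue per writer lcore: each thread enqueues to and
 * reclaims from its own queue only, so no writer-side lock is needed.
 */
static RTE_DEFINE_PER_LCORE(struct rte_rcu_qsbr_dq *, lcore_dq);

static inline int
writer_retire_resource(void *e)
{
	return rte_rcu_qsbr_dq_enqueue(RTE_PER_LCORE(lcore_dq), e);
}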

The other requirement I see is to support unbounded-size data structures, wherein the data structures do not have a pre-determined number of entries. Also, currently the defer queue size is equal to the total number of entries in a given data structure. There are plans to support a dynamically resizable defer queue. This means memory allocation, which will affect the lock-free-ness of the solution.

So, IMO:
1) The API should provide the capability to support different algorithms - maybe through some flags?
2) The requirements for the ring are pretty unique to the problem we have here (for ex: move cons.head only if cons.tail is the same, skip polling). So, we should probably implement a ring within the RCU library?

From the timeline perspective, adding all these capabilities would be difficult to get done within the 19.11 timeline. What I have here satisfies my current needs. I suggest that we make provisions in the APIs now to support all these features, but do the implementation in the coming releases. Does this sound OK to you?
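
To make the serialized-dequeue semantics discussed above concrete, here is a toy sketch using C11 atomics (an editorial illustration only, not rte_ring code; all names are hypothetical):

#include <stdatomic.h>
#include <stdint.h>

struct toy_ring {
	_Atomic uint32_t prod_tail;
	_Atomic uint32_t cons_head;
	_Atomic uint32_t cons_tail;
	uint32_t mask;		/* ring size (power of two) minus one */
	void *objs[];		/* storage for mask + 1 object pointers */
};

/* move cons.head only if cons.head == cons.tail; cons.tail is untouched */
static unsigned int
serial_dequeue_bulk(struct toy_ring *r, void **out, unsigned int n)
{
	uint32_t head = atomic_load(&r->cons_head);

	if (head != atomic_load(&r->cons_tail))
		return 0;	/* another reclaimer is in progress */
	if (atomic_load(&r->prod_tail) - head < n)
		return 0;	/* not enough entries */
	if (!atomic_compare_exchange_strong(&r->cons_head, &head, head + n))
		return 0;	/* lost the race; caller may retry */
	for (unsigned int i = 0; i != n; i++)
		out[i] = r->objs[(head + i) & r->mask];
	return n;
}

/* publish: advance cons.tail by the entries taken, freeing the slots */
static void
serial_dequeue_finish(struct toy_ring *r, unsigned int n)
{
	atomic_fetch_add(&r->cons_tail, n);
}

/* abort: reset cons.head back to cons.tail, keeping the entries queued */
static void
serial_dequeue_abort(struct toy_ring *r)
{
	atomic_store(&r->cons_head, atomic_load(&r->cons_tail));
}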

> 
> > +{
> > +	uint32_t prod_tail = r->prod.tail;
> > +	uint32_t cons_head = r->cons.head;
> > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > +	unsigned int n = 1;
> > +	if (count) {
> > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > +		return 0;
> > +	}
> > +	return -ENOENT;
> > +}
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-03 12:26           ` Ananyev, Konstantin
@ 2019-10-04  6:07             ` Honnappa Nagarahalli
  2019-10-07 10:46               ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-04  6:07 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd

> 
> Hi Honnappa,
> 
> > > > Add resource reclamation APIs to make it simple for applications
> > > > and libraries to integrate rte_rcu library.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> > > >  lib/librte_rcu/meson.build         |   2 +
> > > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > > >  lib/meson.build                    |   6 +-
> > > >  7 files changed, 700 insertions(+), 3 deletions(-)
> > > >  create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > >
> > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > index ce7f93dd3..76814f50b 100644
> > > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > @@ -21,6 +21,7 @@
> > > >  #include <rte_errno.h>
> > > >
> > > >  #include "rte_rcu_qsbr.h"
> > > > +#include "rte_rcu_qsbr_pvt.h"
> > > >
> > > >  /* Get the memory size of QSBR variable */
> > > >  size_t
> > > > @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > > >  	return 0;
> > > >  }
> > > >
> > > > +/* Create a queue used to store the data structure elements that
> > > > +can
> > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq *
> > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> > > > +{
> > > > +	struct rte_rcu_qsbr_dq *dq;
> > > > +	uint32_t qs_fifo_size;
> > > > +
> > > > +	if (params == NULL || params->f == NULL ||
> > > > +		params->v == NULL || params->name == NULL ||
> > > > +		params->size == 0 || params->esize == 0 ||
> > > > +		(params->esize % 8 != 0)) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	dq = rte_zmalloc(NULL,
> > > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > > +		RTE_CACHE_LINE_SIZE);
> > > > +	if (dq == NULL) {
> > > > +		rte_errno = ENOMEM;
> > > > +
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > > +	 * max_size.
> > > > +	 */
> > > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > > +					* params->size) + 1);
> > > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > > +					SOCKET_ID_ANY, 0);
> > >
> > > If it is not going to be MT safe, then why not create the ring
> > > with the (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> > Agree.
> >
> > > Though I think it could be changed to allow MT safe multiple
> > > enqueue/single dequeue, see below.
> > The MT safe issue is due to reclaim code. The reclaim code has the following
> sequence:
> >
> > rte_ring_peek
> > rte_rcu_qsbr_check
> > rte_ring_dequeue
> >
> > This entire sequence needs to be atomic as the entry cannot be dequeued
> without knowing that the grace period for that entry is over.
> 
> I understand that, though I believe it should at least be possible to support
> a mode with multiple enqueuers and a single dequeuer/reclaimer.
> With serialized dequeue(), even multiple dequeuers should be possible.
Agreed. Please see the response on the other thread.

> 
> > Note that due to optimizations in rte_rcu_qsbr_check API, this
> > sequence should not be large in most cases. I do not have ideas on how to
> make this sequence lock-free.
> >
> > If the writer is on the control plane, most use cases will use mutex
> > locks for synchronization if they are multi-threaded. That lock should be
> enough to provide the thread safety for these APIs.
> 
> In that case, why do we need a ring at all?
> People can quite easily create their own queue with a mutex and a TAILQ.
> If performance is not an issue, they can even add a pthread_cond to it, and have
> the ability for the consumer to sleep/wake up on an empty/full queue.
> 
> >
> > If the writer is multi-threaded and lock-free, then one should use per thread
> defer queue.
> 
> If that's the only working model, then the question is why we need that API
> at all.
> Just a simple array with a counter, or a linked list, should do for the majority of cases.
Please see the other thread.

> 
> >
> > >
> > > > +	if (dq->r == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): defer queue create failed\n", __func__);
> > > > +		rte_free(dq);
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	dq->v = params->v;
> > > > +	dq->size = params->size;
> > > > +	dq->esize = params->esize;
> > > > +	dq->f = params->f;
> > > > +	dq->p = params->p;
> > > > +
> > > > +	return dq;
> > > > +}
> > > > +
> > > > +/* Enqueue one resource to the defer queue to free after the
> > > > +grace
> > > > + * period is over.
> > > > + */
> > > > +int
> > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> > > > +{
> > > > +	uint64_t token;
> > > > +	uint64_t *tmp;
> > > > +	uint32_t i;
> > > > +	uint32_t cur_size, free_size;
> > > > +
> > > > +	if (dq == NULL || e == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return 1;
> > >
> > > Why not just return -EINVAL straight away?
> > > I think there is not much point in setting rte_errno in that function at
> > > all; just a return value should do.
> > I am trying to keep these consistent with the existing APIs. They return 0 or 1
> and set the rte_errno.
> 
> A lot of public DPDK API functions do use the return value to return a status code (0
> or some positive number on success, negative errno values on failure); I am
> not inventing anything new here.
Agree, you are not proposing a new thing here. Maybe I was not clear. I really do not have an opinion on how this should be done, but I do have an opinion on consistency. These new APIs follow what has been done in the existing RCU APIs. I think we have 2 options here:
1) either we change the existing RCU APIs to get rid of rte_errno (is it an ABI change?), or
2) the new APIs follow what has been done in the existing RCU APIs.
I want to make sure we are consistent at least within the RCU APIs.

> 
> >
> > >
> > > > +	}
> > > > +
> > > > +	/* Start the grace period */
> > > > +	token = rte_rcu_qsbr_start(dq->v);
> > > > +
> > > > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > > > +	 * keep the queue from growing too large and allows time for reader
> > > > +	 * threads to report their quiescent state.
> > > > +	 */
> > > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > >
> > > Probably would be a bit easier if you just store in dq->esize (elt
> > > size + token
> > > size) / 8.
> > Agree
> >
> > >
> > > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > >
> > > Why make this threshold value hard-coded?
> > > Why not either put it into a create() parameter, or just return a
> > > special return value to indicate that the threshold is reached?
> > My thinking was to keep the programming interface easy to use. The
> > more the parameters, the more painful it is for the user. IMO, the
> > constants chosen should be good enough for most cases. More advanced
> users could modify the constants. However, we could make these as part of the
> parameters, but make them optional for the user. For ex: if they set them to 0,
> default values can be used.
> >
> > > Or even return the number of filled/free entries on success, so the caller
> > > can decide to reclaim or not based on that information on his own?
> > This means more code on the user side.
> 
> I personally think it really wouldn't be that big a problem for the user to pass
> an extra parameter to the function.
I will convert the 2 constants into optional parameters (the user can set them to 0 to make the algorithm use the default values).

> Again, what if the user doesn't want to reclaim() in the enqueue() thread at all?
'enqueue' has to do reclamation if the defer queue is full. I do not think this is trivial.

In the current design, reclamation in enqueue is also done on a regular basis (automatic triggering of reclamation when the queue reaches a certain limit) to keep the queue from growing too large. This is required when we implement a dynamically adjusting defer queue. The current algorithm spreads the cost of reclamation across multiple calls and puts an upper bound on the cycles spent in the delete API by reclaiming a fixed number of entries.

This algorithm is proven to work in the LPM integration performance tests at a very low performance overhead (~1%). So, I do not know why a user would not want to use this. The 2 additional parameters should give the user more flexibility.

However, if the user wants his own algorithm, he can create one with the base APIs provided.
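
For instance, the two thresholds could become optional create-time parameters along these lines (hypothetical field names only; 0 selects the built-in default):

/* sketch of additional fields for struct rte_rcu_qsbr_dq_parameters */
struct dq_params_sketch {
	uint32_t trigger_reclaim_limit;
	/**< Queue depth at which enqueue() triggers automatic
	 *   reclamation; 0 means the current default of size/8.
	 */
	uint32_t max_reclaim_size;
	/**< Maximum number of entries reclaimed per reclaim() call;
	 *   0 means the current default of size/16.
	 */
};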

> 
> > I think adding these to parameters seems like a better option.
> >
> > >
> > > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > +			"%s(): Triggering reclamation\n", __func__);
> > > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > > +	}
> > > > +
> > > > +	/* Check if there is space for at least 1 resource */
> > > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > > +	if (!free_size) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Defer queue is full\n", __func__);
> > > > +		rte_errno = ENOSPC;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Enqueue the resource */
> > > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > > +
> > > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > > +	 * due to the limitation of the rte_ring implementation.
> > > > +	 */
> > > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > >
> > >
> > > That whole construction above looks a bit clumsy and error prone...
> > > I suppose just:
> > >
> > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > uint32_t free, n;
> > > ...
> > > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
> > > if (n == 0)
> > Yes, bulk enqueue can be used. But note that once the flexible element size
> ring patch is done, this code will use that.
> 
> Well, when it is in the mainline, this code can certainly be updated to use the
> new API (if it provides some improvements).
> But as I understand, right now it is not there, while bulk enqueue/dequeue are.
Apologies, I was not clear. I agree we can go with bulk APIs for now.

> 
> >
> > >   return -ENOSPC;
> > > return free;
> > >
> > > That way I think you can have MT-safe version of that function.
> > Please see the description of MT safe issue above.
> >
> > >
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/* Reclaim resources from the defer queue. */
> > > > +int
> > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> > > > +{
> > > > +	uint32_t max_cnt;
> > > > +	uint32_t cnt;
> > > > +	void *token;
> > > > +	uint64_t *tmp;
> > > > +	uint32_t i;
> > > > +
> > > > +	if (dq == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return 1;
> > >
> > > Same story as above - I think rte_errno is excessive in this function.
> > > Just return value should be enough.
> > >
> > >
> > > > +	}
> > > > +
> > > > +	/* Anything to reclaim? */
> > > > +	if (rte_ring_count(dq->r) == 0)
> > > > +		return 0;
> > >
> > > Not sure you need that, see below.
> > >
> > > > +
> > > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > >
> > > Again why not to make max_cnt a configurable at create() parameter?
> > I think making this an optional parameter for creating the defer queue is a
> > better option.
> >
> > > Or even a parameter for that function?
> > >
> > > > +	cnt = 0;
> > > > +
> > > > +	/* Check reader threads quiescent state and reclaim resources */
> > > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > > +			== 1)) {
> > >
> > >
> > > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > > +		 * due to the limitation of the rte_ring implementation.
> > > > +		 */
> > > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > > +			i++, tmp++)
> > > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > > +					(void *)(uintptr_t)tmp);
> > >
> > > Again, no need for such constructs with multiple dequeuer I believe.
> > > Just:
> > >
> > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > uint32_t n;
> > > uintptr_t elt[nb_elt];
> > > ...
> > > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
> > > if (n != 0)
> > > 	dq->f(dq->p, elt);
> > Agree on bulk API use.
> >
> > >
> > > Seems enough.
> > > Again in that case you can have enqueue/reclaim running in different
> > > threads simultaneously, plus you don't need dq->e at all.
> > Will check on dq->e
> >
> > >
> > > > +		dq->f(dq->p, dq->e);
> > > > +
> > > > +		cnt++;
> > > > +	}
> > > > +
> > > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > > +
> > > > +	if (cnt == 0) {
> > > > +		/* No resources were reclaimed */
> > > > +		rte_errno = EAGAIN;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	return 0;
> > >
> > > I'd suggest to return cnt on success.
> > I am trying to keep the APIs simple. I do not see much use for 'cnt'
> > as return value to the user. It exposes more details which I think are internal
> to the library.
> 
> Not sure what the hassle is in returning the number of completed reclamations.
> If the user doesn't need that information, he simply wouldn't use it.
> But it might be useful - he can decide whether to try another attempt
> at reclaim() immediately or whether it is OK to do something else.
There is no hassle to return that information.

As per the current design, the user calls 'reclaim' when he is out of resources while adding an entry to the data structure. At that point the user wants to know if at least 1 resource was reclaimed, because the user has to allocate 1 resource. He does not have a use for the number of resources reclaimed.

If this API returns 0, then the user can decide to repeat the call or return failure. But that decision depends on the length of the grace period, which is under the user's control.

> 
> >
> > >
> > > > +}
> > > > +
> > > > +/* Delete a defer queue. */
> > > > +int
> > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> > > > +{
> > > > +	if (dq == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Reclaim all the resources */
> > > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > > +		/* Error number is already set by the reclaim API */
> > > > +		return 1;
> > >
> > > How do you know that you have reclaimed everything?
> > Good point, will come back with a different solution.
> >
> > >
> > > > +
> > > > +	rte_ring_free(dq->r);
> > > > +	rte_free(dq);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  int rte_rcu_log_type;
> > > >
> > > >  RTE_INIT(rte_rcu_register)
> > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > index c80f15c00..185d4b50a 100644
> > > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > @@ -34,6 +34,7 @@ extern "C" {
> > > >  #include <rte_lcore.h>
> > > >  #include <rte_debug.h>
> > > >  #include <rte_atomic.h>
> > > > +#include <rte_ring.h>
> > > >
> > > >  extern int rte_rcu_log_type;
> > > >
> > > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > > >  	 */
> > > >  } __rte_cache_aligned;
> > > >
> > > > +/**
> > > > + * Call back function called to free the resources.
> > > > + *
> > > > + * @param p
> > > > + *   Pointer provided while creating the defer queue
> > > > + * @param e
> > > > + *   Pointer to the resource data stored on the defer queue
> > > > + *
> > > > + * @return
> > > > + *   None
> > > > + */
> > > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > >
> > > Stylish thing - usually in DPDK we have typedf newtype_t ...
> > > Though I am not sure you need a new typedef at all - just a function
> > > pointer inside the struct seems enough.
> > Other libraries (for ex: rte_hash) use this approach. I think it is better to keep
> it out of the structure to allow for better commenting.
> 
> I am saying the majority of DPDK code uses the _t suffix for typedefs:
> typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);
Apologies, got it, will change.

> 
> >
> > >
> > > > +
> > > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > > +
> > > > +/**
> > > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > > + */
> > > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > > +
> > > > +/**
> > > > + *  Reclaim at the max 1/16th the total number of resources.
> > > > + */
> > > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > >
> > >
> > > As I said above, I don't think these thresholds need to be hardcoded.
> > > In any case, there seems not much point to put them in the public header
> file.
> > >
> > > > +
> > > > +/**
> > > > + * Parameters used when creating the defer queue.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq_parameters {
> > > > +	const char *name;
> > > > +	/**< Name of the queue. */
> > > > +	uint32_t size;
> > > > +	/**< Number of entries in queue. Typically, this will be
> > > > +	 *   the same as the maximum number of entries supported in the
> > > > +	 *   lock free data structure.
> > > > +	 *   Data structures with unbounded number of entries is not
> > > > +	 *   supported currently.
> > > > +	 */
> > > > +	uint32_t esize;
> > > > +	/**< Size (in bytes) of each element in the defer queue.
> > > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > > +	 *   support 8B element sizes only.
> > > > +	 */
> > > > +	rte_rcu_qsbr_free_resource f;
> > > > +	/**< Function to call to free the resource. */
> > > > +	void *p;
> > >
> > > Style nit again - I like short names myself, but that seems a bit
> > > extreme... :) Might be at least:
> > > void (*reclaim)(void *, void *);
> > Maybe 'free_fn'?
> >
> > > void * reclaim_data;
> > > ?
> > This is the pointer to the data structure to free the resource into. For ex: In
> LPM data structure, it will be pointer to LPM. 'reclaim_data'
> > does not convey the meaning correctly.
> 
> Ok, please feel free to come up with your own names.
> I just wanted to say that 'f' and 'p' are a bit extreme for a public API.
ok, this is the hardest thing to do 😊

> 
> >
> > >
> > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > +	 *   pointer to the data structure to which the resource to free
> > > > +	 *   belongs. This can be NULL.
> > > > +	 */
> > > > +	struct rte_rcu_qsbr *v;
> > >
> > > Does it need to be inside that struct?
> > > Might be better:
> > > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > > rte_rcu_qsbr_dq_parameters *params);
> > The API takes a parameter structure as input anyway, why add
> > another argument to the function? The QSBR variable is also another parameter.
> >
> > >
> > > Another alternative: make both reclaim() and enqueue() to take v as
> > > a parameter.
> > But both of them need access to some of the parameters provided in
> > rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to the
> functions.
> 
> Purely a style thing.
> From my perspective it just provides better visibility of what is going on in the code:
> for QSBR variable 'v', create a new defer queue.
> But no strong opinion here.
> 
> >
> > >
> > > > +	/**< RCU QSBR variable to use for this defer queue */
> > > > +};
> > > > +
> > > > +/* RTE defer queue structure.
> > > > + * This structure holds the defer queue. The defer queue is used
> > > > +to
> > > > + * hold the deleted entries from the data structure that are not
> > > > + * yet freed.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq;
> > > > +
> > > >  /**
> > > >   * @warning
> > > >   * @b EXPERIMENTAL: this API may change without prior notice
> > > > @@ -648,6 +710,113 @@ __rte_experimental
> > > >  int
> > > >  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> > > >
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Create a queue used to store the data structure elements that
> > > > +can
> > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > + *
> > > > + * @param params
> > > > + *   Parameters to create a defer queue.
> > > > + * @return
> > > > + *   On success - Valid pointer to defer queue
> > > > + *   On error - NULL
> > > > + *   Possible rte_errno codes are:
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - ENOMEM - Not enough memory
> > > > + */
> > > > +__rte_experimental
> > > > +struct rte_rcu_qsbr_dq *
> > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > > +*params);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Enqueue one resource to the defer queue and start the grace period.
> > > > + * The resource will be freed later after at least one grace
> > > > +period
> > > > + * is over.
> > > > + *
> > > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > > + * It will also reclaim resources at regular intervals to keep
> > > > + * the defer queue from growing too big.
> > > > + *
> > > > + * This API is not multi-thread safe. It is expected that the
> > > > +caller
> > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > + *
> > > > + * A lock free multi-thread writer algorithm could achieve
> > > > +multi-thread
> > > > + * safety by creating and using one defer queue per thread.
> > > > + *
> > > > + * @param dq
> > > > + *   Defer queue to allocate an entry from.
> > > > + * @param e
> > > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > > + *   the data to copy is equal to the element size provided when the
> > > > + *   defer queue was created.
> > > > + * @return
> > > > + *   On success - 0
> > > > + *   On error - 1 with rte_errno set to
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > > + *		if the defer queue size is equal (or larger) than the
> > > > + *		number of elements in the data structure.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Reclaim resources from the defer queue.
> > > > + *
> > > > + * This API is not multi-thread safe. It is expected that the
> > > > +caller
> > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > + *
> > > > + * A lock free multi-thread writer algorithm could achieve
> > > > +multi-thread
> > > > + * safety by creating and using one defer queue per thread.
> > > > + *
> > > > + * @param dq
> > > > + *   Defer queue to reclaim an entry from.
> > > > + * @return
> > > > + *   On successful reclamation of at least 1 resource - 0
> > > > + *   On error - 1 with rte_errno set to
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > > period,
> > > > + *		try again.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Delete a defer queue.
> > > > + *
> > > > + * It tries to reclaim all the resources on the defer queue.
> > > > + * If any of the resources have not completed the grace period
> > > > + * the reclamation stops and returns immediately. The rest of
> > > > + * the resources are not reclaimed and the defer queue is not
> > > > + * freed.
> > > > + *
> > > > + * @param dq
> > > > + *   Defer queue to delete.
> > > > + * @return
> > > > + *   On success - 0
> > > > + *   On error - 1
> > > > + *   Possible rte_errno codes are:
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - EAGAIN - Some of the resources have not completed at least 1
> grace
> > > > + *		period, try again.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > new file mode 100644
> > > > index 000000000..2122bc36a
> > > > --- /dev/null
> > > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > >
> > > Again a style suggestion: as it is not a public header - don't use the rte_
> > > prefix for naming.
> > > From my perspective - easier for the reader to realize what is a public
> > > header and what is not.
> > Looks like the guidelines are not defined very well. I see one private
> > file with rte_ prefix. I see Stephen not using rte_ prefix. I do not have any
> preference. But, a consistent approach is required.
> 
> That's just a suggestion.
> For me (and I hope for others) it would be a bit easier.
> When looking at the code for the first time I had to look at meson.build to check
> whether it is a public header or not.
> If the file doesn't have the 'rte_' prefix, I assume straight away that it is an
> internal one.
> But, as you said, there are no exact guidelines here, so it is up to you to decide.
I think it makes sense to remove the 'rte_' prefix. I will also change the file name to have a '_private' suffix.
There are some inconsistencies in the existing code; I will send a patch to correct them to follow this approach.

> 
> >
> > >
> > > > @@ -0,0 +1,46 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright (c) 2019 Arm Limited
> > > > + */
> > > > +
> > > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > > +#define _RTE_RCU_QSBR_PVT_H_
> > > > +
> > > > +/**
> > > > + * This file is private to the RCU library. It should not be included
> > > > + * by the user of this library.
> > > > + */
> > > > +
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +#include "rte_rcu_qsbr.h"
> > > > +
> > > > +/* RTE defer queue structure.
> > > > + * This structure holds the defer queue. The defer queue is used to
> > > > + * hold the deleted entries from the data structure that are not
> > > > + * yet freed.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq {
> > > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > > +	uint32_t size;
> > > > +	/**< Number of elements in the defer queue */
> > > > +	uint32_t esize;
> > > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > > +	rte_rcu_qsbr_free_resource f;
> > > > +	/**< Function to call to free the resource. */
> > > > +	void *p;
> > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > +	 *   pointer to the data structure to which the resource to free
> > > > +	 *   belongs.
> > > > +	 */
> > > > +	char e[0];
> > > > +	/**< Temporary storage to copy the defer queue element. */
> > >
> > > Do you really need 'e' at all?
> > > Can't it be just temporary stack variable?
> > Ok, will check.
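For reference, a temporary stack variable inside rte_rcu_qsbr_dq_reclaim()
could look like this (an untested sketch; a C99 VLA works here because
dq->esize is known at run time):

	uint32_t i;
	uint64_t *tmp;
	/* per-call copy of one defer queue element, 64b aligned */
	uint64_t e[dq->esize / 8];

	for (i = 0, tmp = e; i < dq->esize / 8; i++, tmp++)
		(void)rte_ring_sc_dequeue(dq->r, (void *)(uintptr_t)tmp);
	dq->f(dq->p, e);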
> >
> > >
> > > > +};
> > > > +
> > > > +#ifdef __cplusplus
> > > > +}
> > > > +#endif
> > > > +
> > > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > > b/lib/librte_rcu/rte_rcu_version.map
> > > > index f8b9ef2ab..dfac88a37 100644
> > > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > > >  	rte_rcu_qsbr_synchronize;
> > > >  	rte_rcu_qsbr_thread_register;
> > > >  	rte_rcu_qsbr_thread_unregister;
> > > > +	rte_rcu_qsbr_dq_create;
> > > > +	rte_rcu_qsbr_dq_enqueue;
> > > > +	rte_rcu_qsbr_dq_reclaim;
> > > > +	rte_rcu_qsbr_dq_delete;
> > > >
> > > >  	local: *;
> > > >  };
> > > > diff --git a/lib/meson.build b/lib/meson.build index
> > > > e5ff83893..0e1be8407 100644
> > > > --- a/lib/meson.build
> > > > +++ b/lib/meson.build
> > > > @@ -11,7 +11,9 @@
> > > >  libraries = [
> > > >  	'kvargs', # eal depends on kvargs
> > > >  	'eal', # everything depends on eal
> > > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > +	'ring',
> > > > +	'rcu', # rcu depends on ring
> > > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > >  	'cmdline',
> > > >  	'metrics', # bitrate/latency stats depends on this
> > > >  	'hash',    # efd depends on this
> > > > @@ -22,7 +24,7 @@ libraries = [
> > > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > > >  	'kni', 'latencystats', 'lpm', 'member',
> > > >  	'power', 'pdump', 'rawdev',
> > > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > > >  	# ipsec lib depends on net, crypto and security
> > > >  	'ipsec',
> > > >  	# add pkt framework libs which use other libs from above
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
@ 2019-10-04 16:05       ` Medvedkin, Vladimir
  2019-10-09  3:48         ` Honnappa Nagarahalli
  2019-10-07  9:21       ` Ananyev, Konstantin
  1 sibling, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-04 16:05 UTC (permalink / raw)
  To: Honnappa Nagarahalli, bruce.richardson, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Dharmik.Thakkar, Ruifeng.Wang, nd

Hi Honnappa,

On 01/10/2019 19:28, Honnappa Nagarahalli wrote:
> From: Ruifeng Wang <ruifeng.wang@arm.com>
>
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
>
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>   lib/librte_lpm/Makefile            |   3 +-
>   lib/librte_lpm/meson.build         |   2 +
>   lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
>   lib/librte_lpm/rte_lpm.h           |  21 ++++++
>   lib/librte_lpm/rte_lpm_version.map |   6 ++
>   5 files changed, 122 insertions(+), 12 deletions(-)
>
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>   # library name
>   LIB = librte_lpm.a
>   
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>   CFLAGS += -O3
>   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
>   
>   EXPORT_MAP := rte_lpm_version.map
>   
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>   # Copyright(c) 2017 Intel Corporation
>   
>   version = 2
> +allow_experimental_apis = true
>   sources = files('rte_lpm.c', 'rte_lpm6.c')
>   headers = files('rte_lpm.h', 'rte_lpm6.h')
>   # since header files have different names, we can install all vector headers
>   # without worrying about which architecture we actually need
>   headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>   deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..ca58d4b35 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #include <string.h>
> @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
>   
>   	rte_mcfg_tailq_write_unlock();
>   
> +	if (lpm->dq)
> +		rte_rcu_qsbr_dq_delete(lpm->dq);
>   	rte_free(lpm->tbl8);
>   	rte_free(lpm->rules_tbl);
>   	rte_free(lpm);
> @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>   MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>   		rte_lpm_free_v1604);
As a general comment, are you going to add rcu support to the legacy _v20?
>   
> +struct __rte_lpm_rcu_dq_entry {
> +	uint32_t tbl8_group_index;
> +	uint32_t pad;
> +};

Is this struct necessary? I mean, in tbl8_free_v1604() you could pass
tbl8_group_index directly as a pointer-sized value, without the "e.pad = 0;".

And what about a 32-bit environment?
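Something like the following (untested sketch) would drop the struct and the
padding, and it keeps working in a 32-bit environment because the index is
copied as one 8-byte element rather than reinterpreted as a pointer:

static void
__lpm_rcu_qsbr_free_resource(void *p, void *data)
{
	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
	/* 'data' points at the 8B element copied off the defer queue */
	uint32_t tbl8_group_index = (uint32_t)*(uint64_t *)data;

	/* Set tbl8 group invalid */
	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
			__ATOMIC_RELAXED);
}

and in tbl8_free_v1604():

	uint64_t e = tbl8_group_start;

	/* Push the group index into the QSBR defer queue */
	rte_rcu_qsbr_dq_enqueue(lpm->dq, &e);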

> +
> +static void
> +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> +{
> +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry *e =
> +			(struct __rte_lpm_rcu_dq_entry *)data;
> +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> +
> +	/* Set tbl8 group invalid */
> +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> +		__ATOMIC_RELAXED);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->dq) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* Init QSBR defer queue. */
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);

Consider moving this logic into rte_rcu_qsbr_dq_create(). I think there
you could prefix the name with just RCU_. That would make it possible to
move the include of <rte_ring.h> from rte_rcu_qsbr.h into rte_rcu_qsbr.c
and get rid of the RTE_RCU_QSBR_DQ_NAMESIZE macro in the rte_rcu_qsbr.h file.
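A sketch of how that could look inside rte_rcu_qsbr_dq_create() (assuming
the ring name length is the only constraint):

	char ring_name[RTE_RING_NAMESIZE];

	/* build the ring name internally from the user supplied queue
	 * name, so callers need neither RTE_RCU_QSBR_DQ_NAMESIZE nor
	 * <rte_ring.h>
	 */
	snprintf(ring_name, sizeof(ring_name), "RCU_%s", params->name);
	dq->r = rte_ring_create(ring_name, qs_fifo_size, SOCKET_ID_ANY, 0);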

> +	params.name = rcu_dq_name;
> +	params.size = lpm->number_tbl8s;
> +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> +	params.f = __lpm_rcu_qsbr_free_resource;
> +	params.p = lpm->tbl8;
> +	params.v = v;
> +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> +	if (lpm->dq == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
>   /*
>    * Adds a rule to the rule table.
>    *
> @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>   }
>   
>   static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +__tbl8_alloc_v1604(struct rte_lpm *lpm)
>   {
>   	uint32_t group_idx; /* tbl8 group index. */
>   	struct rte_lpm_tbl_entry *tbl8_entry;
>   
>   	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>   		/* If a free tbl8 group is found clean it and set as VALID. */
>   		if (!tbl8_entry->valid_group) {
>   			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>   	return -ENOSPC;
>   }
>   
> +static int32_t
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
> +{
> +	int32_t group_idx; /* tbl8 group index. */
> +
> +	group_idx = __tbl8_alloc_v1604(lpm);
> +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> +		/* If there are no tbl8 groups try to reclaim some. */
> +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> +			group_idx = __tbl8_alloc_v1604(lpm);
> +	}
> +
> +	return group_idx;
> +}
> +
>   static void
>   tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>   {
> @@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>   }
>   
>   static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>   {
> -	/* Set tbl8 group invalid*/
>   	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry e;
>   
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->dq != NULL) {
> +		e.tbl8_group_index = tbl8_group_start;
> +		e.pad = 0;
> +		/* Push into QSBR defer queue. */
> +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> +	} else {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>   }
>   
>   static __rte_noinline int32_t
> @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   
>   	if (!lpm->tbl24[tbl24_index].valid) {
>   		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		/* Check tbl8 allocation was successful. */
>   		if (tbl8_group_index < 0) {
> @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   	} /* If valid entry but not extended calculate the index into Table8. */
>   	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>   		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		if (tbl8_group_index < 0) {
>   			return tbl8_group_index;
> @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		 */
>   		lpm->tbl24[tbl24_index].valid = 0;
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	} else if (tbl8_recycle_index > -1) {
>   		/* Update tbl24 entry. */
>   		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>   				__ATOMIC_RELAXED);
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	}
>   #undef group_idx
>   	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..49c12a68d 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>   #include <rte_common.h>
>   #include <rte_vect.h>
>   #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -186,6 +188,7 @@ struct rte_lpm {
>   			__rte_cache_aligned; /**< LPM tbl24 table. */
>   	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>   	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
>   };
>   
>   /**
> @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>   void
>   rte_lpm_free_v1604(struct rte_lpm *lpm);
>   
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param v
> + *   RCU QSBR variable
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>   /**
>    * Add a rule to the LPM table.
>    *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>   	rte_lpm6_lookup_bulk_func;
>   
>   } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
  2019-10-02 17:39       ` Ananyev, Konstantin
  2019-10-02 18:50       ` Ananyev, Konstantin
@ 2019-10-04 19:01       ` Medvedkin, Vladimir
  2019-10-07 13:11       ` Medvedkin, Vladimir
  3 siblings, 0 replies; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-04 19:01 UTC (permalink / raw)
  To: Honnappa Nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, ruifeng.wang, dharmik.thakkar, dev, nd

Hi Honnappa,

On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> Add resource reclamation APIs to make it simple for applications
> and libraries to integrate rte_rcu library.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>   lib/librte_rcu/meson.build         |   2 +
>   lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>   lib/librte_rcu/rte_rcu_version.map |   4 +
>   lib/meson.build                    |   6 +-
>   7 files changed, 700 insertions(+), 3 deletions(-)
>   create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
There are compilation errors when building DPDK as a shared library.

I think you need something like:

--- a/lib/librte_rcu/Makefile
+++ b/lib/librte_rcu/Makefile
@@ -8,7 +8,7 @@ LIB = librte_rcu.a

  CFLAGS += -DALLOW_EXPERIMENTAL_API
  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
-LDLIBS += -lrte_eal
+LDLIBS += -lrte_eal -lrte_ring
>
> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> index d1b9e46a2..3a6815243 100644
> --- a/app/test/test_rcu_qsbr.c
> +++ b/app/test/test_rcu_qsbr.c
I think it's better to split the unit test patches from the library patches
> @@ -1,8 +1,9 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright (c) 2018 Arm Limited
> + * Copyright (c) 2019 Arm Limited
>    */
>   
>   #include <stdio.h>
> +#include <string.h>
>   #include <rte_pause.h>
>   #include <rte_rcu_qsbr.h>
>   #include <rte_hash.h>
> @@ -33,6 +34,7 @@ static uint32_t *keys;
>   #define COUNTER_VALUE 4096
>   static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
>   static uint8_t writer_done;
> +static uint8_t cb_failed;
>   
>   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>   struct rte_hash *h[RTE_MAX_LCORE];
> @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
>   	return 0;
>   }
>   
> +static void
> +rte_rcu_qsbr_test_free_resource(void *p, void *e)
This function is not a part of the DPDK API, so it's better to name it something like 
test_rcu_qsbr_free_resource().
> +{
> +	if (p != NULL && e != NULL) {
> +		printf("%s: Test failed\n", __func__);
> +		cb_failed = 1;
> +	}
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
> + * elements that can be freed later. This queue is referred to as 'defer queue'.
> + */
> +static int
> +test_rcu_qsbr_dq_create(void)
> +{
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> +
> +	/* Pass invalid parameters */
> +	dq = rte_rcu_qsbr_dq_create(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.size = 1;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.esize = 3;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	/* Pass all valid parameters */
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	rte_rcu_qsbr_dq_delete(dq);
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_enqueue(void)
> +{
> +	int ret;
> +	uint64_t r;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_reclaim(void)
> +{
> +	int ret;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_delete(void)
> +{
> +	int ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_delete(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
> +{
> +	int i, j, ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint64_t *e;
> +	uint64_t sc = 200;
> +	int max_entries;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> +	printf("Size = %d, esize = %d\n", size, esize);
> +
> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> +	if (e == NULL)
> +		return 0;
> +	cb_failed = 0;
> +
> +	/* Initialize the RCU variable. No threads are registered */
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	params.v = t[0];
> +	params.size = size;
> +	params.esize = esize;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Given the size and esize, calculate the maximum number of entries
> +	 * that can be stored on the defer queue (look at the logic used
> +	 * in capacity calculation of rte_ring).
> +	 */
> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> +	max_entries = (max_entries - 1)/(esize/8 + 1);
> +
> +	/* Enqueue few counters starting with the value 'sc' */
> +	/* The queue size will be rounded up to 2. The enqueue API also
> +	 * reclaims if the queue size is above certain limit. Since, there
> +	 * are no threads registered, reclamation succeeds. Hence, it should
> +	 * be possible to enqueue more than the provided queue size.
> +	 */
> +	for (i = 0; i < 10; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> +	 * succeed. It should not be possible to enqueue more than the size
> +	 * number of resources.
> +	 */
> +	rte_rcu_qsbr_thread_register(t[0], 1);
> +	rte_rcu_qsbr_thread_online(t[0], 1);
> +
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Enqueue fails as queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Delete should fail as there are elements in defer queue which
> +	 * cannot be reclaimed.
> +	 */
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid params");
> +
> +	/* Report quiescent state, enqueue should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Report quiescent state, delete should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	/* Validate that call back function did not return any error */
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> +
> +	rte_free(e);
> +	return 0;
> +}
> +
>   /*
>    * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
>    */
> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_thread_offline() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_create() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_reclaim() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_delete() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_enqueue() < 0)
> +		goto test_fail;
> +
>   	printf("\nFunctional tests\n");
>   
>   	if (test_rcu_qsbr_sw_sv_3qs() < 0)
> @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_mw_mv_mqs() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> +		goto test_fail;
> +
>   	free_rcu();
>   
>   	printf("\n");
> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> index 62920ba02..e280b29c1 100644
> --- a/lib/librte_rcu/meson.build
> +++ b/lib/librte_rcu/meson.build
> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
>   if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
>   	ext_deps += cc.find_library('atomic')
>   endif
> +
> +deps += ['ring']
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> index ce7f93dd3..76814f50b 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> @@ -21,6 +21,7 @@
>   #include <rte_errno.h>
>   
>   #include "rte_rcu_qsbr.h"
> +#include "rte_rcu_qsbr_pvt.h"
>   
>   /* Get the memory size of QSBR variable */
>   size_t
> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
>   	return 0;
>   }
>   
> +/* Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + */
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> +{
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint32_t qs_fifo_size;
> +
> +	if (params == NULL || params->f == NULL ||
> +		params->v == NULL || params->name == NULL ||
> +		params->size == 0 || params->esize == 0 ||
> +		(params->esize % 8 != 0)) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return NULL;
> +	}
> +
> +	dq = rte_zmalloc(NULL,
> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> +		RTE_CACHE_LINE_SIZE);
> +	if (dq == NULL) {
> +		rte_errno = ENOMEM;
> +
> +		return NULL;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * max_size.
> +	 */
> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> +					* params->size) + 1);
> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (dq->r == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): defer queue create failed\n", __func__);
> +		rte_free(dq);
> +		return NULL;
> +	}
> +
> +	dq->v = params->v;
> +	dq->size = params->size;
> +	dq->esize = params->esize;
> +	dq->f = params->f;
> +	dq->p = params->p;
> +
> +	return dq;
> +}
> +
> +/* Enqueue one resource to the defer queue to free after the grace
> + * period is over.
> + */
> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> +{
> +	uint64_t token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +	uint32_t cur_size, free_size;
> +
> +	if (dq == NULL || e == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Start the grace period */
> +	token = rte_rcu_qsbr_start(dq->v);
> +
> +	/* Reclaim resources if the queue is 1/8th full. This helps
> +	 * the queue from growing too large and allows time for reader
> +	 * threads to report their quiescent state.
> +	 */
> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +			"%s(): Triggering reclamation\n", __func__);
> +		rte_rcu_qsbr_dq_reclaim(dq);
> +	}
> +
> +	/* Check if there is space for at least 1 resource */
> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> +	if (!free_size) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Defer queue is full\n", __func__);
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	/* Enqueue the resource */
> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> +
> +	/* The resource to enqueue needs to be a multiple of 64b
> +	 * due to the limitation of the rte_ring implementation.
> +	 */
> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> +
> +	return 0;
> +}
> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;
> +
> +	/* Reclaim at the max 1/16th the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> +			== 1)) {
> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);
> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;
There is a potential problem here: rte_rcu_qsbr_dq_reclaim() reclaims
only max_cnt entries, which is at most 1/16 of the possible enqueued
entries, so the rest won't be reclaimed.
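One possible fix (untested sketch): loop inside rte_rcu_qsbr_dq_delete()
until the ring is drained, so the per-call reclaim cap no longer limits
deletion:

	/* keep reclaiming until the defer queue is empty; a failing
	 * reclaim means at least one grace period is still in progress,
	 * and rte_errno (EAGAIN) is already set by the reclaim API
	 */
	while (rte_ring_count(dq->r) != 0) {
		if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
			return 1;
	}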
> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>   int rte_rcu_log_type;
>   
>   RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>   #include <rte_lcore.h>
>   #include <rte_debug.h>
>   #include <rte_atomic.h>
> +#include <rte_ring.h>
I think it's better to move this include into rte_rcu_qsbr.c
>   
>   extern int rte_rcu_log_type;
>   
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>   	 */
>   } __rte_cache_aligned;
>   
> +/**
> + * Call back function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
I don't see the usage of this macro anywhere in the rcu library (I see 
you are using it in LPM).

char rcu_dq_name[RTE_RING_NAMESIZE];
is used instead in the tests.
+ See my comments for [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
> +
> +/**
> + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at the max 1/16th the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
Those two defines could be moved into the .c file.
> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with unbounded number of entries is not
> +	 *   supported currently.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;
> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>   /**
>    * @warning
>    * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>   int
>   rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to avoid
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than) the
> + *		number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
Why is this struct definition separated into a private .h? Maybe just 
define it in rte_rcu_qsbr.c instead?
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>   	rte_rcu_qsbr_synchronize;
>   	rte_rcu_qsbr_thread_register;
>   	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
>   
>   	local: *;
>   };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>   libraries = [
>   	'kvargs', # eal depends on kvargs
>   	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>   	'cmdline',
>   	'metrics', # bitrate/latency stats depends on this
>   	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>   	'gro', 'gso', 'ip_frag', 'jobstats',
>   	'kni', 'latencystats', 'lpm', 'member',
>   	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>   	# ipsec lib depends on net, crypto and security
>   	'ipsec',
>   	# add pkt framework libs which use other libs from above

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-03 19:49         ` Honnappa Nagarahalli
@ 2019-10-07  9:01           ` Ananyev, Konstantin
  2019-10-09  4:25             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07  9:01 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd


> 
> > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > >
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > >
> > > The peek API allows fetching the next available object in the ring
> > > without dequeuing it. This helps in scenarios where dequeuing of
> > > objects depend on their value.
> > >
> > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > >  1 file changed, 30 insertions(+)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 2a9f768a1..d3d0d5e18 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> > **obj_table,
> > >  				r->cons.single, available);
> > >  }
> > >
> > > +/**
> > > + * Peek one object from a ring.
> > > + *
> > > + * The peek API allows fetching the next available object in the ring
> > > + * without dequeuing it. This API is not multi-thread safe with
> > > +respect
> > > + * to other consumer threads.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_p
> > > + *   A pointer to a void * pointer (object) that will be filled.
> > > + * @return
> > > + *   - 0: Success, object available
> > > + *   - -ENOENT: Not enough entries in the ring.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline int
> > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> >
> > As it is not MT safe, I think we need _sc_ in the name, to follow the other
> > rte_ring functions' naming conventions
> > (rte_ring_sc_peek() or so).
> Agree
> 
> >
> > As a better alternative, what do you think about introducing serialized
> > versions of the DPDK rte_ring dequeue functions?
> > Something like that:
> >
> > /* same as original ring dequeue, but:
> >   * 1) move cons.head only if cons.head == const.tail
> >   * 2) don't update cons.tail
> >   */
> > unsigned int
> > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> > int n,
> >                 unsigned int *available);
> >
> > /* sets both cons.head and cons.tail to cons.head + num */ void
> > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> >
> > /* resets cons.head to const.tail value */ void
> > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> >
> > Then your dq_reclaim cycle function will look like that:
> >
> > const uint32_t nb_elt = dq->esize/8 + 1;
> > uint32_t avl, n;
> > uintptr_t elt[nb_elt];
> > ...
> >
> > do {
> >
> >   /* read next elem from the queue */
> >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> >   if (n == 0)
> >       break;
> >
> >   /* wrong period, keep elem in the queue */
> >   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> >      rte_ring_serial_dequeue_abort(dq->r);
> >      break;
> >   }
> >
> >   /* can reclaim, remove elem from the queue */
> >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> >
> >   /* call reclaim function */
> >   dq->f(dq->p, elt);
> >
> > } while (avl >= nb_elt);
> >
> > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > As long as actual reclamation callback itself is MT safe of course.
> 
> I think it is a great idea. The other writers would still be polling for the current writer to update the tail or update the head. This makes it a
> blocking solution.

Yep, it is a blocking one.
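To make the blocking part concrete: with the semantics above (cons.head may
move only when cons.head == cons.tail), a second writer entering the serial
dequeue has to spin until the first one calls _finish()/_abort(), roughly
(sketch only):

	/* wait until no other writer holds an unfinished dequeue */
	while (__atomic_load_n(&r->cons.tail, __ATOMIC_ACQUIRE) !=
			__atomic_load_n(&r->cons.head, __ATOMIC_RELAXED))
		rte_pause();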

> We can make the other threads not poll, i.e. they will quit reclaiming if they see that other writers are dequeuing from the queue.

Actually, I didn't think about that possibility, but yes, it should be possible to have _try_ semantics too.

> The other way is to use per-thread queues.
> 
> The other requirement I see is to support unbounded-size data structures, wherein the data structure does not have a pre-determined
> number of entries. Also, currently the defer queue size is equal to the total number of entries in a given data structure. There are plans to
> support a dynamically resizable defer queue. This means memory allocation, which will affect the lock-free-ness of the solution.
> 
> So, IMO:
> 1) The API should provide the capability to support different algorithms - maybe through some flags?
> 2) The requirements for the ring are pretty unique to the problem we have here (for ex: move the cons-head only if the cons-tail is the
> same, skip polling). So, we should probably implement a ring within the RCU library?

Personally, I think such a serialization ring API would be useful for other cases too.
There are a few cases where the user needs to read the contents of the queue without removing elements from it.
For example, we do use a similar approach inside TLDK to implement the TCP transmit queue.
If such an API existed in DPDK, we could just use it straight away, without maintaining a separate one.

> 
> From the timeline perspective, adding all these capabilities would be difficult to get done within the 19.11 timeline. What I have here satisfies
> my current needs. I suggest that we make provisions in the APIs now to support all these features, but do the implementation in the coming
> releases. Does this sound ok to you?

Not sure I understand your suggestion here...
Could you explain it a bit more - how would the new API look, and what would be left for the future?

> 
> >
> > > +{
> > > +	uint32_t prod_tail = r->prod.tail;
> > > +	uint32_t cons_head = r->cons.head;
> > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > +	unsigned int n = 1;
> > > +	if (count) {
> > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > +		return 0;
> > > +	}
> > > +	return -ENOENT;
> > > +}
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
  2019-10-04 16:05       ` Medvedkin, Vladimir
@ 2019-10-07  9:21       ` Ananyev, Konstantin
  2019-10-13  4:36         ` Honnappa Nagarahalli
  1 sibling, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07  9:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin.Hu, Dharmik.Thakkar, Ruifeng.Wang,
	nd, Ruifeng Wang

Hi guys,

> 
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
> 
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_lpm/Makefile            |   3 +-
>  lib/librte_lpm/meson.build         |   2 +
>  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
>  lib/librte_lpm/rte_lpm.h           |  21 ++++++
>  lib/librte_lpm/rte_lpm_version.map |   6 ++
>  5 files changed, 122 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_lpm.a
> 
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> 
>  EXPORT_MAP := rte_lpm_version.map
> 
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>  # Copyright(c) 2017 Intel Corporation
> 
>  version = 2
> +allow_experimental_apis = true
>  sources = files('rte_lpm.c', 'rte_lpm6.c')
>  headers = files('rte_lpm.h', 'rte_lpm6.h')
>  # since header files have different names, we can install all vector headers
>  # without worrying about which architecture we actually need
>  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>  deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..ca58d4b35 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #include <string.h>
> @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> 
>  	rte_mcfg_tailq_write_unlock();
> 
> +	if (lpm->dq)
> +		rte_rcu_qsbr_dq_delete(lpm->dq);
>  	rte_free(lpm->tbl8);
>  	rte_free(lpm->rules_tbl);
>  	rte_free(lpm);
> @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>  MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>  		rte_lpm_free_v1604);
> 
> +struct __rte_lpm_rcu_dq_entry {
> +	uint32_t tbl8_group_index;
> +	uint32_t pad;
> +};
> +
> +static void
> +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> +{
> +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry *e =
> +			(struct __rte_lpm_rcu_dq_entry *)data;
> +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> +
> +	/* Set tbl8 group invalid */
> +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> +		__ATOMIC_RELAXED);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->dq) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* Init QSBR defer queue. */
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> +	params.name = rcu_dq_name;
> +	params.size = lpm->number_tbl8s;
> +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> +	params.f = __lpm_rcu_qsbr_free_resource;
> +	params.p = lpm->tbl8;
> +	params.v = v;
> +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> +	if (lpm->dq == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> +		return 1;
> +	}

Few thoughts about that function:
It is named rcu_qsbr_add(), but in fact it allocates a defer queue for the given RCU variable.
So first thought - is it always necessary?
For some use-cases I suppose the user might be ok to wait for the quiescent state change
inside tbl8_free()?
Another thing: you do allocate the defer queue, but it is internal, so the user can't call
reclaim() manually, which looks strange.
Why not return the defer queue pointer to the user, so he can call reclaim() himself
at an appropriate time?
Third thing - you always allocate the defer queue with size equal to the number of tbl8
groups, though I understand there could be up to 16M tbl8 groups inside the LPM.
Do we really need a defer queue that long?
Especially considering that the current rcu defer queue will start reclamation when 1/8
of the defer queue becomes full and won't reclaim more than 1/16 of it.
Probably better to let the user decide how long a defer queue he needs for that LPM?
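For example, the RCU state could be passed in a small config struct so the
application sizes the defer queue itself (an illustrative shape only, not a
concrete API proposal):

	/* hypothetical configuration for rte_lpm_rcu_qsbr_add();
	 * dq_size lets the application bound the defer queue instead of
	 * the library defaulting to number_tbl8s
	 */
	struct rte_lpm_rcu_config {
		struct rte_rcu_qsbr *v;	/* RCU QSBR variable */
		uint32_t dq_size;	/* defer queue depth chosen by app */
	};

	int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm,
			struct rte_lpm_rcu_config *cfg);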

Konstantin


> +
> +	return 0;
> +}
> +
>  /*
>   * Adds a rule to the rule table.
>   *
> @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  }
> 
>  static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +__tbl8_alloc_v1604(struct rte_lpm *lpm)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
>  	struct rte_lpm_tbl_entry *tbl8_entry;
> 
>  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>  		/* If a free tbl8 group is found clean it and set as VALID. */
>  		if (!tbl8_entry->valid_group) {
>  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>  	return -ENOSPC;
>  }
> 
> +static int32_t
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
> +{
> +	int32_t group_idx; /* tbl8 group index. */
> +
> +	group_idx = __tbl8_alloc_v1604(lpm);
> +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> +		/* If there are no tbl8 groups try to reclaim some. */
> +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> +			group_idx = __tbl8_alloc_v1604(lpm);
> +	}
> +
> +	return group_idx;
> +}
> +
>  static void
>  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>  {
> @@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>  }
> 
>  static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>  {
> -	/* Set tbl8 group invalid*/
>  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry e;
> 
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->dq != NULL) {
> +		e.tbl8_group_index = tbl8_group_start;
> +		e.pad = 0;
> +		/* Push into QSBR defer queue. */
> +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> +	} else {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>  }
> 
>  static __rte_noinline int32_t
> @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> 
>  	if (!lpm->tbl24[tbl24_index].valid) {
>  		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		/* Check tbl8 allocation was successful. */
>  		if (tbl8_group_index < 0) {
> @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	} /* If valid entry but not extended calculate the index into Table8. */
>  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>  		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		if (tbl8_group_index < 0) {
>  			return tbl8_group_index;
> @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  		 */
>  		lpm->tbl24[tbl24_index].valid = 0;
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	} else if (tbl8_recycle_index > -1) {
>  		/* Update tbl24 entry. */
>  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>  				__ATOMIC_RELAXED);
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	}
>  #undef group_idx
>  	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..49c12a68d 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>  #include <rte_common.h>
>  #include <rte_vect.h>
>  #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -186,6 +188,7 @@ struct rte_lpm {
>  			__rte_cache_aligned; /**< LPM tbl24 table. */
>  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
>  };
> 
>  /**
> @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>  void
>  rte_lpm_free_v1604(struct rte_lpm *lpm);
> 
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param v
> + *   RCU QSBR variable
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>  /**
>   * Add a rule to the LPM table.
>   *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>  	rte_lpm6_lookup_bulk_func;
> 
>  } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-04  6:07             ` Honnappa Nagarahalli
@ 2019-10-07 10:46               ` Ananyev, Konstantin
  2019-10-13  4:35                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07 10:46 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd



> > > > > Add resource reclamation APIs to make it simple for applications
> > > > > and libraries to integrate rte_rcu library.
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > ---
> > > > >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> > > > >  lib/librte_rcu/meson.build         |   2 +
> > > > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > > > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > > > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > > > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > > > >  lib/meson.build                    |   6 +-
> > > > >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > > > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > >
> > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > @@ -21,6 +21,7 @@
> > > > >  #include <rte_errno.h>
> > > > >
> > > > >  #include "rte_rcu_qsbr.h"
> > > > > +#include "rte_rcu_qsbr_pvt.h"
> > > > >
> > > > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6
> > > > > +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > +/* Create a queue used to store the data structure elements that
> > > > > +can
> > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq *
> > > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > > > +*params) {
> > > > > +	struct rte_rcu_qsbr_dq *dq;
> > > > > +	uint32_t qs_fifo_size;
> > > > > +
> > > > > +	if (params == NULL || params->f == NULL ||
> > > > > +		params->v == NULL || params->name == NULL ||
> > > > > +		params->size == 0 || params->esize == 0 ||
> > > > > +		(params->esize % 8 != 0)) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	dq = rte_zmalloc(NULL,
> > > > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > > > +		RTE_CACHE_LINE_SIZE);
> > > > > +	if (dq == NULL) {
> > > > > +		rte_errno = ENOMEM;
> > > > > +
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > > > +	 * max_size.
> > > > > +	 */
> > > > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > > > +					* params->size) + 1);
> > > > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > > > +					SOCKET_ID_ANY, 0);
> > > >
> > > > If it is going to be not MT safe, then why not to create the ring
> > > > with (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> > > Agree.
> > >
> > > > Though I think it could be changed to allow MT safe multiple
> > > > enqeue/single dequeue, see below.
> > > The MT safe issue is due to reclaim code. The reclaim code has the following
> > sequence:
> > >
> > > rte_ring_peek
> > > rte_rcu_qsbr_check
> > > rte_ring_dequeue
> > >
> > > This entire sequence needs to be atomic as the entry cannot be dequeued
> > without knowing that the grace period for that entry is over.
> >
> > I understand that, though I believe it should at least be possible to support
> > a multiple-enqueue/single-dequeuer-and-reclaim mode.
> > With serialized dequeue(), even multiple dequeuers should be possible.
> Agreed. Please see the response on the other thread.
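
To make the atomicity constraint concrete, a sketch of the serialized sequence
(assuming the rte_ring_peek() added in this series; dq_lock is an application-side
pthread mutex, not part of the library):

	static pthread_mutex_t dq_lock = PTHREAD_MUTEX_INITIALIZER;
	void *token;

	pthread_mutex_lock(&dq_lock);
	if (rte_ring_peek(dq->r, &token) == 0 &&
	    rte_rcu_qsbr_check(dq->v, (uint64_t)(uintptr_t)token, false) == 1)
		/* Grace period over - only now is it safe to dequeue. */
		(void)rte_ring_sc_dequeue(dq->r, &token);
	pthread_mutex_unlock(&dq_lock);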
> 
> >
> > > Note that due to optimizations in rte_rcu_qsbr_check API, this
> > > sequence should not be large in most cases. I do not have ideas on how to
> > make this sequence lock-free.
> > >
> > > If the writer is on the control plane, most use cases will use mutex
> > > locks for synchronization if they are multi-threaded. That lock should be
> > enough to provide the thread safety for these APIs.
> >
> > In that case, why do we need a ring at all?
> > For sure people can create their own queue quite easily with a mutex and a TAILQ.
> > If performance is not an issue, they can even add a pthread_cond to it, and have
> > the ability for the consumer to sleep/wake up on an empty/full queue.
> >
> > >
> > > If the writer is multi-threaded and lock-free, then one should use per-thread
> > > defer queues (see the sketch below).
> >
> > If that's the only working model, then the question is why do we need that API
> > at all?
> > A simple array with a counter, or a linked list, should do for the majority of cases.
> Please see the other thread.
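
A minimal sketch of that per-thread model (names are illustrative; the point is
that each writer only ever touches its own queue, so the enqueue/reclaim path
needs no lock):

	static struct rte_rcu_qsbr_dq *dq_per_lcore[RTE_MAX_LCORE];

	/* Writer thread: enqueue only to this lcore's own defer queue. */
	rte_rcu_qsbr_dq_enqueue(dq_per_lcore[rte_lcore_id()], &e);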
> 
> >
> > >
> > > >
> > > > > +	if (dq->r == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): defer queue create failed\n", __func__);
> > > > > +		rte_free(dq);
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	dq->v = params->v;
> > > > > +	dq->size = params->size;
> > > > > +	dq->esize = params->esize;
> > > > > +	dq->f = params->f;
> > > > > +	dq->p = params->p;
> > > > > +
> > > > > +	return dq;
> > > > > +}
> > > > > +
> > > > > +/* Enqueue one resource to the defer queue to free after the
> > > > > +grace
> > > > > + * period is over.
> > > > > + */
> > > > > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > > > > +	uint64_t token;
> > > > > +	uint64_t *tmp;
> > > > > +	uint32_t i;
> > > > > +	uint32_t cur_size, free_size;
> > > > > +
> > > > > +	if (dq == NULL || e == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return 1;
> > > >
> > > > Why not just return -EINVAL straight away?
> > > > I think there is not much point in setting rte_errno in that function at
> > > > all; the return value should do.
> > > I am trying to keep these consistent with the existing APIs. They return 0 or 1
> > and set the rte_errno.
> >
> > A lot of public DPDK API functions do use the return value to return a status code (0
> > or some positive number on success, negative errno values on failure); I am
> > not inventing anything new here.
> Agree, you are not proposing a new thing here. Maybe I was not clear. I really do not have an opinion on how this should be done. But, I do
> have an opinion on consistency. These new APIs follow what has been done in the existing RCU APIs. I think we have 2 options here.
> 1) Either we change existing RCU APIs to get rid of rte_errno (is it an ABI change?) or
> 2) The new APIs follow what has been done in the existing RCU APIs.
> I want to make sure we are consistent at least within RCU APIs.

But as far as I can see, right now the rcu API sets rte_errno only for control-path functions
(get_memsize, init, register, unregister, dump).
All fast-path (inline) functions don't set/use it.
So from that perspective it is consistent behavior, no?

> 
> >
> > >
> > > >
> > > > > +	}
> > > > > +
> > > > > +	/* Start the grace period */
> > > > > +	token = rte_rcu_qsbr_start(dq->v);
> > > > > +
> > > > > +	/* Reclaim resources if the queue is 1/8th full. This keeps
> > > > > +	 * the queue from growing too large and allows time for reader
> > > > > +	 * threads to report their quiescent state.
> > > > > +	 */
> > > > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > > >
> > > > Probably would be a bit easier if you just store in dq->esize (elt
> > > > size + token
> > > > size) / 8.
> > > Agree
> > >
> > > >
> > > > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > > >
> > > > Why make this threshold value hard-coded?
> > > > Why not either put it into a create() parameter, or just return a
> > > > special value to indicate that the threshold has been reached?
> > > My thinking was to keep the programming interface easy to use. The
> > > more the parameters, the more painful it is for the user. IMO, the
> > > constants chosen should be good enough for most cases. More advanced
> > > users could modify the constants. However, we could make these part of the
> > > parameters, but make them optional for the user. For ex: if they set them to 0,
> > > default values can be used.
> > >
> > > > Or even return the number of filled/free entries on success, so the caller
> > > > can decide to reclaim or not based on that information on his own?
> > > This means more code on the user side.
> >
> > I personally think it really wouldn't be that big a problem for the user to pass
> > an extra parameter to the function.
> I will convert the 2 constants into optional parameters (user can set them to 0 to make the algorithm use default values)
> 
> > Again, what if the user doesn't want to reclaim() in the enqueue() thread at all?
> 'enqueue' has to do reclamation if the defer queue is full; I do not think that can be avoided.
> 
> In the current design, reclamation in enqueue is also done on a regular basis (automatic triggering of reclamation when the queue reaches
> a certain limit) to keep the queue from growing too large. This is required when we implement a dynamically adjusting defer queue. The
> current algorithm keeps the cost of reclamation spread across multiple calls and puts an upper bound on the cycles of the delete API by
> reclaiming a fixed number of entries.
> 
> This algorithm is proven to work in the LPM integration performance tests at a very low performance overhead (~1%). So, I do not know
> why a user would not want to use this.
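
To put numbers on that bound (illustrative arithmetic only, using the constants
from this patch):

	/* Assume dq->size = 4096 entries:
	 *   auto-reclaim trigger = 4096 >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT (3) = 512
	 *   per-pass maximum     = 4096 >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT (4)  = 256
	 * so no single enqueue/delete call reclaims more than 256 entries.
	 */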

Yeah, I looked at the LPM implementation and one thing I found strange -
the defer_queue is hidden inside the LPM struct and all reclamations are done internally.
Yes, for sure it allows deferring and grouping the actual reclaim(), which hopefully will lead to better performance.
But why not allow the user to call reclaim() on it directly too?
That way the user might avoid (or minimize) doing reclaim() in the LPM write path at all,
and instead do it somewhere later in the same thread (when there are no other tasks to do),
or even leave it to some other house-keeping thread to do (a sort of garbage collector).
Or is such a mode not supported/planned?
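
A sketch of that house-keeping mode (assuming the application could reach the
defer queue directly, which the LPM patch currently does not allow; 'quit' is an
application-side stop flag):

	static volatile bool quit;

	static int
	reclaim_lcore(void *arg)
	{
		struct rte_rcu_qsbr_dq *dq = arg;

		while (!quit) {
			if (rte_rcu_qsbr_dq_reclaim(dq) != 0 &&
			    rte_errno == EAGAIN)
				rte_pause();	/* no grace period completed yet */
		}
		return 0;
	}

It could be launched with rte_eal_remote_launch(reclaim_lcore, dq, lcore_id),
keeping all reclamation cost off the writer.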

> The 2 additional parameters should give the user more flexibility.

Ok, let's keep them as config params.
After another thought - I think you are right, it should be good enough.

> 
> However, if the user wants his own algorithm, he can create one with the base APIs provided.
> 
> >
> > > I think adding these to parameters seems like a better option.
> > >
> > > >
> > > > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > +			"%s(): Triggering reclamation\n", __func__);
> > > > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > > > +	}
> > > > > +
> > > > > +	/* Check if there is space for at least 1 resource */
> > > > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > > > +	if (!free_size) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Defer queue is full\n", __func__);
> > > > > +		rte_errno = ENOSPC;
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	/* Enqueue the resource */
> > > > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > > > +
> > > > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > > > +	 * due to the limitation of the rte_ring implementation.
> > > > > +	 */
> > > > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > > >
> > > >
> > > > That whole construction above looks a bit clumsy and error prone...
> > > > I suppose just:
> > > >
> > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t free, n; ...
> > > > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free); if (n == 0)
> > > Yes, bulk enqueue can be used. But note that once the flexible element size
> > ring patch is done, this code will use that.
> >
> > Well, when it is in the mainline, and if it provides a better way, for sure
> > this code can be updated to use the new API (if it provides some improvements).
> > But as I understand, right now it is not there, while bulk enqueue/dequeue are.
> Apologies, I was not clear. I agree we can go with bulk APIs for now.
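
Putting that together, the enqueue-side fragment of the bulk variant might look
like this (a sketch only, inside rte_rcu_qsbr_dq_enqueue(); rte_ring_enqueue_bulk()
is the existing ring API, 'token' and 'e' come from the enclosing function):

	const uint32_t nb_elt = dq->esize / 8 + 1;	/* token + payload words */
	uintptr_t elt[nb_elt];
	uint32_t free;

	elt[0] = (uintptr_t)token;
	memcpy(&elt[1], e, dq->esize);
	/* All-or-nothing: either the whole entry lands in the ring or
	 * nothing does, so partially enqueued entries are impossible.
	 */
	if (rte_ring_enqueue_bulk(dq->r, (void **)elt, nb_elt, &free) == 0)
		return -ENOSPC;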
> 
> >
> > >
> > > >   return -ENOSPC;
> > > > return free;
> > > >
> > > > That way I think you can have an MT-safe version of that function.
> > > Please see the description of MT safe issue above.
> > >
> > > >
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +/* Reclaim resources from the defer queue. */ int
> > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > > > > +	uint32_t max_cnt;
> > > > > +	uint32_t cnt;
> > > > > +	void *token;
> > > > > +	uint64_t *tmp;
> > > > > +	uint32_t i;
> > > > > +
> > > > > +	if (dq == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return 1;
> > > >
> > > > Same story as above - I think rte_errno is excessive in this function.
> > > > Just return value should be enough.
> > > >
> > > >
> > > > > +	}
> > > > > +
> > > > > +	/* Anything to reclaim? */
> > > > > +	if (rte_ring_count(dq->r) == 0)
> > > > > +		return 0;
> > > >
> > > > Not sure you need that, see below.
> > > >
> > > > > +
> > > > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > > >
> > > > Again, why not make max_cnt a configurable create() parameter?
> > > I think making this an optional parameter when creating the defer queue is a
> > > better option.
> > >
> > > > Or even a parameter for that function?
> > > >
> > > > > +	cnt = 0;
> > > > > +
> > > > > +	/* Check reader threads quiescent state and reclaim resources */
> > > > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > > > +			== 1)) {
> > > >
> > > >
> > > > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > > > +		 * due to the limitation of the rte_ring implementation.
> > > > > +		 */
> > > > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > > > +			i++, tmp++)
> > > > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > > > +					(void *)(uintptr_t)tmp);
> > > >
> > > > Again, no need for such constructs with multiple dequeues, I believe.
> > > > Just:
> > > >
> > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t n; uintptr_t
> > > > elt[nb_elt]; ...
> > > > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL); if (n != 0)
> > > > {dq->f(dq->p, elt);}
> > > Agree on bulk API use.
> > >
> > > >
> > > > Seems enough.
> > > > Again in that case you can have enqueue/reclaim running in different
> > > > threads simultaneously, plus you don't need dq->e at all.
> > > Will check on dq->e
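
And the matching reclaim side of the same sketch (dq->e indeed becomes
unnecessary; the stack copy replaces it):

	const uint32_t nb_elt = dq->esize / 8 + 1;
	uintptr_t elt[nb_elt];

	/* Run after rte_ring_peek()/rte_rcu_qsbr_check() have confirmed
	 * the grace period is over; elt[0] is the token, &elt[1] the payload.
	 */
	if (rte_ring_dequeue_bulk(dq->r, (void **)elt, nb_elt, NULL) != 0)
		dq->f(dq->p, &elt[1]);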
> > >
> > > >
> > > > > +		dq->f(dq->p, dq->e);
> > > > > +
> > > > > +		cnt++;
> > > > > +	}
> > > > > +
> > > > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > > > +
> > > > > +	if (cnt == 0) {
> > > > > +		/* No resources were reclaimed */
> > > > > +		rte_errno = EAGAIN;
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	return 0;
> > > >
> > > > I'd suggest returning cnt on success.
> > > I am trying to keep the APIs simple. I do not see much use for 'cnt'
> > > as a return value to the user. It exposes more details which I think are internal
> > > to the library.
> >
> > Not sure what the hassle is in returning the number of completed reclamations?
> > If the user doesn't need that information, he simply won't use it.
> > But maybe it would be useful - he can decide whether to try another attempt
> > of reclaim() immediately or whether it is OK to do something else.
> There is no hassle to return that information.
> 
> As per the current design, the user calls 'reclaim' when out of resources while adding an entry to the data structure. At that point the user
> wants to know if at least 1 resource was reclaimed, because the user has to allocate 1 resource. He does not have a use for the number of
> resources reclaimed.

Ok, but why can't the user decide to do reclaim in advance, let's say when he foresees that he will need a lot of allocations in the near future?
Or when there is some idle time? Or some combination of these things?
He would like to free some extra resources in that case, to minimize the number of reclaims during a future peak interval.

> 
> If this API returns 0, then the user can decide to repeat the call or return failure. But that decision depends on the length of the grace period
> which is under user's control.
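
The usage pattern being described, as a sketch (alloc_resource() and 'ds' are
hypothetical application-side names, not real APIs):

	/* Writer path: allocation failed, so try to reclaim exactly once. */
	if (alloc_resource(ds) < 0) {
		if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
			return -ENOSPC;		/* nothing crossed its grace period */
		ret = alloc_resource(ds);	/* retry: >= 1 entry was reclaimed */
	}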
> 
> >
> > >
> > > >
> > > > > +}
> > > > > +
> > > > > +/* Delete a defer queue. */
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > > > +	if (dq == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	/* Reclaim all the resources */
> > > > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > > > +		/* Error number is already set by the reclaim API */
> > > > > +		return 1;
> > > >
> > > > How do you know that you have reclaimed everything?
> > > Good point, will come back with a different solution.
> > >
> > > >
> > > > > +
> > > > > +	rte_ring_free(dq->r);
> > > > > +	rte_free(dq);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  int rte_rcu_log_type;
> > > > >
> > > > >  RTE_INIT(rte_rcu_register)
> > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > @@ -34,6 +34,7 @@ extern "C" {
> > > > >  #include <rte_lcore.h>
> > > > >  #include <rte_debug.h>
> > > > >  #include <rte_atomic.h>
> > > > > +#include <rte_ring.h>
> > > > >
> > > > >  extern int rte_rcu_log_type;
> > > > >
> > > > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > > > >  	 */
> > > > >  } __rte_cache_aligned;
> > > > >
> > > > > +/**
> > > > > + * Callback function called to free the resources.
> > > > > + *
> > > > > + * @param p
> > > > > + *   Pointer provided while creating the defer queue
> > > > > + * @param e
> > > > > + *   Pointer to the resource data stored on the defer queue
> > > > > + *
> > > > > + * @return
> > > > > + *   None
> > > > > + */
> > > > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > > >
> > > > Style thing - usually in DPDK we have typedef newtype_t ...
> > > > Though I am not sure you need a new typedef at all - just a function
> > > > pointer inside the struct seems enough.
> > > Other libraries (for ex: rte_hash) use this approach. I think it is better to keep
> > it out of the structure to allow for better commenting.
> >
> > I am saying the majority of DPDK code uses the _t suffix for typedefs:
> > typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);
> Apologies, got it, will change.
> 
> >
> > >
> > > >
> > > > > +
> > > > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > > > +
> > > > > +/**
> > > > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > > > + */
> > > > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > > > +
> > > > > +/**
> > > > > + *  Reclaim at the max 1/16th the total number of resources.
> > > > > + */
> > > > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > > >
> > > >
> > > > As I said above, I don't think these thresholds need to be hardcoded.
> > > > In any case, there seems to be little point in putting them in the public header
> > > > file.
> > > >
> > > > > +
> > > > > +/**
> > > > > + * Parameters used when creating the defer queue.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq_parameters {
> > > > > +	const char *name;
> > > > > +	/**< Name of the queue. */
> > > > > +	uint32_t size;
> > > > > +	/**< Number of entries in queue. Typically, this will be
> > > > > +	 *   the same as the maximum number of entries supported in the
> > > > > +	 *   lock free data structure.
> > > > > +	 *   Data structures with unbounded number of entries is not
> > > > > +	 *   supported currently.
> > > > > +	 */
> > > > > +	uint32_t esize;
> > > > > +	/**< Size (in bytes) of each element in the defer queue.
> > > > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > > > +	 *   support 8B element sizes only.
> > > > > +	 */
> > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > +	/**< Function to call to free the resource. */
> > > > > +	void *p;
> > > >
> > > > Style nit again - I like short names myself, but that seems a bit
> > > > extreme... :) Might be at least:
> > > > void (*reclaim)(void *, void *);
> > > May be 'free_fn'?
> > >
> > > > void * reclaim_data;
> > > > ?
> > > This is the pointer to the data structure to free the resource into. For example, in
> > > the LPM data structure, it will be the pointer to the LPM. 'reclaim_data'
> > > does not convey the meaning correctly.
> >
> > Ok, please feel free to come up with your own names.
> > I just wanted to say that 'f' and 'p' are a bit extreme for a public API.
> ok, this is the hardest thing to do 😊
> 
> >
> > >
> > > >
> > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > +	 *   pointer to the data structure to which the resource to free
> > > > > +	 *   belongs. This can be NULL.
> > > > > +	 */
> > > > > +	struct rte_rcu_qsbr *v;
> > > >
> > > > Does it need to be inside that struct?
> > > > Might be better:
> > > > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > > > rte_rcu_qsbr_dq_parameters *params);
> > > The API takes a parameter structure as input anyway, why add
> > > another argument to the function? The QSBR variable is just another parameter.
> > >
> > > >
> > > > Another alternative: make both reclaim() and enqueue() take v as
> > > > a parameter.
> > > But both of them need access to some of the parameters provided in
> > > the rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to the
> > > functions.
> >
> > Purely a style thing.
> > From my perspective it just provides better visibility of what is going on in the code:
> > for QSBR var 'v', create a new defer queue.
> > But no strong opinion here.
> >
> > >
> > > >
> > > > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > > > +
> > > > > +/* RTE defer queue structure.
> > > > > + * This structure holds the defer queue. The defer queue is used
> > > > > +to
> > > > > + * hold the deleted entries from the data structure that are not
> > > > > + * yet freed.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq;
> > > > > +
> > > > >  /**
> > > > >   * @warning
> > > > >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > > > > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE
> > > > > *f, struct rte_rcu_qsbr *v);
> > > > >
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Create a queue used to store the data structure elements that
> > > > > +can
> > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > + *
> > > > > + * @param params
> > > > > + *   Parameters to create a defer queue.
> > > > > + * @return
> > > > > + *   On success - Valid pointer to defer queue
> > > > > + *   On error - NULL
> > > > > + *   Possible rte_errno codes are:
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - ENOMEM - Not enough memory
> > > > > + */
> > > > > +__rte_experimental
> > > > > +struct rte_rcu_qsbr_dq *
> > > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > > > +*params);
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Enqueue one resource to the defer queue and start the grace period.
> > > > > + * The resource will be freed later after at least one grace
> > > > > +period
> > > > > + * is over.
> > > > > + *
> > > > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > > > + * It will also reclaim resources at regular intervals to keep
> > > > > + * the defer queue from growing too big.
> > > > > + *
> > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > +caller
> > > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > > + *
> > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > +multi-thread
> > > > > + * safety by creating and using one defer queue per thread.
> > > > > + *
> > > > > + * @param dq
> > > > > + *   Defer queue to allocate an entry from.
> > > > > + * @param e
> > > > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > > > + *   the data to copy is equal to the element size provided when the
> > > > > + *   defer queue was created.
> > > > > + * @return
> > > > > + *   On success - 0
> > > > > + *   On error - 1 with rte_errno set to
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > > > + *		if the defer queue size is equal (or larger) than the
> > > > > + *		number of elements in the data structure.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Reclaim resources from the defer queue.
> > > > > + *
> > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > +caller
> > > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > > + *
> > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > +multi-thread
> > > > > + * safety by creating and using one defer queue per thread.
> > > > > + *
> > > > > + * @param dq
> > > > > + *   Defer queue to reclaim an entry from.
> > > > > + * @return
> > > > > + *   On successful reclamation of at least 1 resource - 0
> > > > > + *   On error - 1 with rte_errno set to
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > > > period,
> > > > > + *		try again.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Delete a defer queue.
> > > > > + *
> > > > > + * It tries to reclaim all the resources on the defer queue.
> > > > > + * If any of the resources have not completed the grace period
> > > > > + * the reclamation stops and returns immediately. The rest of
> > > > > + * the resources are not reclaimed and the defer queue is not
> > > > > + * freed.
> > > > > + *
> > > > > + * @param dq
> > > > > + *   Defer queue to delete.
> > > > > + * @return
> > > > > + *   On success - 0
> > > > > + *   On error - 1
> > > > > + *   Possible rte_errno codes are:
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - EAGAIN - Some of the resources have not completed at least 1
> > grace
> > > > > + *		period, try again.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > new file mode 100644
> > > > > index 000000000..2122bc36a
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > >
> > > > Again a style suggestion: as it is not a public header - don't use the rte_
> > > > prefix for naming.
> > > > From my perspective it is easier for the reader to realize what is a public
> > > > header and what is not.
> > > Looks like the guidelines are not defined very well. I see one private
> > > file with the rte_ prefix, and I see Stephen not using the rte_ prefix. I do not have any
> > > preference, but a consistent approach is required.
> >
> > That's just a suggestion.
> > For me (and I hope for others) it would be a bit easier.
> > When looking at the code for the first time, I had to look at meson.build to check
> > whether it is a public header or not.
> > If the file doesn't have the 'rte_' prefix, I assume that it is an internal one
> > straight away.
> > But, as you said, there are no exact guidelines here, so it is up to you to decide.
> I think it makes sense to remove the 'rte_' prefix. I will also change the file name to have a '_private' suffix.
> There are some inconsistencies in the existing code; I will send a patch to correct them to follow this approach.
> 
> >
> > >
> > > >
> > > > > @@ -0,0 +1,46 @@
> > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > + * Copyright (c) 2019 Arm Limited  */
> > > > > +
> > > > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > > > +#define _RTE_RCU_QSBR_PVT_H_
> > > > > +
> > > > > +/**
> > > > > + * This file is private to the RCU library. It should not be
> > > > > +included
> > > > > + * by the user of this library.
> > > > > + */
> > > > > +
> > > > > +#ifdef __cplusplus
> > > > > +extern "C" {
> > > > > +#endif
> > > > > +
> > > > > +#include "rte_rcu_qsbr.h"
> > > > > +
> > > > > +/* RTE defer queue structure.
> > > > > + * This structure holds the defer queue. The defer queue is used
> > > > > +to
> > > > > + * hold the deleted entries from the data structure that are not
> > > > > + * yet freed.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq {
> > > > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > > > +	uint32_t size;
> > > > > +	/**< Number of elements in the defer queue */
> > > > > +	uint32_t esize;
> > > > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > +	/**< Function to call to free the resource. */
> > > > > +	void *p;
> > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > +	 *   pointer to the data structure to which the resource to free
> > > > > +	 *   belongs.
> > > > > +	 */
> > > > > +	char e[0];
> > > > > +	/**< Temporary storage to copy the defer queue element. */
> > > >
> > > > Do you really need 'e' at all?
> > > > Can't it be just a temporary stack variable?
> > > Ok, will check.
> > >
> > > >
> > > > > +};
> > > > > +
> > > > > +#ifdef __cplusplus
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > > > b/lib/librte_rcu/rte_rcu_version.map
> > > > > index f8b9ef2ab..dfac88a37 100644
> > > > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > > > >  	rte_rcu_qsbr_synchronize;
> > > > >  	rte_rcu_qsbr_thread_register;
> > > > >  	rte_rcu_qsbr_thread_unregister;
> > > > > +	rte_rcu_qsbr_dq_create;
> > > > > +	rte_rcu_qsbr_dq_enqueue;
> > > > > +	rte_rcu_qsbr_dq_reclaim;
> > > > > +	rte_rcu_qsbr_dq_delete;
> > > > >
> > > > >  	local: *;
> > > > >  };
> > > > > diff --git a/lib/meson.build b/lib/meson.build index
> > > > > e5ff83893..0e1be8407 100644
> > > > > --- a/lib/meson.build
> > > > > +++ b/lib/meson.build
> > > > > @@ -11,7 +11,9 @@
> > > > >  libraries = [
> > > > >  	'kvargs', # eal depends on kvargs
> > > > >  	'eal', # everything depends on eal
> > > > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > > +	'ring',
> > > > > +	'rcu', # rcu depends on ring
> > > > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > >  	'cmdline',
> > > > >  	'metrics', # bitrate/latency stats depends on this
> > > > >  	'hash',    # efd depends on this
> > > > > @@ -22,7 +24,7 @@ libraries = [
> > > > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > > > >  	'kni', 'latencystats', 'lpm', 'member',
> > > > >  	'power', 'pdump', 'rawdev',
> > > > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > >  	# ipsec lib depends on net, crypto and security
> > > > >  	'ipsec',
> > > > >  	# add pkt framework libs which use other libs from above
> > > > > --
> > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
                         ` (2 preceding siblings ...)
  2019-10-04 19:01       ` Medvedkin, Vladimir
@ 2019-10-07 13:11       ` Medvedkin, Vladimir
  2019-10-13  3:02         ` Honnappa Nagarahalli
  3 siblings, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-07 13:11 UTC (permalink / raw)
  To: Honnappa Nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, ruifeng.wang, dharmik.thakkar, dev, nd

Hi Honnappa,

On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> Add resource reclamation APIs to make it simple for applications
> and libraries to integrate rte_rcu library.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>   lib/librte_rcu/meson.build         |   2 +
>   lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>   lib/librte_rcu/rte_rcu_version.map |   4 +
>   lib/meson.build                    |   6 +-
>   7 files changed, 700 insertions(+), 3 deletions(-)
>   create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
>
> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> index d1b9e46a2..3a6815243 100644
> --- a/app/test/test_rcu_qsbr.c
> +++ b/app/test/test_rcu_qsbr.c
> @@ -1,8 +1,9 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright (c) 2018 Arm Limited
> + * Copyright (c) 2019 Arm Limited
>    */
>   
>   #include <stdio.h>
> +#include <string.h>
>   #include <rte_pause.h>
>   #include <rte_rcu_qsbr.h>
>   #include <rte_hash.h>
> @@ -33,6 +34,7 @@ static uint32_t *keys;
>   #define COUNTER_VALUE 4096
>   static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
>   static uint8_t writer_done;
> +static uint8_t cb_failed;
>   
>   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>   struct rte_hash *h[RTE_MAX_LCORE];
> @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
>   	return 0;
>   }
>   
> +static void
> +rte_rcu_qsbr_test_free_resource(void *p, void *e)
> +{
> +	if (p != NULL && e != NULL) {
> +		printf("%s: Test failed\n", __func__);
> +		cb_failed = 1;
> +	}
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
> + * elements that can be freed later. This queue is referred to as 'defer queue'.
> + */
> +static int
> +test_rcu_qsbr_dq_create(void)
> +{
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> +
> +	/* Pass invalid parameters */
> +	dq = rte_rcu_qsbr_dq_create(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.size = 1;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.esize = 3;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	/* Pass all valid parameters */
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	rte_rcu_qsbr_dq_delete(dq);
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_enqueue(void)
> +{
> +	int ret;
> +	uint64_t r;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_reclaim(void)
> +{
> +	int ret;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_delete(void)
> +{
> +	int ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_delete(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
> +{
> +	int i, j, ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint64_t *e;
> +	uint64_t sc = 200;
> +	int max_entries;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> +	printf("Size = %d, esize = %d\n", size, esize);
> +
> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> +	if (e == NULL)
> +		return 0;
> +	cb_failed = 0;
> +
> +	/* Initialize the RCU variable. No threads are registered */
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	params.v = t[0];
> +	params.size = size;
> +	params.esize = esize;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Given the size and esize, calculate the maximum number of entries
> +	 * that can be stored on the defer queue (look at the logic used
> +	 * in capacity calculation of rte_ring).
> +	 */
> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> +	max_entries = (max_entries - 1)/(esize/8 + 1);
> +
> +	/* Enqueue few counters starting with the value 'sc' */
> +	/* The queue size will be rounded up to 2. The enqueue API also
> +	 * reclaims if the queue size is above certain limit. Since, there
> +	 * are no threads registered, reclamation succedes. Hence, it should
> +	 * be possible to enqueue more than the provided queue size.
> +	 */
> +	for (i = 0; i < 10; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> +	 * succeed. It should not be possible to enqueue more than the size
> +	 * number of resources.
> +	 */
> +	rte_rcu_qsbr_thread_register(t[0], 1);
> +	rte_rcu_qsbr_thread_online(t[0], 1);
> +
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Enqueue fails as queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Delete should fail as there are elements in defer queue which
> +	 * cannot be reclaimed.
> +	 */
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid params");
> +
> +	/* Report quiescent state, enqueue should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Report quiescent state, delete should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	/* Validate that call back function did not return any error */
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> +
> +	rte_free(e);
> +	return 0;
> +}
> +
>   /*
>    * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
>    */
> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_thread_offline() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_create() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_reclaim() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_delete() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_enqueue() < 0)
> +		goto test_fail;
> +
>   	printf("\nFunctional tests\n");
>   
>   	if (test_rcu_qsbr_sw_sv_3qs() < 0)
> @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_mw_mv_mqs() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> +		goto test_fail;
> +
>   	free_rcu();
>   
>   	printf("\n");
> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> index 62920ba02..e280b29c1 100644
> --- a/lib/librte_rcu/meson.build
> +++ b/lib/librte_rcu/meson.build
> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
>   if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
>   	ext_deps += cc.find_library('atomic')
>   endif
> +
> +deps += ['ring']
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> index ce7f93dd3..76814f50b 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> @@ -21,6 +21,7 @@
>   #include <rte_errno.h>
>   
>   #include "rte_rcu_qsbr.h"
> +#include "rte_rcu_qsbr_pvt.h"
>   
>   /* Get the memory size of QSBR variable */
>   size_t
> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
>   	return 0;
>   }
>   
> +/* Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + */
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> +{
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint32_t qs_fifo_size;
> +
> +	if (params == NULL || params->f == NULL ||
> +		params->v == NULL || params->name == NULL ||
> +		params->size == 0 || params->esize == 0 ||
> +		(params->esize % 8 != 0)) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return NULL;
> +	}
> +
> +	dq = rte_zmalloc(NULL,
> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> +		RTE_CACHE_LINE_SIZE);
> +	if (dq == NULL) {
> +		rte_errno = ENOMEM;
> +
> +		return NULL;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * max_size.
> +	 */
> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> +					* params->size) + 1);
> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (dq->r == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): defer queue create failed\n", __func__);
> +		rte_free(dq);
> +		return NULL;
> +	}
> +
> +	dq->v = params->v;
> +	dq->size = params->size;
> +	dq->esize = params->esize;
> +	dq->f = params->f;
> +	dq->p = params->p;
> +
> +	return dq;
> +}
> +
> +/* Enqueue one resource to the defer queue to free after the grace
> + * period is over.
> + */
> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> +{
> +	uint64_t token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +	uint32_t cur_size, free_size;
> +
> +	if (dq == NULL || e == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Start the grace period */
> +	token = rte_rcu_qsbr_start(dq->v);
> +
> +	/* Reclaim resources if the queue is 1/8th full. This keeps
> +	 * the queue from growing too large and allows time for reader
> +	 * threads to report their quiescent state.
> +	 */
> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +			"%s(): Triggering reclamation\n", __func__);
> +		rte_rcu_qsbr_dq_reclaim(dq);
> +	}

There are two problems I see:

1. rte_rcu_qsbr_dq_reclaim() reclaims at most 1/16 of the defer queue, while 
it is triggered when the queue is 1/8 full. This means that at least 1/16 of 
the entries will always remain unreclaimed in the queue.

2. The number of entries to reclaim depends on dq->size. So, 
rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For the LPM library 
this means that rte_lpm_delete() sometimes takes a long time.

So, my suggestions here would be:

- trigger rte_rcu_qsbr_dq_reclaim() with every enqueue

- reclaim a small number of entries (could be configurable at creation time)

- provide an API to trigger reclaim from the application manually
(a sketch combining these follows below).
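
A sketch of the first two suggestions combined (illustrative only; 'batch' would
come from a new creation-time parameter, and the body mirrors the loop already
in rte_rcu_qsbr_dq_reclaim()):

	static void
	dq_reclaim_batch(struct rte_rcu_qsbr_dq *dq, uint32_t batch)
	{
		uint32_t i, cnt;
		uint64_t *tmp;
		void *token;

		for (cnt = 0; cnt < batch; cnt++) {
			if (rte_ring_peek(dq->r, &token) != 0 ||
			    rte_rcu_qsbr_check(dq->v,
					(uint64_t)(uintptr_t)token, false) != 1)
				break;	/* oldest entry still in its grace period */
			(void)rte_ring_sc_dequeue(dq->r, &token);
			for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize / 8;
					i++, tmp++)
				(void)rte_ring_sc_dequeue(dq->r,
						(void *)(uintptr_t)tmp);
			dq->f(dq->p, dq->e);
		}
	}

Called with a small constant from every enqueue, this spreads the reclamation
cost evenly; exposing the same function publicly would also cover the third
suggestion.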

> +
> +	/* Check if there is space for at least 1 resource */
> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> +	if (!free_size) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Defer queue is full\n", __func__);
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	/* Enqueue the resource */
> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> +
> +	/* The resource to enqueue needs to be a multiple of 64b
> +	 * due to the limitation of the rte_ring implementation.
> +	 */
> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> +
> +	return 0;
> +}
> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;
> +
> +	/* Reclaim at the max 1/16th the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> +			== 1)) {
> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);
> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;
> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>   int rte_rcu_log_type;
>   
>   RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>   #include <rte_lcore.h>
>   #include <rte_debug.h>
>   #include <rte_atomic.h>
> +#include <rte_ring.h>
>   
>   extern int rte_rcu_log_type;
>   
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>   	 */
>   } __rte_cache_aligned;
>   
> +/**
> + * Callback function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> +
> +/**
> + *  Trigger automatic reclamation when 1/8th of the defer queue is full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at most 1/16th of the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with an unbounded number of entries
> +	 *   are not currently supported.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;
> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>   /**
>    * @warning
>    * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>   int
>   rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to keep
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than)
> + *		the number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>   	rte_rcu_qsbr_synchronize;
>   	rte_rcu_qsbr_thread_register;
>   	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
>   
>   	local: *;
>   };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>   libraries = [
>   	'kvargs', # eal depends on kvargs
>   	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>   	'cmdline',
>   	'metrics', # bitrate/latency stats depends on this
>   	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>   	'gro', 'gso', 'ip_frag', 'jobstats',
>   	'kni', 'latencystats', 'lpm', 'member',
>   	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>   	# ipsec lib depends on net, crypto and security
>   	'ipsec',
>   	# add pkt framework libs which use other libs from above

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-04 16:05       ` Medvedkin, Vladimir
@ 2019-10-09  3:48         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  3:48 UTC (permalink / raw)
  To: Medvedkin, Vladimir, bruce.richardson, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck,
	Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

<snip>

> 
> Hi Honnappa,
> 
> On 01/10/2019 19:28, Honnappa Nagarahalli wrote:
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >   lib/librte_lpm/Makefile            |   3 +-
> >   lib/librte_lpm/meson.build         |   2 +
> >   lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> >   lib/librte_lpm/rte_lpm.h           |  21 ++++++
> >   lib/librte_lpm/rte_lpm_version.map |   6 ++
> >   5 files changed, 122 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index
> > a7946a1c5..ca9e16312 100644
> > --- a/lib/librte_lpm/Makefile
> > +++ b/lib/librte_lpm/Makefile
> > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
> >   # library name
> >   LIB = librte_lpm.a
> >
> > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> >   CFLAGS += -O3
> >   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > -LDLIBS += -lrte_eal -lrte_hash
> > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> >
> >   EXPORT_MAP := rte_lpm_version.map
> >
> > diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> > index a5176d8ae..19a35107f 100644
> > --- a/lib/librte_lpm/meson.build
> > +++ b/lib/librte_lpm/meson.build
> > @@ -2,9 +2,11 @@
> >   # Copyright(c) 2017 Intel Corporation
> >
> >   version = 2
> > +allow_experimental_apis = true
> >   sources = files('rte_lpm.c', 'rte_lpm6.c')
> >   headers = files('rte_lpm.h', 'rte_lpm6.h')
> >   # since header files have different names, we can install all vector headers
> >   # without worrying about which architecture we actually need
> >   headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> >   deps += ['hash']
> > +deps += ['rcu']
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 3a929a1b1..ca58d4b35 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> >    * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >    */
> >
> >   #include <string.h>
> > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> >
> >   	rte_mcfg_tailq_write_unlock();
> >
> > +	if (lpm->dq)
> > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> >   	rte_free(lpm->tbl8);
> >   	rte_free(lpm->rules_tbl);
> >   	rte_free(lpm);
> > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
> >   MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> >   		rte_lpm_free_v1604);
> As a general comment, are you going to add rcu support to the legacy _v20 ?
I do not see a requirement from my side. What's your suggestion?

> >
> > +struct __rte_lpm_rcu_dq_entry {
> > +	uint32_t tbl8_group_index;
> > +	uint32_t pad;
> > +};
> 
> Is this struct necessary? I mean in tbl8_free_v1604() you can pass
> tbl8_group_index as a pointer without "e.pad = 0;".
Agree, that is another way. This structure will go away once the ring library supports storing 32b elements.
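
For illustration only, the alternative could look roughly like this
(assuming the defer queue is created with an 8-byte element size; the
function names are illustrative, not the patch's actual code):

/* Sketch: widen the 32b group index into a single 64b defer queue
 * element instead of using the padded struct.
 */
static void
__lpm_rcu_free(void *p, void *data)
{
	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
	uint32_t idx = (uint32_t)*(uint64_t *)data;

	/* Set tbl8 group invalid */
	__atomic_store(&tbl8[idx], &zero_tbl8_entry, __ATOMIC_RELAXED);
}

static void
tbl8_defer_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
{
	uint64_t e = tbl8_group_start; /* esize == sizeof(uint64_t) */

	(void)rte_rcu_qsbr_dq_enqueue(lpm->dq, &e);
}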

> 
> And what about 32bit environment?
Waiting for rte_ring to support 32b elements (the patch is being discussed).

> 
> > +
> > +static void
> > +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> > +{
> > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry *e =
> > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > +
> > +	/* Set tbl8 group invalid */
> > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > +		__ATOMIC_RELAXED);
> > +}
> > +
> > +/* Associate QSBR variable with an LPM object.
> > + */
> > +int
> > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> > +{
> > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +
> > +	if ((lpm == NULL) || (v == NULL)) {
> > +		rte_errno = EINVAL;
> > +		return 1;
> > +	}
> > +
> > +	if (lpm->dq) {
> > +		rte_errno = EEXIST;
> > +		return 1;
> > +	}
> > +
> > +	/* Init QSBR defer queue. */
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> 
> Consider moving this logic into rte_rcu_qsbr_dq_create(). I think there you
> could prefix the name with just RCU_ . So it would be possible to move
> include <rte_ring.h> into the rte_rcu_qsbr.c from rte_rcu_qsbr.h and get rid
> of RTE_RCU_QSBR_DQ_NAMESIZE macro in rte_rcu_qsbr.h file.
The macro is required to provide a length for the name, similar to what rte_ring does. What would the length of 'name' be if RTE_RCU_QSBR_DQ_NAMESIZE were removed?
If the 'RCU_' prefix has to be added in 'rte_rcu_qsbr_dq_create', then RTE_RCU_QSBR_DQ_NAMESIZE needs to be adjusted in the header file. I am trying to keep it simple by constructing the string in a single function.
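
For comparison, the reviewer's variant would move the prefixing into the
library, roughly as below (a sketch only; the slot count mirrors the
one-token-plus-data layout used by the enqueue path and is illustrative):

/* Sketch: rte_rcu_qsbr_dq_create() builds the ring name itself, so
 * callers never have to size the final string.
 */
char rn[RTE_RING_NAMESIZE];
uint32_t nb_slots;

snprintf(rn, sizeof(rn), "RCU_%s", params->name);
/* 1 token + esize/8 data words per deferred element. */
nb_slots = rte_align32pow2(params->size * (params->esize / 8 + 1) + 1);
dq->r = rte_ring_create(rn, nb_slots, SOCKET_ID_ANY,
			RING_F_SP_ENQ | RING_F_SC_DEQ);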

> 
> > +	params.name = rcu_dq_name;
> > +	params.size = lpm->number_tbl8s;
> > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > +	params.f = __lpm_rcu_qsbr_free_resource;
> > +	params.p = lpm->tbl8;
> > +	params.v = v;
> > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > +	if (lpm->dq == NULL) {
> > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >   /*
> >    * Adds a rule to the rule table.
> >    *
> > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
> >   }
> >
> >   static int32_t
> > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > number_tbl8s)
> > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> >   {
> >   	uint32_t group_idx; /* tbl8 group index. */
> >   	struct rte_lpm_tbl_entry *tbl8_entry;
> >
> >   	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > +		tbl8_entry = &lpm->tbl8[group_idx *
> > +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> >   		/* If a free tbl8 group is found clean it and set as VALID. */
> >   		if (!tbl8_entry->valid_group) {
> >   			struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -
> 712,6 +769,21 @@
> > tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> >   	return -ENOSPC;
> >   }
> >
> > +static int32_t
> > +tbl8_alloc_v1604(struct rte_lpm *lpm)
> > +{
> > +	int32_t group_idx; /* tbl8 group index. */
> > +
> > +	group_idx = __tbl8_alloc_v1604(lpm);
> > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > +		/* If there are no tbl8 groups try to reclaim some. */
> > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > +			group_idx = __tbl8_alloc_v1604(lpm);
> > +	}
> > +
> > +	return group_idx;
> > +}
> > +
> >   static void
> >   tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
> >   {
> > @@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
> >   }
> >
> >   static void
> > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> >   {
> > -	/* Set tbl8 group invalid*/
> >   	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry e;
> >
> > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > -			__ATOMIC_RELAXED);
> > +	if (lpm->dq != NULL) {
> > +		e.tbl8_group_index = tbl8_group_start;
> > +		e.pad = 0;
> > +		/* Push into QSBR defer queue. */
> > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > +	} else {
> > +		/* Set tbl8 group invalid*/
> > +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> > +				__ATOMIC_RELAXED);
> > +	}
> >   }
> >
> >   static __rte_noinline int32_t
> > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >
> >   	if (!lpm->tbl24[tbl24_index].valid) {
> >   		/* Search for a free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >   		/* Check tbl8 allocation was successful. */
> >   		if (tbl8_group_index < 0) {
> > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >   	} /* If valid entry but not extended calculate the index into Table8. */
> >   	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> >   		/* Search for free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >   		if (tbl8_group_index < 0) {
> >   			return tbl8_group_index;
> > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >   		 */
> >   		lpm->tbl24[tbl24_index].valid = 0;
> >   		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >   	} else if (tbl8_recycle_index > -1) {
> >   		/* Update tbl24 entry. */
> >   		struct rte_lpm_tbl_entry new_tbl24_entry = {
> > @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >   		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> >   				__ATOMIC_RELAXED);
> >   		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >   	}
> >   #undef group_idx
> >   	return 0;
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > 906ec4483..49c12a68d 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> >    * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >    */
> >
> >   #ifndef _RTE_LPM_H_
> > @@ -21,6 +22,7 @@
> >   #include <rte_common.h>
> >   #include <rte_vect.h>
> >   #include <rte_compat.h>
> > +#include <rte_rcu_qsbr.h>
> >
> >   #ifdef __cplusplus
> >   extern "C" {
> > @@ -186,6 +188,7 @@ struct rte_lpm {
> >   			__rte_cache_aligned; /**< LPM tbl24 table. */
> >   	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >   	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> >   };
> >
> >   /**
> > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> >   void
> >   rte_lpm_free_v1604(struct rte_lpm *lpm);
> >
> > +/**
> > + * Associate RCU QSBR variable with an LPM object.
> > + *
> > + * @param lpm
> > + *   the lpm object to add RCU QSBR
> > + * @param v
> > + *   RCU QSBR variable
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with error code set in rte_errno.
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - invalid pointer
> > + *   - EEXIST - already added QSBR
> > + *   - ENOMEM - memory allocation failure
> > + */
> > +__rte_experimental
> > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr
> > +*v);
> > +
> >   /**
> >    * Add a rule to the LPM table.
> >    *
> > diff --git a/lib/librte_lpm/rte_lpm_version.map
> > b/lib/librte_lpm/rte_lpm_version.map
> > index 90beac853..b353aabd2 100644
> > --- a/lib/librte_lpm/rte_lpm_version.map
> > +++ b/lib/librte_lpm/rte_lpm_version.map
> > @@ -44,3 +44,9 @@ DPDK_17.05 {
> >   	rte_lpm6_lookup_bulk_func;
> >
> >   } DPDK_16.04;
> > +
> > +EXPERIMENTAL {
> > +	global:
> > +
> > +	rte_lpm_rcu_qsbr_add;
> > +};
> 
> --
> Regards,
> Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-07  9:01           ` Ananyev, Konstantin
@ 2019-10-09  4:25             ` Honnappa Nagarahalli
  2019-10-10 15:09               ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  4:25 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> 
> >
> > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > >
> > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >
> > > > The peek API allows fetching the next available object in the ring
> > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > objects depend on their value.
> > > >
> > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > ---
> > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 30 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > index 2a9f768a1..d3d0d5e18 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > >  				r->cons.single, available);
> > > >  }
> > > >
> > > > +/**
> > > > + * Peek one object from a ring.
> > > > + *
> > > > + * The peek API allows fetching the next available object in the
> > > > +ring
> > > > + * without dequeuing it. This API is not multi-thread safe with
> > > > +respect
> > > > + * to other consumer threads.
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param obj_p
> > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > + * @return
> > > > + *   - 0: Success, object available
> > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline int
> > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > >
> > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > follow other rte_ring functions naming conventions
> > > (rte_ring_sc_peek() or so).
> > Agree
> >
> > >
> > > As a better alternative what do you think about introducing a
> > > serialized versions of DPDK rte_ring dequeue functions?
> > > Something like that:
> > >
> > > /* same as original ring dequeue, but:
> > >  * 1) move cons.head only if cons.head == cons.tail
> > >  * 2) don't update cons.tail
> > >  */
> > > unsigned int
> > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >                 unsigned int n, unsigned int *available);
> > >
> > > /* sets both cons.head and cons.tail to cons.head + num */
> > > void
> > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > >
> > > /* resets cons.head to cons.tail value */
> > > void
> > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > >
> > > Then your dq_reclaim cycle function will look like that:
> > >
> > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > uint32_t avl, n;
> > > uintptr_t elt[nb_elt]; ...
> > >
> > > do {
> > >
> > >   /* read next elem from the queue */
> > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > >   if (n == 0)
> > >       break;
> > >
> > >   /* wrong period, keep elem in the queue */
> > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > >      rte_ring_serial_dequeue_abort(dq->r);
> > >      break;
> > >   }
> > >
> > >   /* can reclaim, remove elem from the queue */
> > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > >
> > >   /* call reclaim function */
> > >   dq->f(dq->p, elt);
> > >
> > > } while (avl >= nb_elt);
> > >
> > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > As long as actual reclamation callback itself is MT safe of course.
> >
> > I think it is a great idea. The other writers would still be polling
> > for the current writer to update the tail or update the head. This makes it a
> blocking solution.
> 
> Yep, it is a blocking one.
> 
> > We can make the other threads not poll i.e. they will quit reclaiming if they
> see that other writers are dequeuing from the queue.
> 
> Actually didn't think about that possibility, but yes should be possible to have
> _try_ semantics too.
> 
> >The other  way is to use per thread queues.
> >
> > The other requirement I see is to support unbounded-size data
> > structures where in the data structures do not have a pre-determined
> > number of entries. Also, currently the defer queue size is equal to the total
> number of entries in a given data structure. There are plans to support
> dynamically resizable defer queue. This means, memory allocation which will
> affect the lock-free-ness of the solution.
> >
> > So, IMO:
> > 1) The API should provide the capability to support different algorithms -
> may be through some flags?
> > 2) The requirements for the ring are pretty unique to the problem we
> > have here (for ex: move the cons-head only if cons-tail is also the same, skip
> polling). So, we should probably implement a ring with-in the RCU library?
> 
> Personally, I think such serialization ring API would be useful for other cases
> too.
> There are few cases when user need to read contents of the queue without
> removing elements from it.
> Let say we do use similar approach inside TLDK to implement TCP transmit
> queue.
> If such API would exist in DPDK we can just use it straightway, without
> maintaining a separate one.
ok

> 
> >
> > From the timeline perspective, adding all these capabilities would be
> > difficult to get done with in 19.11 timeline. What I have here
> > satisfies my current needs. I suggest that we make provisions in APIs now to
> support all these features, but do the implementation in the coming releases.
> Does this sound ok for you?
> 
> Not sure I understand your suggestion here...
> Could you explain it a bit more - how new API will look like and what would
> be left for the future.
For this patch, I suggest we do not add any more complexity. If someone wants a lock-free/block-free mechanism, it is available by creating per-thread defer queues.

We push the following to the future:
1) A dynamically size-adjustable defer queue. IMO, with this, lock-free/block-free reclamation will not be available (memory allocation requires locking). The memory for the defer queue would be allocated/freed in chunks of 'size' elements as the queue grows/shrinks.

2) A constant-size defer queue with lock-free and block-free reclamation (single option). The defer queue will be of fixed length 'size'. If the queue gets full, an error is returned. The user could provide a 'size' equal to the number of elements in the data structure to ensure the queue never gets full.

I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide two #defines, one for the dynamically variable-size defer queue and the other for the constant-size defer queue.

However, IMO, using per-thread defer queues is a much simpler way to achieve (2). It does not add any significant burden to the user either. A rough sketch follows.
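
To make the per-thread alternative concrete (the writer-id plumbing and
names below are illustrative, not part of the patch):

/* Sketch: one defer queue per writer thread; each underlying SP/SC
 * ring is only ever touched by its owning thread, so no locking is
 * needed on either the enqueue or the reclaim path.
 */
#define NB_WRITERS 4

static struct rte_rcu_qsbr_dq *wdq[NB_WRITERS];

static int
writer_delete(unsigned int wid, void *e)
{
	int ret;

	ret = rte_rcu_qsbr_dq_enqueue(wdq[wid], e);
	if (ret != 0 && rte_errno == ENOSPC) {
		/* Queue full: reclaim whatever has completed its
		 * grace period and retry once.
		 */
		(void)rte_rcu_qsbr_dq_reclaim(wdq[wid]);
		ret = rte_rcu_qsbr_dq_enqueue(wdq[wid], e);
	}
	return ret;
}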

> 
> >
> > >
> > > > +{
> > > > +	uint32_t prod_tail = r->prod.tail;
> > > > +	uint32_t cons_head = r->cons.head;
> > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > +	unsigned int n = 1;
> > > > +	if (count) {
> > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > +		return 0;
> > > > +	}
> > > > +	return -ENOENT;
> > > > +}
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-09  4:25             ` Honnappa Nagarahalli
@ 2019-10-10 15:09               ` Ananyev, Konstantin
  2019-10-11  5:03                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-10 15:09 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd


> <snip>
> 
> >
> > >
> > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > >
> > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >
> > > > > The peek API allows fetching the next available object in the ring
> > > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > > objects depend on their value.
> > > > >
> > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > >  1 file changed, 30 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > index 2a9f768a1..d3d0d5e18 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > >  				r->cons.single, available);
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * Peek one object from a ring.
> > > > > + *
> > > > > + * The peek API allows fetching the next available object in the
> > > > > +ring
> > > > > + * without dequeuing it. This API is not multi-thread safe with
> > > > > +respect
> > > > > + * to other consumer threads.
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_p
> > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > + * @return
> > > > > + *   - 0: Success, object available
> > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline int
> > > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > >
> > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > follow other rte_ring functions naming conventions
> > > > (rte_ring_sc_peek() or so).
> > > Agree
> > >
> > > >
> > > > As a better alternative what do you think about introducing a
> > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > Something like that:
> > > >
> > > > /* same as original ring dequeue, but:
> > > >  * 1) move cons.head only if cons.head == cons.tail
> > > >  * 2) don't update cons.tail
> > > >  */
> > > > unsigned int
> > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > >                 unsigned int n, unsigned int *available);
> > > >
> > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > void
> > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > >
> > > > /* resets cons.head to cons.tail value */
> > > > void
> > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > >
> > > > Then your dq_reclaim cycle function will look like that:
> > > >
> > > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > > uint32_t avl, n;
> > > > uintptr_t elt[nb_elt]; ...
> > > >
> > > > do {
> > > >
> > > >   /* read next elem from the queue */
> > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > >   if (n == 0)
> > > >       break;
> > > >
> > > >   /* wrong period, keep elem in the queue */
> > > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > >      break;
> > > >   }
> > > >
> > > >   /* can reclaim, remove elem from the queue */
> > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > >
> > > >   /* call reclaim function */
> > > >   dq->f(dq->p, elt);
> > > >
> > > > } while (avl >= nb_elt);
> > > >
> > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > As long as actual reclamation callback itself is MT safe of course.
> > >
> > > I think it is a great idea. The other writers would still be polling
> > > for the current writer to update the tail or update the head. This makes it a
> > blocking solution.
> >
> > Yep, it is a blocking one.
> >
> > > We can make the other threads not poll i.e. they will quit reclaiming if they
> > see that other writers are dequeuing from the queue.
> >
> > Actually didn't think about that possibility, but yes should be possible to have
> > _try_ semantics too.
> >
> > >The other  way is to use per thread queues.
> > >
> > > The other requirement I see is to support unbounded-size data
> > > structures where in the data structures do not have a pre-determined
> > > number of entries. Also, currently the defer queue size is equal to the total
> > number of entries in a given data structure. There are plans to support
> > dynamically resizable defer queue. This means, memory allocation which will
> > affect the lock-free-ness of the solution.
> > >
> > > So, IMO:
> > > 1) The API should provide the capability to support different algorithms -
> > may be through some flags?
> > > 2) The requirements for the ring are pretty unique to the problem we
> > > have here (for ex: move the cons-head only if cons-tail is also the same, skip
> > polling). So, we should probably implement a ring with-in the RCU library?
> >
> > Personally, I think such serialization ring API would be useful for other cases
> > too.
> > There are few cases when user need to read contents of the queue without
> > removing elements from it.
> > Let say we do use similar approach inside TLDK to implement TCP transmit
> > queue.
> > If such API would exist in DPDK we can just use it straightway, without
> > maintaining a separate one.
> ok
> 
> >
> > >
> > > From the timeline perspective, adding all these capabilities would be
> > > difficult to get done with in 19.11 timeline. What I have here
> > > satisfies my current needs. I suggest that we make provisions in APIs now to
> > support all these features, but do the implementation in the coming releases.
> > Does this sound ok for you?
> >
> > Not sure I understand your suggestion here...
> > Could you explain it a bit more - how new API will look like and what would
> > be left for the future.
> For this patch, I suggest we do not add any more complexity. If someone wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
> 
> We push the following to the future:
> 1) Dynamically size adjustable defer queue. IMO, with this, the lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed in chunks of 'size' elements as the queue grows/shrinks.

That one is fine by me.
In fact, I don't know whether there would be a real use-case for a dynamic defer queue for an RCU var...
But I suppose that's a subject for another discussion.

> 
> 2) Constant size defer queue with lock-free and block-free reclamation (single option). The defer queue will be of fixed length 'size'. If the
> queue gets full an error is returned. The user could provide a 'size' equal to the number of elements in a data structure to ensure queue
> never gets full.

Ok, so for 19.11, what enqueue/dequeue model do you plan to support?
- MP/MC
- MP/SC
- SP/SC
- no MT at all (only the same single thread can do enqueue and dequeue)

And a related question:
What additional rte_ring API do you plan to introduce in that case?
- None
- rte_ring_sc_peek()
- rte_ring_serial_dequeue()

> 
> I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide 2 #defines, one for dynamically variable size defer queue and the
> other for constant size defer queue.
> 
> However, IMO, using per thread defer queue is a much simpler way to achieve 2. It does not add any significant burden to the user either.
> 
> >
> > >
> > > >
> > > > > +{
> > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > +	uint32_t cons_head = r->cons.head;
> > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > +	unsigned int n = 1;
> > > > > +	if (count) {
> > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > +		return 0;
> > > > > +	}
> > > > > +	return -ENOENT;
> > > > > +}
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > --
> > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-10 15:09               ` Ananyev, Konstantin
@ 2019-10-11  5:03                 ` Honnappa Nagarahalli
  2019-10-11 14:41                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-11  5:03 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd

> 
> > <snip>
> >
> > >
> > > >
> > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > >
> > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > >
> > > > > > The peek API allows fetching the next available object in the
> > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > dequeuing of objects depend on their value.
> > > > > >
> > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > ++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 30 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring
> > > > > > *r, void
> > > > > **obj_table,
> > > > > >  				r->cons.single, available);  }
> > > > > >
> > > > > > +/**
> > > > > > + * Peek one object from a ring.
> > > > > > + *
> > > > > > + * The peek API allows fetching the next available object in
> > > > > > +the ring
> > > > > > + * without dequeuing it. This API is not multi-thread safe
> > > > > > +with respect
> > > > > > + * to other consumer threads.
> > > > > > + *
> > > > > > + * @param r
> > > > > > + *   A pointer to the ring structure.
> > > > > > + * @param obj_p
> > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > + * @return
> > > > > > + *   - 0: Success, object available
> > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring
> > > > > > +*r, void **obj_p)
> > > > >
> > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > follow other rte_ring functions naming conventions
> > > > > (rte_ring_sc_peek() or so).
> > > > Agree
> > > >
> > > > >
> > > > > As a better alternative what do you think about introducing a
> > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > Something like that:
> > > > >
> > > > > /* same as original ring dequeue, but:
> > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > >  * 2) don't update cons.tail
> > > > >  */
> > > > > unsigned int
> > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > >                 unsigned int n, unsigned int *available);
> > > > >
> > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > void
> > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > >
> > > > > /* resets cons.head to cons.tail value */
> > > > > void
> > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > >
> > > > > Then your dq_reclaim cycle function will look like that:
> > > > >
> > > > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > > > uint32_t avl, n;
> > > > > uintptr_t elt[nb_elt]; ...
> > > > >
> > > > > do {
> > > > >
> > > > >   /* read next elem from the queue */
> > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > >   if (n == 0)
> > > > >       break;
> > > > >
> > > > >   /* wrong period, keep elem in the queue */
> > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > >      break;
> > > > >   }
> > > > >
> > > > >   /* can reclaim, remove elem from the queue */
> > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > >
> > > > >   /* call reclaim function */
> > > > >   dq->f(dq->p, elt);
> > > > >
> > > > > } while (avl >= nb_elt);
> > > > >
> > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > As long as actual reclamation callback itself is MT safe of course.
> > > >
> > > > I think it is a great idea. The other writers would still be
> > > > polling for the current writer to update the tail or update the
> > > > head. This makes it a
> > > blocking solution.
> > >
> > > Yep, it is a blocking one.
> > >
> > > > We can make the other threads not poll i.e. they will quit
> > > > reclaiming if they
> > > see that other writers are dequeuing from the queue.
> > >
> > > Actually didn't think about that possibility, but yes should be
> > > possible to have _try_ semantics too.
> > >
> > > >The other  way is to use per thread queues.
> > > >
> > > > The other requirement I see is to support unbounded-size data
> > > > structures where in the data structures do not have a
> > > > pre-determined number of entries. Also, currently the defer queue
> > > > size is equal to the total
> > > number of entries in a given data structure. There are plans to
> > > support dynamically resizable defer queue. This means, memory
> > > allocation which will affect the lock-free-ness of the solution.
> > > >
> > > > So, IMO:
> > > > 1) The API should provide the capability to support different
> > > > algorithms -
> > > may be through some flags?
> > > > 2) The requirements for the ring are pretty unique to the problem
> > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > the same, skip
> > > polling). So, we should probably implement a ring with-in the RCU library?
> > >
> > > Personally, I think such serialization ring API would be useful for
> > > other cases too.
> > > There are few cases when user need to read contents of the queue
> > > without removing elements from it.
> > > Let say we do use similar approach inside TLDK to implement TCP
> > > transmit queue.
> > > If such API would exist in DPDK we can just use it straightway,
> > > without maintaining a separate one.
> > ok
> >
> > >
> > > >
> > > > From the timeline perspective, adding all these capabilities would
> > > > be difficult to get done with in 19.11 timeline. What I have here
> > > > satisfies my current needs. I suggest that we make provisions in
> > > > APIs now to
> > > support all these features, but do the implementation in the coming
> releases.
> > > Does this sound ok for you?
> > >
> > > Not sure I understand your suggestion here...
> > > Could you explain it a bit more - how new API will look like and
> > > what would be left for the future.
> > For this patch, I suggest we do not add any more complexity. If
> > someone wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
> >
> > We push the following to the future:
> > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed in
> chunks of 'size' elements as the queue grows/shrinks.
> 
> That one is fine by me.
> In fact I don't know would be there a real use-case for dynamic defer queue
> for rcu var...
> But I suppose that's subject for another discussion.
Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary, as reclamation is done regularly.
If a smaller queue size is used, the queue might get full (even after reclamation), in which case the queue size should be increased. A rough sketch of such growth follows.
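
Illustrative only: rte_ring cannot be resized in place, so growing means
allocating a larger ring and migrating the pending tokens. The function
and ring names are placeholders (unique-name handling omitted), and this
allocation step is exactly what makes the dynamic variant non-lock-free:

static int
dq_grow(struct rte_rcu_qsbr_dq *dq, uint32_t new_slots)
{
	struct rte_ring *nr;
	void *obj;

	nr = rte_ring_create("RCU_DQ_GROW", rte_align32pow2(new_slots),
			SOCKET_ID_ANY, RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (nr == NULL)
		return -ENOMEM;

	/* Migrate pending tokens; safe only while no other thread
	 * touches this defer queue.
	 */
	while (rte_ring_sc_dequeue(dq->r, &obj) == 0)
		(void)rte_ring_sp_enqueue(nr, obj);

	rte_ring_free(dq->r);
	dq->r = nr;
	return 0;
}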

> 
> >
> > 2) Constant size defer queue with lock-free and block-free reclamation
> > (single option). The defer queue will be of fixed length 'size'. If
> > the queue gets full an error is returned. The user could provide a 'size' equal
> to the number of elements in a data structure to ensure queue never gets full.
> 
> Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> - MP/MC
> - MP/SC
> - SP/SC
Just SP/SC

> - non MT at all (only same single thread can do enqueue and dequeue)
If MT safety is required, one should use one defer queue per thread for now.

> 
> And related question:
> What additional rte_ring API you plan to introduce in that case?
> - None
> - rte_ring_sc_peek()
rte_ring_peek will be changed to rte_ring_sc_peek

> - rte_ring_serial_dequeue()
> 
> >
> > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > 2 #defines, one for dynamically variable size defer queue and the other for
> constant size defer queue.
> >
> > However, IMO, using per thread defer queue is a much simpler way to
> achieve 2. It does not add any significant burden to the user either.
> >
> > >
> > > >
> > > > >
> > > > > > +{
> > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > +	unsigned int n = 1;
> > > > > > +	if (count) {
> > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +	return -ENOENT;
> > > > > > +}
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > --
> > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-11  5:03                 ` Honnappa Nagarahalli
@ 2019-10-11 14:41                   ` Ananyev, Konstantin
  2019-10-11 18:28                     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-11 14:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Friday, October 11, 2019 6:04 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; Ruifeng Wang (Arm Technology
> China) <Ruifeng.Wang@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; dev@dpdk.org; nd <nd@arm.com>; nd
> <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 1/3] lib/ring: add peek API
> 
> >
> > > <snip>
> > >
> > > >
> > > > >
> > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > >
> > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > >
> > > > > > > The peek API allows fetching the next available object in the
> > > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > > dequeuing of objects depend on their value.
> > > > > > >
> > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > ---
> > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > ++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 30 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring
> > > > > > > *r, void
> > > > > > **obj_table,
> > > > > > >  				r->cons.single, available);  }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Peek one object from a ring.
> > > > > > > + *
> > > > > > > + * The peek API allows fetching the next available object in
> > > > > > > +the ring
> > > > > > > + * without dequeuing it. This API is not multi-thread safe
> > > > > > > +with respect
> > > > > > > + * to other consumer threads.
> > > > > > > + *
> > > > > > > + * @param r
> > > > > > > + *   A pointer to the ring structure.
> > > > > > > + * @param obj_p
> > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > + * @return
> > > > > > > + *   - 0: Success, object available
> > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring
> > > > > > > +*r, void **obj_p)
> > > > > >
> > > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > > follow other rte_ring functions naming conventions
> > > > > > (rte_ring_sc_peek() or so).
> > > > > Agree
> > > > >
> > > > > >
> > > > > > As a better alternative what do you think about introducing a
> > > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > > Something like that:
> > > > > >
> > > > > > /* same as original ring dequeue, but:
> > > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > > >  * 2) don't update cons.tail
> > > > > >  */
> > > > > > unsigned int
> > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > > >                 unsigned int n, unsigned int *available);
> > > > > >
> > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > > >
> > > > > > /* resets cons.head to cons.tail value */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > >
> > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > >
> > > > > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > > > > uint32_t avl, n;
> > > > > > uintptr_t elt[nb_elt]; ...
> > > > > >
> > > > > > do {
> > > > > >
> > > > > >   /* read next elem from the queue */
> > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > >   if (n == 0)
> > > > > >       break;
> > > > > >
> > > > > >   /* wrong period, keep elem in the queue */
> > > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > >      break;
> > > > > >   }
> > > > > >
> > > > > >   /* can reclaim, remove elem from the queue */
> > > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > > >
> > > > > >   /* call reclaim function */
> > > > > >   dq->f(dq->p, elt);
> > > > > >
> > > > > > } while (avl >= nb_elt);
> > > > > >
> > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > >
> > > > > I think it is a great idea. The other writers would still be
> > > > > polling for the current writer to update the tail or update the
> > > > > head. This makes it a
> > > > blocking solution.
> > > >
> > > > Yep, it is a blocking one.
> > > >
> > > > > We can make the other threads not poll i.e. they will quit
> > > > > reclaiming if they
> > > > see that other writers are dequeuing from the queue.
> > > >
> > > > Actually didn't think about that possibility, but yes should be
> > > > possible to have _try_ semantics too.
> > > >
> > > > >The other  way is to use per thread queues.
> > > > >
> > > > > The other requirement I see is to support unbounded-size data
> > > > > structures where in the data structures do not have a
> > > > > pre-determined number of entries. Also, currently the defer queue
> > > > > size is equal to the total
> > > > number of entries in a given data structure. There are plans to
> > > > support dynamically resizable defer queue. This means, memory
> > > > allocation which will affect the lock-free-ness of the solution.
> > > > >
> > > > > So, IMO:
> > > > > 1) The API should provide the capability to support different
> > > > > algorithms -
> > > > may be through some flags?
> > > > > 2) The requirements for the ring are pretty unique to the problem
> > > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > > the same, skip
> > > > polling). So, we should probably implement a ring with-in the RCU library?
> > > >
> > > > Personally, I think such serialization ring API would be useful for
> > > > other cases too.
> > > > There are few cases when user need to read contents of the queue
> > > > without removing elements from it.
> > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > transmit queue.
> > > > If such API would exist in DPDK we can just use it straightway,
> > > > without maintaining a separate one.
> > > ok
> > >
> > > >
> > > > >
> > > > > From the timeline perspective, adding all these capabilities would
> > > > > be difficult to get done with in 19.11 timeline. What I have here
> > > > > satisfies my current needs. I suggest that we make provisions in
> > > > > APIs now to
> > > > support all these features, but do the implementation in the coming
> > releases.
> > > > Does this sound ok for you?
> > > >
> > > > Not sure I understand your suggestion here...
> > > > Could you explain it a bit more - how new API will look like and
> > > > what would be left for the future.
> > > For this patch, I suggest we do not add any more complexity. If
> > > someone wants a lock-free/block-free mechanism, it is available by creating
> > per thread defer queues.
> > >
> > > We push the following to the future:
> > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > lock-free/block-free reclamation will not be available (memory allocation
> > requires locking). The memory for the defer queue will be allocated/freed in
> > chunks of 'size' elements as the queue grows/shrinks.
> >
> > That one is fine by me.
> > In fact I don't know would be there a real use-case for dynamic defer queue
> > for rcu var...
> > But I suppose that's subject for another discussion.
> Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary as the reclamation is done
> regularly.
> If a smaller queue size is used, the queue might get full (even after reclamation), in which case, the queue size should be increased.

I understand the intention.
Though I am not very happy with an approach where, to free one resource, we first have to allocate another one.
It sounds like a source of deadlocks and, in that case, probably an unnecessary complication.
But again, as it is not for 19.11, we don't have to discuss it now.
 
> >
> > >
> > > 2) Constant size defer queue with lock-free and block-free reclamation
> > > (single option). The defer queue will be of fixed length 'size'. If
> > > the queue gets full an error is returned. The user could provide a 'size' equal
> > to the number of elements in a data structure to ensure queue never gets full.
> >
> > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > - MP/MC
> > - MP/SC
> > - SP/SC
> Just SP/SC

Ok, just to confirm we are on the same page:
would it be possible for one thread to do dq_enqueue() while a second one does dq_reclaim() simultaneously
(assuming, of course, that the actual reclamation function is thread safe)?
 
> > - non MT at all (only same single thread can do enqueue and dequeue)
> If MT safe is required, one should use 1 defer queue per thread for now.
> 
> >
> > And related question:
> > What additional rte_ring API you plan to introduce in that case?
> > - None
> > - rte_ring_sc_peek()
> rte_ring_peek will be changed to rte_ring_sc_peek
> 
> > - rte_ring_serial_dequeue()
> >
> > >
> > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > > 2 #defines, one for dynamically variable size defer queue and the other for
> > constant size defer queue.
> > >
> > > However, IMO, using per thread defer queue is a much simpler way to
> > achieve 2. It does not add any significant burden to the user either.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > +	unsigned int n = 1;
> > > > > > > +	if (count) {
> > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > +		return 0;
> > > > > > > +	}
> > > > > > > +	return -ENOENT;
> > > > > > > +}
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-11 14:41                   ` Ananyev, Konstantin
@ 2019-10-11 18:28                     ` Honnappa Nagarahalli
  2019-10-13 20:09                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-11 18:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> > > >
> > > > >
> > > > > >
> > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > >
> > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > >
> > > > > > > > The peek API allows fetching the next available object in
> > > > > > > > the ring without dequeuing it. This helps in scenarios
> > > > > > > > where dequeuing of objects depend on their value.
> > > > > > > >
> > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > ---
> > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18
> > > > > > > > 100644
> > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > rte_ring *r, void
> > > > > > > **obj_table,
> > > > > > > >  				r->cons.single, available);  }
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * Peek one object from a ring.
> > > > > > > > + *
> > > > > > > > + * The peek API allows fetching the next available object
> > > > > > > > +in the ring
> > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > +safe with respect
> > > > > > > > + * to other consumer threads.
> > > > > > > > + *
> > > > > > > > + * @param r
> > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > + * @param obj_p
> > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > + * @return
> > > > > > > > + *   - 0: Success, object available
> > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > >
> > > > > > > As it is not MT safe, then I think we need _sc_ in the name,
> > > > > > > to follow other rte_ring functions naming conventions
> > > > > > > (rte_ring_sc_peek() or so).
> > > > > > Agree
> > > > > >
> > > > > > >
> > > > > > > As a better alternative what do you think about introducing
> > > > > > > a serialized versions of DPDK rte_ring dequeue functions?
> > > > > > > Something like that:
> > > > > > >
> > > > > > > /* same as original ring dequeue, but:
> > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > >   * 2) don't update cons.tail
> > > > > > >   */
> > > > > > > unsigned int
> > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > **obj_table, unsigned int n,
> > > > > > >                 unsigned int *available);
> > > > > > >
> > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r,
> > > > > > > uint32_t num);
> > > > > > >
> > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > >
> > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > >
> > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl,
> > > > > > > n; uintptr_t elt[nb_elt]; ...
> > > > > > >
> > > > > > > do {
> > > > > > >
> > > > > > >   /* read next elem from the queue */
> > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > >   if (n == 0)
> > > > > > >       break;
> > > > > > >
> > > > > > >   /* wrong period, keep elem in the queue */
> > > > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > >      break;
> > > > > > >   }
> > > > > > >
> > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > > > >
> > > > > > >    /*call reclaim function */
> > > > > > >   dq->f(dq->p, elt);
> > > > > > >
> > > > > > > } while (avl >= nb_elt);
> > > > > > >
> > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > > >
> > > > > > I think it is a great idea. The other writers would still be
> > > > > > polling for the current writer to update the tail or update
> > > > > > the head. This makes it a
> > > > > blocking solution.
> > > > >
> > > > > Yep, it is a blocking one.
> > > > >
> > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > reclaiming if they
> > > > > see that other writers are dequeuing from the queue.
> > > > >
> > > > > Actually didn't think about that possibility, but yes should be
> > > > > possible to have _try_ semantics too.
> > > > >
> > > > > > The other way is to use per thread queues.
> > > > > >
> > > > > > The other requirement I see is to support unbounded-size data
> > > > > > structures where in the data structures do not have a
> > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > queue size is equal to the total
> > > > > number of entries in a given data structure. There are plans to
> > > > > support dynamically resizable defer queue. This means, memory
> > > > > allocation which will affect the lock-free-ness of the solution.
> > > > > >
> > > > > > So, IMO:
> > > > > > 1) The API should provide the capability to support different
> > > > > > algorithms -
> > > > > may be through some flags?
> > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > cons-tail is also the same, skip
> > > > > polling). So, we should probably implement a ring with-in the RCU
> library?
> > > > >
> > > > > Personally, I think such serialization ring API would be useful
> > > > > for other cases too.
> > > > > There are few cases when user need to read contents of the queue
> > > > > without removing elements from it.
> > > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > > transmit queue.
> > > > > If such API would exist in DPDK we can just use it straightway,
> > > > > without maintaining a separate one.
> > > > ok
> > > >
> > > > >
> > > > > >
> > > > > > From the timeline perspective, adding all these capabilities
> > > > > > would be difficult to get done with in 19.11 timeline. What I
> > > > > > have here satisfies my current needs. I suggest that we make
> > > > > > provisions in APIs now to
> > > > > support all these features, but do the implementation in the
> > > > > coming
> > > releases.
> > > > > Does this sound ok for you?
> > > > >
> > > > > Not sure I understand your suggestion here...
> > > > > Could you explain it a bit more - how new API will look like and
> > > > > what would be left for the future.
> > > > For this patch, I suggest we do not add any more complexity. If
> > > > someone wants a lock-free/block-free mechanism, it is available by
> > > > creating
> > > per thread defer queues.
> > > >
> > > > We push the following to the future:
> > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > lock-free/block-free reclamation will not be available (memory
> > > > allocation
> > > requires locking). The memory for the defer queue will be
> > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> > >
> > > That one is fine by me.
> > > In fact I don't know whether there would be a real use-case for a dynamic
> > > defer queue for an rcu var...
> > > But I suppose that's subject for another discussion.
> > Currently, the defer queue size is equal to the number of resources in
> > the data structure. This is unnecessary as the reclamation is done regularly.
> > If a smaller queue size is used, the queue might get full (even after
> reclamation), in which case, the queue size should be increased.
> 
> I understand the intention.
> Though I am not very happy with an approach where, to free one resource, we
> first have to allocate another one.
> That sounds like a source of deadlocks and, for this case, probably an
> unnecessary complication.
It depends on the use case. For some use cases, lock-free reader-writer concurrency is enough; there is no need for a queue large enough to hold all the resources. Other use cases require lock-free reader-writer as well as writer-writer concurrency; there, theoretically, a queue large enough to hold all the resources would be required.
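
For example, a single writer serialized by a mutex only needs the queue to absorb the deletions issued within one grace period, whereas with multiple lock-free writers, in the worst case, every resource in the data structure could be pending reclamation at the same time.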

> But again, as it is not for 19.11, we don't have to discuss it now.
> 
> > >
> > > >
> > > > 2) Constant size defer queue with lock-free and block-free
> > > > reclamation (single option). The defer queue will be of fixed
> > > > length 'size'. If the queue gets full an error is returned. The
> > > > user could provide a 'size' equal
> > > to the number of elements in a data structure to ensure queue never gets
> full.
> > >
> > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > - MP/MC
> > > - MP/SC
> > > - SP/SC
> > Just SP/SC
> 
> Ok, just to confirm we are on the same page:
> would it be possible for one thread to do dq_enqueue() while a second one does
> dq_reclaim() simultaneously (assuming, of course, that the actual reclamation
> function is thread safe)?
Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.
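
To make the model concrete, here is a minimal sketch of how such a split could look (assuming the dq APIs from this patch; the function names and the spinlock are illustrative, not part of the API):

#include <rte_spinlock.h>
#include <rte_rcu_qsbr.h>

static rte_spinlock_t dq_reclaim_lock = RTE_SPINLOCK_INITIALIZER;

/* The single enqueuing thread (the underlying ring is SP/SC):
 * dq_enqueue() starts the grace period for 'e' and queues it. */
static int
writer_free_resource(struct rte_rcu_qsbr_dq *dq, void *e)
{
	return rte_rcu_qsbr_dq_enqueue(dq, e);
}

/* A second thread may reclaim concurrently with the enqueuer, but
 * multiple reclaiming threads must serialize among themselves. */
static int
housekeeping_reclaim(struct rte_rcu_qsbr_dq *dq)
{
	int ret;

	rte_spinlock_lock(&dq_reclaim_lock);
	ret = rte_rcu_qsbr_dq_reclaim(dq);
	rte_spinlock_unlock(&dq_reclaim_lock);

	return ret;
}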

> 
> > > - non MT at all (only same single thread can do enqueue and dequeue)
> > If MT safe is required, one should use 1 defer queue per thread for now.
> >
> > >
> > > And related question:
> > > What additional rte_ring API you plan to introduce in that case?
> > > - None
> > > - rte_ring_sc_peek()
> > rte_ring_peek will be changed to rte_ring_sc_peek
> >
> > > - rte_ring_serial_dequeue()
> > >
> > > >
> > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > provide
> > > > 2 #defines, one for dynamically variable size defer queue and the
> > > > other for
> > > constant size defer queue.
> > > >
> > > > However, IMO, using per thread defer queue is a much simpler way
> > > > to
> > > achieve 2. It does not add any significant burden to the user either.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > +{
> > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > +	unsigned int n = 1;
> > > > > > > > +	if (count) {
> > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > +		return 0;
> > > > > > > > +	}
> > > > > > > > +	return -ENOENT;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  #ifdef __cplusplus
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-07 13:11       ` Medvedkin, Vladimir
@ 2019-10-13  3:02         ` Honnappa Nagarahalli
  2019-10-15 16:48           ` Medvedkin, Vladimir
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-13  3:02 UTC (permalink / raw)
  To: Medvedkin, Vladimir, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

Hi Vladimir,
	Apologies for the delayed response, I had to run a few experiments.

<snip>

> 
> Hi Honnappa,
> 
> On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> > Add resource reclamation APIs to make it simple for applications and
> > libraries to integrate rte_rcu library.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >   app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> >   lib/librte_rcu/meson.build         |   2 +
> >   lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> >   lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> >   lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> >   lib/librte_rcu/rte_rcu_version.map |   4 +
> >   lib/meson.build                    |   6 +-
> >   7 files changed, 700 insertions(+), 3 deletions(-)
> >   create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c index
> > d1b9e46a2..3a6815243 100644
> > --- a/app/test/test_rcu_qsbr.c
> > +++ b/app/test/test_rcu_qsbr.c
> > @@ -1,8 +1,9 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright (c) 2018 Arm Limited
> > + * Copyright (c) 2019 Arm Limited
> >    */
> >
> >   #include <stdio.h>
> > +#include <string.h>
> >   #include <rte_pause.h>
> >   #include <rte_rcu_qsbr.h>
> >   #include <rte_hash.h>
> > @@ -33,6 +34,7 @@ static uint32_t *keys;
> >   #define COUNTER_VALUE 4096
> >   static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
> >   static uint8_t writer_done;
> > +static uint8_t cb_failed;
> >
> >   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> >   struct rte_hash *h[RTE_MAX_LCORE];
> > @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
> >   	return 0;
> >   }
> >
> > +static void
> > +rte_rcu_qsbr_test_free_resource(void *p, void *e) {
> > +	if (p != NULL && e != NULL) {
> > +		printf("%s: Test failed\n", __func__);
> > +		cb_failed = 1;
> > +	}
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_create: create a queue used to store the data
> > +structure
> > + * elements that can be freed later. This queue is referred to as 'defer
> queue'.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_create(void)
> > +{
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> > +
> > +	/* Pass invalid parameters */
> > +	dq = rte_rcu_qsbr_dq_create(NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +	params.v = t[0];
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	params.size = 1;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	params.esize = 3;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	/* Pass all valid parameters */
> > +	params.esize = 16;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> params");
> > +	rte_rcu_qsbr_dq_delete(dq);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> > > + * to be freed later after at least one grace period is over.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_enqueue(void)
> > +{
> > +	int ret;
> > +	uint64_t r;
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> > +
> > +	/* Create a queue with simple parameters */
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +	params.v = t[0];
> > +	params.size = 1;
> > +	params.esize = 16;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> > +params");
> > +
> > +	/* Pass invalid parameters */
> > +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> > +params");
> > +
> > +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> > +params");
> > +
> > +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> > +params");
> > +
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid
> params");
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_reclaim(void)
> > +{
> > +	int ret;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> > +
> > +	/* Pass invalid parameters */
> > +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid
> > +params");
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_delete(void)
> > +{
> > +	int ret;
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> > +
> > +	/* Pass invalid parameters */
> > +	ret = rte_rcu_qsbr_dq_delete(NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid
> > +params");
> > +
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +	params.v = t[0];
> > +	params.size = 1;
> > +	params.esize = 16;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> params");
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> params");
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> > > + * to be freed later after at least one grace period is over.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize) {
> > +	int i, j, ret;
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +	uint64_t *e;
> > +	uint64_t sc = 200;
> > +	int max_entries;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> > +	printf("Size = %d, esize = %d\n", size, esize);
> > +
> > +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> > +	if (e == NULL)
> > +		return 0;
> > +	cb_failed = 0;
> > +
> > +	/* Initialize the RCU variable. No threads are registered */
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +
> > +	/* Create a queue with simple parameters */
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	params.v = t[0];
> > +	params.size = size;
> > +	params.esize = esize;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> > +params");
> > +
> > +	/* Given the size and esize, calculate the maximum number of entries
> > +	 * that can be stored on the defer queue (look at the logic used
> > +	 * in capacity calculation of rte_ring).
> > +	 */
> > +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> > +	max_entries = (max_entries - 1)/(esize/8 + 1);
> > +
> > +	/* Enqueue few counters starting with the value 'sc' */
> > +	/* The queue size will be rounded up to 2. The enqueue API also
> > +	 * reclaims if the queue size is above a certain limit. Since there
> > +	 * are no threads registered, reclamation succeeds. Hence, it should
> > +	 * be possible to enqueue more than the provided queue size.
> > +	 */
> > +	for (i = 0; i < 10; i++) {
> > +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> > +			"dq enqueue functional");
> > +		for (j = 0; j < esize/8; j++)
> > +			e[j] = sc++;
> > +	}
> > +
> > +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> > +	 * succeed. It should not be possible to enqueue more than the size
> > +	 * number of resources.
> > +	 */
> > +	rte_rcu_qsbr_thread_register(t[0], 1);
> > +	rte_rcu_qsbr_thread_online(t[0], 1);
> > +
> > +	for (i = 0; i < max_entries; i++) {
> > +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> > +			"dq enqueue functional");
> > +		for (j = 0; j < esize/8; j++)
> > +			e[j] = sc++;
> > +	}
> > +
> > +	/* Enqueue fails as queue is full */
> > +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> functional");
> > +
> > +	/* Delete should fail as there are elements in defer queue which
> > +	 * cannot be reclaimed.
> > +	 */
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid
> params");
> > +
> > +	/* Report quiescent state, enqueue should succeed */
> > +	rte_rcu_qsbr_quiescent(t[0], 1);
> > +	for (i = 0; i < max_entries; i++) {
> > +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> > +			"dq enqueue functional");
> > +		for (j = 0; j < esize/8; j++)
> > +			e[j] = sc++;
> > +	}
> > +
> > +	/* Queue is full */
> > +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> functional");
> > +
> > +	/* Report quiescent state, delete should succeed */
> > +	rte_rcu_qsbr_quiescent(t[0], 1);
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> params");
> > +
> > +	/* Validate that call back function did not return any error */
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> > +
> > +	rte_free(e);
> > +	return 0;
> > +}
> > +
> >   /*
> >    * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
> >    */
> > @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
> >   	if (test_rcu_qsbr_thread_offline() < 0)
> >   		goto test_fail;
> >
> > +	if (test_rcu_qsbr_dq_create() < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_reclaim() < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_delete() < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_enqueue() < 0)
> > +		goto test_fail;
> > +
> >   	printf("\nFunctional tests\n");
> >
> >   	if (test_rcu_qsbr_sw_sv_3qs() < 0)
> > @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
> >   	if (test_rcu_qsbr_mw_mv_mqs() < 0)
> >   		goto test_fail;
> >
> > +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> > +		goto test_fail;
> > +
> >   	free_rcu();
> >
> >   	printf("\n");
> > diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> > index 62920ba02..e280b29c1 100644
> > --- a/lib/librte_rcu/meson.build
> > +++ b/lib/librte_rcu/meson.build
> > @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
> >   if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
> >   	ext_deps += cc.find_library('atomic')
> >   endif
> > +
> > +deps += ['ring']
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > @@ -21,6 +21,7 @@
> >   #include <rte_errno.h>
> >
> >   #include "rte_rcu_qsbr.h"
> > +#include "rte_rcu_qsbr_pvt.h"
> >
> >   /* Get the memory size of QSBR variable */
> >   size_t
> > @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr
> *v)
> >   	return 0;
> >   }
> >
> > +/* Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + */
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params) {
> > +	struct rte_rcu_qsbr_dq *dq;
> > +	uint32_t qs_fifo_size;
> > +
> > +	if (params == NULL || params->f == NULL ||
> > +		params->v == NULL || params->name == NULL ||
> > +		params->size == 0 || params->esize == 0 ||
> > +		(params->esize % 8 != 0)) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	dq = rte_zmalloc(NULL,
> > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > +		RTE_CACHE_LINE_SIZE);
> > +	if (dq == NULL) {
> > +		rte_errno = ENOMEM;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	/* round up qs_fifo_size to next power of two that is not less than
> > +	 * max_size.
> > +	 */
> > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > +					* params->size) + 1);
> > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > +					SOCKET_ID_ANY, 0);
> > +	if (dq->r == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): defer queue create failed\n", __func__);
> > +		rte_free(dq);
> > +		return NULL;
> > +	}
> > +
> > +	dq->v = params->v;
> > +	dq->size = params->size;
> > +	dq->esize = params->esize;
> > +	dq->f = params->f;
> > +	dq->p = params->p;
> > +
> > +	return dq;
> > +}
> > +
> > +/* Enqueue one resource to the defer queue to free after the grace
> > + * period is over.
> > + */
> > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > +	uint64_t token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +	uint32_t cur_size, free_size;
> > +
> > +	if (dq == NULL || e == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Start the grace period */
> > +	token = rte_rcu_qsbr_start(dq->v);
> > +
> > +	/* Reclaim resources if the queue is 1/8th full. This helps keep
> > +	 * the queue from growing too large and allows time for reader
> > +	 * threads to report their quiescent state.
> > +	 */
> > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +			"%s(): Triggering reclamation\n", __func__);
> > +		rte_rcu_qsbr_dq_reclaim(dq);
> > +	}
> 
> There are two problems I see:
> 
> 1. rte_rcu_qsbr_dq_reclaim() reclaims only 1/16 of the defer queue while it
> triggers on 1/8. This means that there will always be 1/16 of non reclaimed
> entries in the queue.
There will be 'at least' 1/16 of the entries left non-reclaimed. It could be more, depending on the length of the grace period and the rate of deletion.
The 1/8 trigger is used to give the readers sufficient time to report their quiescent state. The 1/16 limit is used to spread the load of reclamation across multiple calls and to provide an upper bound on the cycles consumed.
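
For concreteness: with a defer queue created with size = 1024, enqueue starts triggering reclamation once more than 1024 >> 3 = 128 entries are pending, and each reclaim pass then frees at most 1024 >> 4 = 64 of them. So a single pass can leave entries behind when deletions outpace the readers reporting quiescence.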

> 
> 2. Number of entries to reclaim depend on dq->size. So,
> rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For LPM library this
That is true. It depends on dq->size (the number of tbl8 groups). However, note that patch [1] provides batch-reclamation behavior, which significantly reduces the cycles consumed by reclamation.

[1] https://patches.dpdk.org/patch/58960/

> means that rte_lpm_delete() sometimes takes a long time.
Agree, it sometimes takes additional time. It is good to spread that cost over multiple calls.

> 
> So, my suggestions here would be
> 
> - trigger rte_rcu_qsbr_dq_reclaim() with every enqueue
Given that the LPM APIs are mainly for the control plane, I would expect that, by the next time an LPM API is called, the readers have completed the grace period. But if there are frequent updates, we might end up with empty reclaims, which waste cycles. IMO, this trigger should happen only after at least a few entries are in the queue.

> 
> - reclaim small amount of entries (could be configurable of creation time)
Agree. I would keep it smaller than the trigger amount, knowing that the elements added right before the trigger might not have completed the grace period.

> 
> - provide API to trigger reclaim from the application manually.
IMO, this will add complexity to the application. I agree that some applications will have special needs; those applications might have to implement their own methods using the base RCU APIs.
Instead, as agreed in other threads, I suggest we expose the parameters (when to trigger and how much to reclaim) to the application as optional configurable parameters, i.e. if the application does not provide them, we use default values. I think this should give the application enough flexibility. A sketch of what that could look like is below.
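
Something along these lines (a sketch only; the two new field names are hypothetical and would be settled in the next revision, with 0 selecting the current defaults):

struct rte_rcu_qsbr_dq_parameters {
	const char *name;       /**< Name of the defer queue. */
	uint32_t size;          /**< Number of entries in the queue. */
	uint32_t esize;         /**< Size (in bytes) of each element. */
	uint32_t trigger_reclaim_limit;
	/**< Hypothetical: number of pending entries at which enqueue
	 *   triggers reclamation; 0 selects the default of size/8. */
	uint32_t max_reclaim_size;
	/**< Hypothetical: maximum number of entries reclaimed per
	 *   pass; 0 selects the default of size/16. */
	rte_rcu_qsbr_free_resource f;
	/**< Function to call to free the resource. */
	void *p;                /**< Pointer passed to the free function. */
	struct rte_rcu_qsbr *v; /**< RCU QSBR variable to use. */
};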

> 
> > +
> > > > +	/* Check if there is space for at least 1 resource */
> > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > +	if (!free_size) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Defer queue is full\n", __func__);
> > +		rte_errno = ENOSPC;
> > +		return 1;
> > +	}
> > +
> > +	/* Enqueue the resource */
> > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > +
> > +	/* The resource to enqueue needs to be a multiple of 64b
> > +	 * due to the limitation of the rte_ring implementation.
> > +	 */
> > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > +
> > +	return 0;
> > +}
> > +
> > +/* Reclaim resources from the defer queue. */ int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > +	uint32_t max_cnt;
> > +	uint32_t cnt;
> > +	void *token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Anything to reclaim? */
> > +	if (rte_ring_count(dq->r) == 0)
> > +		return 0;
> > +
> > +	/* Reclaim at the max 1/16th the total number of entries. */
> > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > +	cnt = 0;
> > +
> > +	/* Check reader threads quiescent state and reclaim resources */
> > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > +			== 1)) {
> > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > +		/* The resource to dequeue needs to be a multiple of 64b
> > +		 * due to the limitation of the rte_ring implementation.
> > +		 */
> > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > +			i++, tmp++)
> > +			(void)rte_ring_sc_dequeue(dq->r,
> > +					(void *)(uintptr_t)tmp);
> > +		dq->f(dq->p, dq->e);
> > +
> > +		cnt++;
> > +	}
> > +
> > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > +
> > +	if (cnt == 0) {
> > +		/* No resources were reclaimed */
> > +		rte_errno = EAGAIN;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/* Delete a defer queue. */
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Reclaim all the resources */
> > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > +		/* Error number is already set by the reclaim API */
> > +		return 1;
> > +
> > +	rte_ring_free(dq->r);
> > +	rte_free(dq);
> > +
> > +	return 0;
> > +}
> > +
> >   int rte_rcu_log_type;
> >
> >   RTE_INIT(rte_rcu_register)
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > @@ -34,6 +34,7 @@ extern "C" {
> >   #include <rte_lcore.h>
> >   #include <rte_debug.h>
> >   #include <rte_atomic.h>
> > +#include <rte_ring.h>
> >
> >   extern int rte_rcu_log_type;
> >
> > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >   	 */
> >   } __rte_cache_aligned;
> >
> > +/**
> > + * Call back function called to free the resources.
> > + *
> > + * @param p
> > + *   Pointer provided while creating the defer queue
> > + * @param e
> > + *   Pointer to the resource data stored on the defer queue
> > + *
> > + * @return
> > + *   None
> > + */
> > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > +
> > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > +
> > +/**
> > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > + */
> > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > +
> > +/**
> > + *  Reclaim at the max 1/16th the total number of resources.
> > + */
> > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > +
> > +/**
> > + * Parameters used when creating the defer queue.
> > + */
> > +struct rte_rcu_qsbr_dq_parameters {
> > +	const char *name;
> > +	/**< Name of the queue. */
> > +	uint32_t size;
> > +	/**< Number of entries in queue. Typically, this will be
> > +	 *   the same as the maximum number of entries supported in the
> > +	 *   lock free data structure.
> > +	 *   Data structures with unbounded number of entries is not
> > +	 *   supported currently.
> > +	 */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of each element in the defer queue.
> > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > +	 *   support 8B element sizes only.
> > +	 */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs. This can be NULL.
> > +	 */
> > +	struct rte_rcu_qsbr *v;
> > +	/**< RCU QSBR variable to use for this defer queue */ };
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq;
> > +
> >   /**
> >    * @warning
> >    * @b EXPERIMENTAL: this API may change without prior notice @@
> > -648,6 +710,113 @@ __rte_experimental
> >   int
> >   rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + *
> > + * @param params
> > + *   Parameters to create a defer queue.
> > + * @return
> > + *   On success - Valid pointer to defer queue
> > + *   On error - NULL
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOMEM - Not enough memory
> > + */
> > +__rte_experimental
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enqueue one resource to the defer queue and start the grace period.
> > + * The resource will be freed later after at least one grace period
> > + * is over.
> > + *
> > + * If the defer queue is full, it will attempt to reclaim resources.
> > + * It will also reclaim resources at regular intervals to avoid
> > + * the defer queue from growing too big.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to allocate an entry from.
> > + * @param e
> > + *   Pointer to resource data to copy to the defer queue. The size of
> > + *   the data to copy is equal to the element size provided when the
> > + *   defer queue was created.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > + *		if the defer queue size is equal (or larger) than the
> > + *		number of elements in the data structure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Reclaim resources from the defer queue.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to reclaim an entry from.
> > + * @return
> > + *   On successful reclamation of at least 1 resource - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - None of the resources have completed at least 1 grace
> period,
> > + *		try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Delete a defer queue.
> > + *
> > + * It tries to reclaim all the resources on the defer queue.
> > + * If any of the resources have not completed the grace period
> > + * the reclamation stops and returns immediately. The rest of
> > + * the resources are not reclaimed and the defer queue is not
> > + * freed.
> > + *
> > + * @param dq
> > + *   Defer queue to delete.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > + *		period, try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > +
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > new file mode 100644
> > index 000000000..2122bc36a
> > --- /dev/null
> > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > +#define _RTE_RCU_QSBR_PVT_H_
> > +
> > +/**
> > + * This file is private to the RCU library. It should not be included
> > + * by the user of this library.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_rcu_qsbr.h"
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq {
> > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > +	uint32_t size;
> > +	/**< Number of elements in the defer queue */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of data stored on the defer queue */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs.
> > +	 */
> > +	char e[0];
> > +	/**< Temporary storage to copy the defer queue element. */ };
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > b/lib/librte_rcu/rte_rcu_version.map
> > index f8b9ef2ab..dfac88a37 100644
> > --- a/lib/librte_rcu/rte_rcu_version.map
> > +++ b/lib/librte_rcu/rte_rcu_version.map
> > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >   	rte_rcu_qsbr_synchronize;
> >   	rte_rcu_qsbr_thread_register;
> >   	rte_rcu_qsbr_thread_unregister;
> > +	rte_rcu_qsbr_dq_create;
> > +	rte_rcu_qsbr_dq_enqueue;
> > +	rte_rcu_qsbr_dq_reclaim;
> > +	rte_rcu_qsbr_dq_delete;
> >
> >   	local: *;
> >   };
> > diff --git a/lib/meson.build b/lib/meson.build index
> > e5ff83893..0e1be8407 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -11,7 +11,9 @@
> >   libraries = [
> >   	'kvargs', # eal depends on kvargs
> >   	'eal', # everything depends on eal
> > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > +	'ring',
> > +	'rcu', # rcu depends on ring
> > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >   	'cmdline',
> >   	'metrics', # bitrate/latency stats depends on this
> >   	'hash',    # efd depends on this
> > @@ -22,7 +24,7 @@ libraries = [
> >   	'gro', 'gso', 'ip_frag', 'jobstats',
> >   	'kni', 'latencystats', 'lpm', 'member',
> >   	'power', 'pdump', 'rawdev',
> > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >   	# ipsec lib depends on net, crypto and security
> >   	'ipsec',
> >   	# add pkt framework libs which use other libs from above
> 
> --
> Regards,
> Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-07 10:46               ` Ananyev, Konstantin
@ 2019-10-13  4:35                 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-13  4:35 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> > > > > > Add resource reclamation APIs to make it simple for
> > > > > > applications and libraries to integrate rte_rcu library.
> > > > > >
> > > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > ---
> > > > > >  app/test/test_rcu_qsbr.c           | 291
> ++++++++++++++++++++++++++++-
> > > > > >  lib/librte_rcu/meson.build         |   2 +
> > > > > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > > > > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > > > > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > > > > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > > > > >  lib/meson.build                    |   6 +-
> > > > > >  7 files changed, 700 insertions(+), 3 deletions(-)  create
> > > > > > mode
> > > > > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > >
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b
> > > > > > 100644
> > > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > > @@ -21,6 +21,7 @@
> > > > > >  #include <rte_errno.h>
> > > > > >
> > > > > >  #include "rte_rcu_qsbr.h"
> > > > > > +#include "rte_rcu_qsbr_pvt.h"
> > > > > >
> > > > > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6
> > > > > > +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > +/* Create a queue used to store the data structure elements
> > > > > > +that can
> > > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq *
> > > > > > +rte_rcu_qsbr_dq_create(const struct
> > > > > > +rte_rcu_qsbr_dq_parameters
> > > > > > +*params) {
> > > > > > +	struct rte_rcu_qsbr_dq *dq;
> > > > > > +	uint32_t qs_fifo_size;
> > > > > > +
> > > > > > +	if (params == NULL || params->f == NULL ||
> > > > > > +		params->v == NULL || params->name == NULL ||
> > > > > > +		params->size == 0 || params->esize == 0 ||
> > > > > > +		(params->esize % 8 != 0)) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	dq = rte_zmalloc(NULL,
> > > > > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > > > > +		RTE_CACHE_LINE_SIZE);
> > > > > > +	if (dq == NULL) {
> > > > > > +		rte_errno = ENOMEM;
> > > > > > +
> > > > > > +		return NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* round up qs_fifo_size to next power of two that is not less
> than
> > > > > > +	 * max_size.
> > > > > > +	 */
> > > > > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > > > > +					* params->size) + 1);
> > > > > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > > > > +					SOCKET_ID_ANY, 0);
> > > > >
> > > > > If it is going to be not MT safe, then why not to create the
> > > > > ring with (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> > > > Agree.
> > > >
> > > > > Though I think it could be changed to allow MT safe multiple
> > > > > enqeue/single dequeue, see below.
> > > > The MT safe issue is due to reclaim code. The reclaim code has the
> > > > following
> > > sequence:
> > > >
> > > > rte_ring_peek
> > > > rte_rcu_qsbr_check
> > > > rte_ring_dequeue
> > > >
> > > > This entire sequence needs to be atomic as the entry cannot be
> > > > dequeued
> > > without knowing that the grace period for that entry is over.
> > >
> > > I understand that, though I believe at least it should be possible
> > > to support multiple-enqueue/single dequeuer and reclaim mode.
> > > With serialized dequeue() even multiple dequeue should be possible.
> > Agreed. Please see the response on the other thread.
> >
> > >
> > > > Note that due to optimizations in rte_rcu_qsbr_check API, this
> > > > sequence should not be large in most cases. I do not have ideas on
> > > > how to
> > > make this sequence lock-free.
> > > >
> > > > If the writer is on the control plane, most use cases will use
> > > > mutex locks for synchronization if they are multi-threaded. That
> > > > lock should be
> > > enough to provide the thread safety for these APIs.
> > >
> > > In that is case, why do we need ring at all?
> > > For sure people can create their own queue quite easily with mutex and
> TAILQ.
> > > If performance is not an issue, they can even add pthread_cond to
> > > it, and have an ability for the consumer to sleep/wakeup on empty/full
> queue.
> > >
> > > >
> > > > If the writer is multi-threaded and lock-free, then one should use
> > > > per thread
> > > defer queue.
> > >
> > > If that's the only working model, then the question is why do we
> > > need that API at all?
> > > Just simple array with counter or linked-list should do for majority of
> cases.
> > Please see the other thread.
> >
> > >
> > > >
> > > > >
> > > > > > +	if (dq->r == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): defer queue create failed\n",
> __func__);
> > > > > > +		rte_free(dq);
> > > > > > +		return NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	dq->v = params->v;
> > > > > > +	dq->size = params->size;
> > > > > > +	dq->esize = params->esize;
> > > > > > +	dq->f = params->f;
> > > > > > +	dq->p = params->p;
> > > > > > +
> > > > > > +	return dq;
> > > > > > +}
> > > > > > +
> > > > > > +/* Enqueue one resource to the defer queue to free after the
> > > > > > +grace
> > > > > > + * period is over.
> > > > > > + */
> > > > > > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> {
> > > > > > +	uint64_t token;
> > > > > > +	uint64_t *tmp;
> > > > > > +	uint32_t i;
> > > > > > +	uint32_t cur_size, free_size;
> > > > > > +
> > > > > > +	if (dq == NULL || e == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > >
> > > > > Why just not to return -EINVAL straightway?
> > > > > I think there is no much point to set rte_errno in that function
> > > > > at all, just return value should do.
> > > > I am trying to keep these consistent with the existing APIs. They
> > > > return 0 or 1
> > > and set the rte_errno.
> > >
> > > A lot of public DPDK API functions do use return value to return
> > > status code (0, or some positive numbers of success, negative errno
> > > values on failure), I am not inventing anything new here.
> > Agree, you are not proposing a new thing here. May be I was not clear.
> > I really do not have an opinion on how this should be done. But, I do have
> an opinion on consistency. These new APIs follow what has been done in the
> existing RCU APIs. I think we have 2 options here.
> > 1) Either we change existing RCU APIs to get rid of rte_errno (is it
> > an ABI change?) or
> > 2) The new APIs follow what has been done in the existing RCU APIs.
> > I want to make sure we are consistent at least within RCU APIs.
> 
> But as I can see right now rcu API sets rte_errno only for control-path
> functions (get_memsize, init, register, unregister, dump).
> All fast-path (inline) function don't set/use it.
> So from perspective that is consistent behavior, no?
Agree. I am treating this mainly as a control-plane function (hence it is a non-inline function as well).

> 
> >
> > >
> > > >
> > > > >
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Start the grace period */
> > > > > > +	token = rte_rcu_qsbr_start(dq->v);
> > > > > > +
> > > > > > +	/* Reclaim resources if the queue is 1/8th full. This helps keep
> > > > > > +	 * the queue from growing too large and allows time for reader
> > > > > > +	 * threads to report their quiescent state.
> > > > > > +	 */
> > > > > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > > > >
> > > > > Probably would be a bit easier if you just store in dq->esize
> > > > > (elt size + token
> > > > > size) / 8.
> > > > Agree
> > > >
> > > > >
> > > > > > +	if (cur_size > (dq->size >>
> > > > > > +RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > > > >
> > > > > Why to make this threshold value hard-coded?
> > > > > Why either not to put it into create parameter, or just return a
> > > > > special return value, to indicate that threshold is reached?
> > > > My thinking was to keep the programming interface easy to use. The
> > > > more the parameters, the more painful it is for the user. IMO, the
> > > > constants chosen should be good enough for most cases. More
> > > > advanced
> > > users could modify the constants. However, we could make these as
> > > part of the parameters, but make them optional for the user. For ex:
> > > if they set them to 0, default values can be used.
> > > >
> > > > > Or even return number of filled/free entroes on success, so
> > > > > caller can decide to reclaim or not based on that information on his
> own?
> > > > This means more code on the user side.
> > >
> > > I personally think it it really wouldn't be that big problem to the
> > > user to pass extra parameter to the function.
> > I will convert the 2 constants into optional parameters (user can set
> > them to 0 to make the algorithm use default values)
> >
> > > Again what if user doesn't want to reclaim() in enqueue() thread at all?
> > 'enqueue' has to do reclamation if the defer queue is full. I do not think this
> is trivial.
> >
> > In the current design, reclamation in enqueue is also done on regular
> > basis (automatic triggering of reclamation when the queue reaches
> > certain limit) to keep the queue from growing too large. This is
> > required when we implement a dynamically adjusting defer queue. The
> current algorithm keeps the cost of reclamation spread across multiple calls
> and puts an upper bound on cycles for delete API by reclaiming a fixed
> number of entries.
> >
> > This algorithm is proven to work in the LPM integration performance
> > tests at a very low performance over head (~1%). So, I do not know why a
> user would not want to use this.
> 
> Yeh, I looked at LPM implementation and one thing I found strange -
> defer_queue is hidden inside LPM struct and all reclamations are done
> internally.
> Yes for sure it allows to defer and group actual reclaim(), which hopefully will
> lead to better performance.
> But why not to allow user to call reclaim() for it directly too?
> In that way user might avoid/(minimize) doing reclaim() in LPM write() at all.
> And let say do it somewhere later in the same thread (when no other tasks to
> do), or even leave it to some other house-keeping thread to do (sort of
> garbage collector).
> Or such mode is not supported/planned?
The goal of integrating the RCU defer APIs with libraries is to take away the complexity the writer faces in adopting lock-free algorithms. I am looking to address the most common use cases. There will be use cases that are less common; I think those should be addressed by the application using the base RCU APIs. Let us discuss this more in the other thread, where you have similar questions.
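
For reference, the writer-side flow with the LPM integration is intended to look roughly like this (a sketch assuming the rte_lpm_rcu_qsbr_add() API from this patch set; the wrapper function is illustrative):

#include <rte_lpm.h>
#include <rte_rcu_qsbr.h>

/* Attach an initialized RCU QSBR variable to the LPM table once.
 * From then on, tbl8 groups freed by rte_lpm_delete() go through
 * the internal defer queue and are reclaimed on later add/delete
 * calls - no explicit reclaim step in the application. */
static int
lpm_attach_rcu_and_delete(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
		uint32_t ip, uint8_t depth)
{
	if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
		return -1;

	return rte_lpm_delete(lpm, ip, depth);
}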

> 
> > The 2 additional parameters should give the user more flexibility.
> 
> Ok, let's keep it as config params.
> After another thought - I think you're right, it should be good enough.
> 
> >
> > However, if the user wants his own algorithm, he can create one with the
> base APIs provided.
> >
> > >
> > > > I think adding these to parameters seems like a better option.
> > > >
> > > > >
> > > > > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > > +			"%s(): Triggering reclamation\n", __func__);
> > > > > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Check if there is space for at least 1 resource */
> > > > > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > > > > +	if (!free_size) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Defer queue is full\n", __func__);
> > > > > > +		rte_errno = ENOSPC;
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Enqueue the resource */
> > > > > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > > > > +
> > > > > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > > > > +	 * due to the limitation of the rte_ring implementation.
> > > > > > +	 */
> > > > > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > > > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > > > >
> > > > >
> > > > > That whole construction above looks a bit clumsy and error prone...
> > > > > I suppose just:
> > > > >
> > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t free, n; ...
> > > > > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free); if (n == 0)
> > > > Yes, bulk enqueue can be used. But note that once the flexible
> > > > element size
> > > ring patch is done, this code will use that.
> > >
> > > Well, when it is in the mainline, and it provides a better way, for
> > > sure this code can be updated to use the new API (if it provides some
> > > improvements).
> > > But as I understand, right now it is not there, while bulk
> > > enqueue/dequeue are.
> > Apologies, I was not clear. I agree we can go with bulk APIs for now.
> >
> > >
> > > >
> > > > >   return -ENOSPC;
> > > > > return free;
> > > > >
> > > > > That way I think you can have MT-safe version of that function.
> > > > Please see the description of MT safe issue above.
> > > >
> > > > >
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/* Reclaim resources from the defer queue. */
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> > > > > > +{
> > > > > > +	uint32_t max_cnt;
> > > > > > +	uint32_t cnt;
> > > > > > +	void *token;
> > > > > > +	uint64_t *tmp;
> > > > > > +	uint32_t i;
> > > > > > +
> > > > > > +	if (dq == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > >
> > > > > Same story as above - I think rte_errno is excessive in this function.
> > > > > Just return value should be enough.
> > > > >
> > > > >
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Anything to reclaim? */
> > > > > > +	if (rte_ring_count(dq->r) == 0)
> > > > > > +		return 0;
> > > > >
> > > > > Not sure you need that, see below.
> > > > >
> > > > > > +
> > > > > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > > > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > > > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > > > >
> > > > > Again, why not make max_cnt configurable as a create() parameter?
> > > > I think making this as an optional parameter for creating defer
> > > > queue is a
> > > better option.
> > > >
> > > > > Or even a parameter for that function?
> > > > >
> > > > > > +	cnt = 0;
> > > > > > +
> > > > > > +	/* Check reader threads quiescent state and reclaim resources */
> > > > > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > > > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > > > > +			== 1)) {
> > > > >
> > > > >
> > > > > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > > > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > > > > +		 * due to the limitation of the rte_ring implementation.
> > > > > > +		 */
> > > > > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > > > > +			i++, tmp++)
> > > > > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > > > > +					(void *)(uintptr_t)tmp);
> > > > >
> > > > > Again, no need for such constructs with multiple dequeuer I believe.
> > > > > Just:
> > > > >
> > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t n;
> > > > > uintptr_t elt[nb_elt]; ...
> > > > > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL); if (n != 0)
> > > > > {dq->f(dq->p, elt);}
> > > > Agree on bulk API use.
> > > >
> > > > >
> > > > > Seems enough.
> > > > > Again in that case you can have enqueue/reclaim running in
> > > > > different threads simultaneously, plus you don't need dq->e at all.
> > > > Will check on dq->e
> > > >
> > > > >
> > > > > > +		dq->f(dq->p, dq->e);
> > > > > > +
> > > > > > +		cnt++;
> > > > > > +	}
> > > > > > +
> > > > > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > > > > +
> > > > > > +	if (cnt == 0) {
> > > > > > +		/* No resources were reclaimed */
> > > > > > +		rte_errno = EAGAIN;
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	return 0;
> > > > >
> > > > > I'd suggest to return cnt on success.
> > > > I am trying to keep the APIs simple. I do not see much use for 'cnt'
> > > > as a return value to the user. It exposes more details which I think
> > > > are internal to the library.
> > >
> > > Not sure what the hassle is in returning the number of completed reclamations?
> > > If user doesn't need that information, he simply wouldn't use it.
> > > But it might be useful - he can decide whether he should try another
> > > attempt of reclaim() immediately or whether it is ok to do something else.
> > There is no hassle to return that information.
> >
> > As per the current design, the user calls 'reclaim' when he is out of
> > resources while adding an entry to the data structure. At that point
> > the user wants to know if at least 1 resource was reclaimed because the
> > user has to allocate 1 resource. He does not have a use for the number
> > of resources reclaimed.
> 
> Ok, but why can't the user decide to do reclaim in advance, let's say when
> he foresees that he will need a lot of allocations in the near future?
> Or when there is some idle time? Or some combination of these things?
> And he would like to free some extra resources in that case to minimize
> the number of reclaims in a future peak interval?
If the user has free time he can call the reclaim API. By making the parameters configurable, he should be able to control how much he can reclaim.
If the user wants to make sure that he has enough free resources for the future, he should be able to do it by knowing how many free resources are available in his data structure currently.
But, I do not see it as a problem to return the number of resources reclaimed. I will add that.
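
For example, assuming the reworked rte_rcu_qsbr_dq_reclaim() returns the reclaimed count as agreed above, a proactive idle-time loop could look roughly like this (app_has_idle_time() is a hypothetical hook, dq an existing defer queue):

	while (app_has_idle_time()) {
		/* Stop once a pass frees nothing - the remaining entries
		 * have not completed a grace period yet.
		 */
		if (rte_rcu_qsbr_dq_reclaim(dq) <= 0)
			break;
	}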

> 
> >
> > If this API returns 0, then the user can decide to repeat the call or
> > return failure. But that decision depends on the length of the grace period
> which is under user's control.
> >
> > >
> > > >
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > +/* Delete a defer queue. */
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > > > > +	if (dq == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Reclaim all the resources */
> > > > > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > > > > +		/* Error number is already set by the reclaim API */
> > > > > > +		return 1;
> > > > >
> > > > > How do you know that you have reclaimed everything?
> > > > Good point, will come back with a different solution.
> > > >
> > > > >
> > > > > > +
> > > > > > +	rte_ring_free(dq->r);
> > > > > > +	rte_free(dq);
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > >  int rte_rcu_log_type;
> > > > > >
> > > > > >  RTE_INIT(rte_rcu_register)
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a
> > > > > > 100644
> > > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > @@ -34,6 +34,7 @@ extern "C" {
> > > > > >  #include <rte_lcore.h>
> > > > > >  #include <rte_debug.h>
> > > > > >  #include <rte_atomic.h>
> > > > > > +#include <rte_ring.h>
> > > > > >
> > > > > >  extern int rte_rcu_log_type;
> > > > > >
> > > > > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > > > > >  	 */
> > > > > >  } __rte_cache_aligned;
> > > > > >
> > > > > > +/**
> > > > > > + * Call back function called to free the resources.
> > > > > > + *
> > > > > > + * @param p
> > > > > > + *   Pointer provided while creating the defer queue
> > > > > > + * @param e
> > > > > > + *   Pointer to the resource data stored on the defer queue
> > > > > > + *
> > > > > > + * @return
> > > > > > + *   None
> > > > > > + */
> > > > > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > > > >
> > > > > Stylish thing - usually in DPDK we have typedef newtype_t ...
> > > > > Though I am not sure you need a new typedef at all - just a
> > > > > function pointer inside the struct seems enough.
> > > > Other libraries (for ex: rte_hash) use this approach. I think it
> > > > is better to keep
> > > it out of the structure to allow for better commenting.
> > >
> > > I am saying the majority of DPDK code uses the _t suffix for typedefs:
> > > typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);
> > Apologies, got it, will change.
> >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > > > > +
> > > > > > +/**
> > > > > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > > > > + */
> > > > > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > > > > +
> > > > > > +/**
> > > > > > + *  Reclaim at the max 1/16th the total number of resources.
> > > > > > + */
> > > > > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > > > >
> > > > >
> > > > > As I said above, I don't think these thresholds need to be hardcoded.
> > > > > In any case, there seems to be little point in putting them in the
> > > > > public header file.
> > > > >
> > > > > > +
> > > > > > +/**
> > > > > > + * Parameters used when creating the defer queue.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq_parameters {
> > > > > > +	const char *name;
> > > > > > +	/**< Name of the queue. */
> > > > > > +	uint32_t size;
> > > > > > +	/**< Number of entries in queue. Typically, this will be
> > > > > > +	 *   the same as the maximum number of entries supported in
> the
> > > > > > +	 *   lock free data structure.
> > > > > > +	 *   Data structures with unbounded number of entries is not
> > > > > > +	 *   supported currently.
> > > > > > +	 */
> > > > > > +	uint32_t esize;
> > > > > > +	/**< Size (in bytes) of each element in the defer queue.
> > > > > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > > > > +	 *   support 8B element sizes only.
> > > > > > +	 */
> > > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > > +	/**< Function to call to free the resource. */
> > > > > > +	void *p;
> > > > >
> > > > > Style nit again - I like short names myself, but that seems a
> > > > > bit extreme... :) Might be at least:
> > > > > void (*reclaim)(void *, void *);
> > > > May be 'free_fn'?
> > > >
> > > > > void * reclaim_data;
> > > > > ?
> > > > This is the pointer to the data structure to free the resource
> > > > into. For ex: In
> > > LPM data structure, it will be pointer to LPM. 'reclaim_data'
> > > > does not convey the meaning correctly.
> > >
> > > Ok, please feel free to come up with your own names.
> > > I just wanted to say that 'f' and 'p' are a bit extreme for a public API.
> > ok, this is the hardest thing to do 😊
> >
> > >
> > > >
> > > > >
> > > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > > +	 *   pointer to the data structure to which the resource to
> free
> > > > > > +	 *   belongs. This can be NULL.
> > > > > > +	 */
> > > > > > +	struct rte_rcu_qsbr *v;
> > > > >
> > > > > Does it need to be inside that struct?
> > > > > Might be better:
> > > > > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > > > > rte_rcu_qsbr_dq_parameters *params);
> > > > The API takes a parameter structure as input anyway, why add
> > > > another argument to the function? The QSBR variable is just another
> > > > parameter.
> > > >
> > > > >
> > > > > Another alternative: make both reclaim() and enqueue() to take v
> > > > > as a parameter.
> > > > But both of them need access to some of the parameters provided in
> > > > rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to
> > > > the
> > > functions.
> > >
> > > Purely a stylistic thing.
> > > From my perspective it just provides better visibility into what is
> > > going on in the code:
> > > For QSBR var 'v' create a new deferred queue.
> > > But no strong opinion here.
> > >
> > > >
> > > > >
> > > > > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > > > > +
> > > > > > +/* RTE defer queue structure.
> > > > > > + * This structure holds the defer queue. The defer queue is
> > > > > > +used to
> > > > > > + * hold the deleted entries from the data structure that are
> > > > > > +not
> > > > > > + * yet freed.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq;
> > > > > > +
> > > > > >  /**
> > > > > >   * @warning
> > > > > >   * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > @@
> > > > > > -648,6 +710,113 @@ __rte_experimental  int
> > > > > > rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> > > > > >
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Create a queue used to store the data structure elements
> > > > > > +that can
> > > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > > + *
> > > > > > + * @param params
> > > > > > + *   Parameters to create a defer queue.
> > > > > > + * @return
> > > > > > + *   On success - Valid pointer to defer queue
> > > > > > + *   On error - NULL
> > > > > > + *   Possible rte_errno codes are:
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - ENOMEM - Not enough memory
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +struct rte_rcu_qsbr_dq *
> > > > > > +rte_rcu_qsbr_dq_create(const struct
> > > > > > +rte_rcu_qsbr_dq_parameters *params);
> > > > > > +
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Enqueue one resource to the defer queue and start the grace
> period.
> > > > > > + * The resource will be freed later after at least one grace
> > > > > > +period
> > > > > > + * is over.
> > > > > > + *
> > > > > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > > > > + * It will also reclaim resources at regular intervals to
> > > > > > +avoid
> > > > > > + * the defer queue from growing too big.
> > > > > > + *
> > > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > > +caller
> > > > > > + * provides multi-thread safety by locking a mutex or some other
> means.
> > > > > > + *
> > > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > > +multi-thread
> > > > > > + * safety by creating and using one defer queue per thread.
> > > > > > + *
> > > > > > + * @param dq
> > > > > > + *   Defer queue to allocate an entry from.
> > > > > > + * @param e
> > > > > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > > > > + *   the data to copy is equal to the element size provided when the
> > > > > > + *   defer queue was created.
> > > > > > + * @return
> > > > > > + *   On success - 0
> > > > > > + *   On error - 1 with rte_errno set to
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > > > > + *		if the defer queue size is equal (or larger) than the
> > > > > > + *		number of elements in the data structure.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > > > > +
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Reclaim resources from the defer queue.
> > > > > > + *
> > > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > > +caller
> > > > > > + * provides multi-thread safety by locking a mutex or some other
> means.
> > > > > > + *
> > > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > > +multi-thread
> > > > > > + * safety by creating and using one defer queue per thread.
> > > > > > + *
> > > > > > + * @param dq
> > > > > > + *   Defer queue to reclaim an entry from.
> > > > > > + * @return
> > > > > > + *   On successful reclamation of at least 1 resource - 0
> > > > > > + *   On error - 1 with rte_errno set to
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - EAGAIN - None of the resources have completed at least 1
> grace
> > > > > period,
> > > > > > + *		try again.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > > > > +
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Delete a defer queue.
> > > > > > + *
> > > > > > + * It tries to reclaim all the resources on the defer queue.
> > > > > > + * If any of the resources have not completed the grace
> > > > > > +period
> > > > > > + * the reclamation stops and returns immediately. The rest of
> > > > > > + * the resources are not reclaimed and the defer queue is not
> > > > > > + * freed.
> > > > > > + *
> > > > > > + * @param dq
> > > > > > + *   Defer queue to delete.
> > > > > > + * @return
> > > > > > + *   On success - 0
> > > > > > + *   On error - 1
> > > > > > + *   Possible rte_errno codes are:
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - EAGAIN - Some of the resources have not completed at least 1
> > > grace
> > > > > > + *		period, try again.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > > new file mode 100644
> > > > > > index 000000000..2122bc36a
> > > > > > --- /dev/null
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > >
> > > > > Again style suggestion: as it is not public header - don't use
> > > > > rte_ prefix for naming.
> > > From my perspective - it is easier for the reader to realize what is
> > > a public header and what is not.
> > > > Looks like the guidelines are not defined very well. I see one
> > > > private file with rte_ prefix. I see Stephen not using rte_
> > > > prefix. I do not have any
> > > preference. But, a consistent approach is required.
> > >
> > > That's just a suggestion.
> > > For me (and I hope for others) it would be a bit easier.
> > > When looking at the code for the first time I had to look at
> > > meson.build to check whether it is a public header or not.
> > > If the file doesn't have 'rte_' prefix, I assume that it is an
> > > internal one straightway.
> > > But , as you said, there is no exact guidelines here, so up to you to decide.
> > I think it makes sense to remove 'rte_' prefix. I will also change the file
> name to have '_private' suffix.
> > There are some inconsistencies in the existing code, will send a patch to
> correct them to follow this approach.
> >
> > >
> > > >
> > > > >
> > > > > > @@ -0,0 +1,46 @@
> > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > + * Copyright (c) 2019 Arm Limited  */
> > > > > > +
> > > > > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > > > > +#define _RTE_RCU_QSBR_PVT_H_
> > > > > > +
> > > > > > +/**
> > > > > > + * This file is private to the RCU library. It should not be
> > > > > > +included
> > > > > > + * by the user of this library.
> > > > > > + */
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +#include "rte_rcu_qsbr.h"
> > > > > > +
> > > > > > +/* RTE defer queue structure.
> > > > > > + * This structure holds the defer queue. The defer queue is
> > > > > > +used to
> > > > > > + * hold the deleted entries from the data structure that are
> > > > > > +not
> > > > > > + * yet freed.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq {
> > > > > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this
> queue.*/
> > > > > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > > > > +	uint32_t size;
> > > > > > +	/**< Number of elements in the defer queue */
> > > > > > +	uint32_t esize;
> > > > > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > > +	/**< Function to call to free the resource. */
> > > > > > +	void *p;
> > > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > > +	 *   pointer to the data structure to which the resource to
> free
> > > > > > +	 *   belongs.
> > > > > > +	 */
> > > > > > +	char e[0];
> > > > > > +	/**< Temporary storage to copy the defer queue element. */
> > > > >
> > > > > Do you really need 'e' at all?
> > > > > Can't it be just temporary stack variable?
> > > > Ok, will check.
> > > >
> > > > >
> > > > > > +};
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > > > > b/lib/librte_rcu/rte_rcu_version.map
> > > > > > index f8b9ef2ab..dfac88a37 100644
> > > > > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > > > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > > > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > > > > >  	rte_rcu_qsbr_synchronize;
> > > > > >  	rte_rcu_qsbr_thread_register;
> > > > > >  	rte_rcu_qsbr_thread_unregister;
> > > > > > +	rte_rcu_qsbr_dq_create;
> > > > > > +	rte_rcu_qsbr_dq_enqueue;
> > > > > > +	rte_rcu_qsbr_dq_reclaim;
> > > > > > +	rte_rcu_qsbr_dq_delete;
> > > > > >
> > > > > >  	local: *;
> > > > > >  };
> > > > > > diff --git a/lib/meson.build b/lib/meson.build index
> > > > > > e5ff83893..0e1be8407 100644
> > > > > > --- a/lib/meson.build
> > > > > > +++ b/lib/meson.build
> > > > > > @@ -11,7 +11,9 @@
> > > > > >  libraries = [
> > > > > >  	'kvargs', # eal depends on kvargs
> > > > > >  	'eal', # everything depends on eal
> > > > > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > > > +	'ring',
> > > > > > +	'rcu', # rcu depends on ring
> > > > > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > > >  	'cmdline',
> > > > > >  	'metrics', # bitrate/latency stats depends on this
> > > > > >  	'hash',    # efd depends on this
> > > > > > @@ -22,7 +24,7 @@ libraries = [
> > > > > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > > > > >  	'kni', 'latencystats', 'lpm', 'member',
> > > > > >  	'power', 'pdump', 'rawdev',
> > > > > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > > >  	# ipsec lib depends on net, crypto and security
> > > > > >  	'ipsec',
> > > > > >  	# add pkt framework libs which use other libs from above
> > > > > > --
> > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-07  9:21       ` Ananyev, Konstantin
@ 2019-10-13  4:36         ` Honnappa Nagarahalli
  2019-10-15 11:15           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-13  4:36 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	nd, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd

<snip>

> Hi guys,
I have tried to consolidate design related questions here. If I have missed anything, please add.

> 
> >
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  lib/librte_lpm/Makefile            |   3 +-
> >  lib/librte_lpm/meson.build         |   2 +
> >  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> >  lib/librte_lpm/rte_lpm.h           |  21 ++++++
> >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> >  5 files changed, 122 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index
> > a7946a1c5..ca9e16312 100644
> > --- a/lib/librte_lpm/Makefile
> > +++ b/lib/librte_lpm/Makefile
> > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk  # library name
> > LIB = librte_lpm.a
> >
> > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> >  CFLAGS += -O3
> >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > -LDLIBS += -lrte_eal -lrte_hash
> > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> >
> >  EXPORT_MAP := rte_lpm_version.map
> >
> > diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> > index a5176d8ae..19a35107f 100644
> > --- a/lib/librte_lpm/meson.build
> > +++ b/lib/librte_lpm/meson.build
> > @@ -2,9 +2,11 @@
> >  # Copyright(c) 2017 Intel Corporation
> >
> >  version = 2
> > +allow_experimental_apis = true
> >  sources = files('rte_lpm.c', 'rte_lpm6.c')
> >  headers = files('rte_lpm.h', 'rte_lpm6.h')
> >  # since header files have different names, we can install all vector headers
> >  # without worrying about which architecture we actually need
> >  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> >  deps += ['hash']
> > +deps += ['rcu']
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 3a929a1b1..ca58d4b35 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >   */
> >
> >  #include <string.h>
> > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> >
> >  	rte_mcfg_tailq_write_unlock();
> >
> > +	if (lpm->dq)
> > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> >  	rte_free(lpm->tbl8);
> >  	rte_free(lpm->rules_tbl);
> >  	rte_free(lpm);
> > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604,
> 16.04);
> > MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> >  		rte_lpm_free_v1604);
> >
> > +struct __rte_lpm_rcu_dq_entry {
> > +	uint32_t tbl8_group_index;
> > +	uint32_t pad;
> > +};
> > +
> > +static void
> > +__lpm_rcu_qsbr_free_resource(void *p, void *data) {
> > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry *e =
> > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > +
> > +	/* Set tbl8 group invalid */
> > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > +		__ATOMIC_RELAXED);
> > +}
> > +
> > +/* Associate QSBR variable with an LPM object.
> > + */
> > +int
> > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v) {
> > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +
> > +	if ((lpm == NULL) || (v == NULL)) {
> > +		rte_errno = EINVAL;
> > +		return 1;
> > +	}
> > +
> > +	if (lpm->dq) {
> > +		rte_errno = EEXIST;
> > +		return 1;
> > +	}
> > +
> > +	/* Init QSBR defer queue. */
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> > +	params.name = rcu_dq_name;
> > +	params.size = lpm->number_tbl8s;
> > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > +	params.f = __lpm_rcu_qsbr_free_resource;
> > +	params.p = lpm->tbl8;
> > +	params.v = v;
> > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > +	if (lpm->dq == NULL) {
> > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > +		return 1;
> > +	}
> 
> Few thoughts about that function:
Few things to keep in mind, the goal of the design is to make it easy for the applications to adopt lock-free algorithms. The reclamation process in the writer is a major portion of code one has to write for using lock-free algorithms. The current design is such that the writer does not have to change any code or write additional code other than calling 'rte_lpm_rcu_qsbr_add'.
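
To make that concrete, a rough writer/reader usage sketch under this series (error handling trimmed; assumes an existing struct rte_lpm *lpm and a single reader with thread_id 0):

#include <rte_malloc.h>
#include <rte_rcu_qsbr.h>
#include <rte_lpm.h>

/* Writer-side setup */
size_t sz = rte_rcu_qsbr_get_memsize(1);
struct rte_rcu_qsbr *v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);

rte_rcu_qsbr_init(v, 1);
if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
	rte_panic("cannot attach RCU QSBR variable to LPM\n");

/* Reader thread: register once, then report quiescent state regularly */
rte_rcu_qsbr_thread_register(v, 0);
rte_rcu_qsbr_thread_online(v, 0);
while (run) {
	/* ... rte_lpm_lookup() based forwarding ... */
	rte_rcu_qsbr_quiescent(v, 0);
}

After this, rte_lpm_delete() defers the tbl8 group free to the queue and tbl8_alloc() reclaims groups whose grace period has passed, with no further changes on the writer side.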

> It is named rcu_qsbr_add() but in fact it allocates a defer queue for a given rcu var.
> So first thought - is it always necessary?
This is part of the design. If the application does not want to use this integrated logic then, it does not have to call this API. It can use the RCU defer APIs to implement its own logic. But, if I ask the question, does this integrated logic address most of the use cases of the LPM library, I think the answer is yes.

> For some use-cases I suppose the user might be ok with waiting for the
> quiescent state change inside tbl8_free()?
Yes, that is a possibility (for ex: no frequent route changes). But, I think that is fairly trivial for the application to implement. Though, the LPM library has to separate the 'delete' and 'free' operations. Similar operations are provided in the rte_hash library (see the sketch below). IMO, we should follow a consistent approach.
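
For reference, the existing rte_hash split looks roughly like this (given an existing rte_hash *h and a key); an LPM equivalent would separate tbl8 'delete' from 'free' the same way:

	int32_t pos = rte_hash_del_key(h, &key);      /* detach the key only */

	/* ... wait for readers, e.g. via rte_rcu_qsbr_check() ... */

	if (pos >= 0)
		rte_hash_free_key_with_position(h, pos);  /* now reclaim */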

> Another thing: you do allocate the defer queue, but it is internal, so the
> user can't call reclaim() manually, which looks strange.
> Why not return the defer_queue pointer to the user, so he can call
> reclaim() himself at an appropriate time?
The intention of the design is to take the complexity away from the user of the LPM library. IMO, the current design will address most use cases of the LPM library. If we expose the 2 parameters (when to trigger reclamation and how much to reclaim) in the 'rte_lpm_rcu_qsbr_add' API, it should provide enough flexibility to the application; a possible shape for this is sketched below.
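
A hypothetical shape for that extension (names illustrative only, not part of this patch):

int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
		uint32_t reclaim_thd,	/* 0 = default trigger, size/8 */
		uint32_t reclaim_max);	/* 0 = default batch, size/16 */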

> Third thing - you always allocate the defer queue with a size equal to the
> number of tbl8s.
> Though I understand there could be up to 16M tbl8 groups inside the LPM.
> Do we really need a defer queue that long?
No, we do not need it to be this long. It is this long today to avoid returning a no-space error on the defer queue.
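
As a worked example with the constants in this series: for an LPM created with number_tbl8s = 256, the defer queue is sized for all 256 groups, but automatic reclamation already triggers at 256 >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT (3) = 32 pending entries, and each pass frees at most 256 >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT (4) = 16 of them. So in steady state only a small fraction of the queue is ever occupied.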

> Especially considering that the current rcu_defer_queue will start
> reclamation when 1/8 of the defer_queue becomes full and wouldn't reclaim
> more than 1/16 of it.
> Probably better to let the user decide himself how long a defer_queue he
> needs for that LPM?
It makes sense to expose it to the user if the writer-writer concurrency is lock-free (no memory allocation allowed to expand the defer queue size when the queue is full). However, LPM is not lock-free on the writer side. If we think the writer could be lock-free in the future, it has to be exposed to the user. 

> 
> Konstantin
Pulling questions/comments from other threads:
Can we leave reclamation to some other house-keeping thread to do (a sort of garbage collector)? Or is such a mode not supported/planned?

[Honnappa] If the reclamation cost is small, the current method provides advantages over having a separate thread to do reclamation. I did not plan to provide such an option. But maybe it makes sense to keep the options open (especially from an ABI perspective). Maybe we should add a flags field which will allow us to implement different methods in the future?
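
A hypothetical sketch of such a parameter block with a flags field (purely illustrative - none of these names exist in this series):

/* Hypothetical: let the application opt out of automatic reclamation and
 * leave it to a house-keeping (garbage collector) thread.
 */
#define RTE_RCU_QSBR_DQ_F_NO_AUTO_RECLAIM (1u << 0)

struct rte_rcu_qsbr_dq_parameters {	/* possible future layout */
	const char *name;
	uint32_t size;
	uint32_t esize;
	uint32_t trigger_reclaim_limit;	/* optional, 0 = default (size/8) */
	uint32_t max_reclaim_size;	/* optional, 0 = default (size/16) */
	uint32_t flags;			/* RTE_RCU_QSBR_DQ_F_* */
	rte_rcu_qsbr_free_resource_t free_fn;	/* '_t' suffix as discussed */
	void *p;			/* passed back to free_fn */
	struct rte_rcu_qsbr *v;		/* QSBR variable to use */
};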

> 
> 
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Adds a rule to the rule table.
> >   *
> > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20
> > *tbl8)  }
> >
> >  static int32_t
> > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > number_tbl8s)
> > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> >  {
> >  	uint32_t group_idx; /* tbl8 group index. */
> >  	struct rte_lpm_tbl_entry *tbl8_entry;
> >
> >  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > +		tbl8_entry = &lpm->tbl8[group_idx *
> > +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> >  		/* If a free tbl8 group is found clean it and set as VALID. */
> >  		if (!tbl8_entry->valid_group) {
> >  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> > @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> >  	return -ENOSPC;
> >  }
> >
> > +static int32_t
> > +tbl8_alloc_v1604(struct rte_lpm *lpm) {
> > +	int32_t group_idx; /* tbl8 group index. */
> > +
> > +	group_idx = __tbl8_alloc_v1604(lpm);
> > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > +		/* If there are no tbl8 groups try to reclaim some. */
> > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > +			group_idx = __tbl8_alloc_v1604(lpm);
> > +	}
> > +
> > +	return group_idx;
> > +}
> > +
> >  static void
> >  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t
> > tbl8_group_start)  { @@ -728,13 +800,21 @@ tbl8_free_v20(struct
> > rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)  }
> >
> >  static void
> > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > tbl8_group_start)
> > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> >  {
> > -	/* Set tbl8 group invalid*/
> >  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry e;
> >
> > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > -			__ATOMIC_RELAXED);
> > +	if (lpm->dq != NULL) {
> > +		e.tbl8_group_index = tbl8_group_start;
> > +		e.pad = 0;
> > +		/* Push into QSBR defer queue. */
> > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > +	} else {
> > +		/* Set tbl8 group invalid*/
> > +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> > +				__ATOMIC_RELAXED);
> > +	}
> >  }
> >
> >  static __rte_noinline int32_t
> > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > uint32_t ip_masked, uint8_t depth,
> >
> >  	if (!lpm->tbl24[tbl24_index].valid) {
> >  		/* Search for a free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >  		/* Check tbl8 allocation was successful. */
> >  		if (tbl8_group_index < 0) {
> > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked, uint8_t depth,
> >  	} /* If valid entry but not extended calculate the index into Table8. */
> >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> >  		/* Search for free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >  		if (tbl8_group_index < 0) {
> >  			return tbl8_group_index;
> > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked,
> >  		 */
> >  		lpm->tbl24[tbl24_index].valid = 0;
> >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >  	} else if (tbl8_recycle_index > -1) {
> >  		/* Update tbl24 entry. */
> >  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> > @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> >  				__ATOMIC_RELAXED);
> >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >  	}
> >  #undef group_idx
> >  	return 0;
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > 906ec4483..49c12a68d 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >   */
> >
> >  #ifndef _RTE_LPM_H_
> > @@ -21,6 +22,7 @@
> >  #include <rte_common.h>
> >  #include <rte_vect.h>
> >  #include <rte_compat.h>
> > +#include <rte_rcu_qsbr.h>
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -186,6 +188,7 @@ struct rte_lpm {
> >  			__rte_cache_aligned; /**< LPM tbl24 table. */
> >  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> >  };
> >
> >  /**
> > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> void
> > rte_lpm_free_v1604(struct rte_lpm *lpm);
> >
> > +/**
> > + * Associate RCU QSBR variable with an LPM object.
> > + *
> > + * @param lpm
> > + *   the lpm object to add RCU QSBR
> > + * @param v
> > + *   RCU QSBR variable
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with error code set in rte_errno.
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - invalid pointer
> > + *   - EEXIST - already added QSBR
> > + *   - ENOMEM - memory allocation failure
> > + */
> > +__rte_experimental
> > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr
> > +*v);
> > +
> >  /**
> >   * Add a rule to the LPM table.
> >   *
> > diff --git a/lib/librte_lpm/rte_lpm_version.map
> > b/lib/librte_lpm/rte_lpm_version.map
> > index 90beac853..b353aabd2 100644
> > --- a/lib/librte_lpm/rte_lpm_version.map
> > +++ b/lib/librte_lpm/rte_lpm_version.map
> > @@ -44,3 +44,9 @@ DPDK_17.05 {
> >  	rte_lpm6_lookup_bulk_func;
> >
> >  } DPDK_16.04;
> > +
> > +EXPERIMENTAL {
> > +	global:
> > +
> > +	rte_lpm_rcu_qsbr_add;
> > +};
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-11 18:28                     ` Honnappa Nagarahalli
@ 2019-10-13 20:09                       ` Ananyev, Konstantin
  2019-10-14  4:11                         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-13 20:09 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd



> > > > >
> > > > > >
> > > > > > >
> > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > >
> > > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > >
> > > > > > > > > The peek API allows fetching the next available object in
> > > > > > > > > the ring without dequeuing it. This helps in scenarios
> > > > > > > > > where dequeuing of objects depend on their value.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > > ---
> > > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18
> > > > > > > > > 100644
> > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > > rte_ring *r, void
> > > > > > > > **obj_table,
> > > > > > > > >  				r->cons.single, available);  }
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > + *
> > > > > > > > > + * The peek API allows fetching the next available object
> > > > > > > > > +in the ring
> > > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > > +safe with respect
> > > > > > > > > + * to other consumer threads.
> > > > > > > > > + *
> > > > > > > > > + * @param r
> > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > + * @param obj_p
> > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > + * @return
> > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > + */
> > > > > > > > > +__rte_experimental
> > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > > >
> > > > > > > > As it is not MT safe, then I think we need _sc_ in the name,
> > > > > > > > to follow other rte_ring functions naming conventions
> > > > > > > > (rte_ring_sc_peek() or so).
> > > > > > > Agree
> > > > > > >
> > > > > > > >
> > > > > > > > As a better alternative what do you think about introducing
> > > > > > > > serialized versions of the DPDK rte_ring dequeue functions?
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > /* same as original ring dequeue, but:
> > > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > > >   * 2) don't update cons.tail
> > > > > > > >   */
> > > > > > > > unsigned int
> > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > > **obj_table, unsigned int n,
> > > > > > > >                 unsigned int *available);
> > > > > > > >
> > > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r,
> > > > > > > > uint32_t num);
> > > > > > > >
> > > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > >
> > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > >
> > > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl,
> > > > > > > > n; uintptr_t elt[nb_elt]; ...
> > > > > > > >
> > > > > > > > do {
> > > > > > > >
> > > > > > > >   /* read next elem from the queue */
> > > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > >   if (n == 0)
> > > > > > > >       break;
> > > > > > > >
> > > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > > elt[0]) != 1) {
> > > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > >      break;
> > > > > > > >   }
> > > > > > > >
> > > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > > >
> > > > > > > >    /*call reclaim function */
> > > > > > > >   dr->f(dr->p, elt);
> > > > > > > >
> > > > > > > > } while (avl >= nb_elt);
> > > > > > > >
> > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > > > >
> > > > > > > I think it is a great idea. The other writers would still be
> > > > > > > polling for the current writer to update the tail or update
> > > > > > > the head. This makes it a
> > > > > > blocking solution.
> > > > > >
> > > > > > Yep, it is a blocking one.
> > > > > >
> > > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > > reclaiming if they
> > > > > > see that other writers are dequeuing from the queue.
> > > > > >
> > > > > > Actually didn't think about that possibility, but yes should be
> > > > > > possible to have _try_ semantics too.
> > > > > >
> > > > > > > The other way is to use per thread queues.
> > > > > > >
> > > > > > > The other requirement I see is to support unbounded-size data
> > > > > > > > structures wherein the data structures do not have a
> > > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > > queue size is equal to the total
> > > > > > number of entries in a given data structure. There are plans to
> > > > > > support dynamically resizable defer queue. This means, memory
> > > > > > allocation which will affect the lock-free-ness of the solution.
> > > > > > >
> > > > > > > So, IMO:
> > > > > > > 1) The API should provide the capability to support different
> > > > > > > algorithms -
> > > > > > maybe through some flags?
> > > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > > cons-tail is also the same, skip
> > > > > > polling). So, we should probably implement a ring within the RCU
> > library?
> > > > > >
> > > > > > Personally, I think such serialization ring API would be useful
> > > > > > for other cases too.
> > > > > > > There are a few cases when the user needs to read the contents
> > > > > > > of the queue without removing elements from it.
> > > > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > > > transmit queue.
> > > > > > If such API would exist in DPDK we can just use it straightway,
> > > > > > without maintaining a separate one.
> > > > > ok
> > > > >
> > > > > >
> > > > > > >
> > > > > > > From the timeline perspective, adding all these capabilities
> > > > > > > > would be difficult to get done within the 19.11 timeline. What I
> > > > > > > have here satisfies my current needs. I suggest that we make
> > > > > > > provisions in APIs now to
> > > > > > support all these features, but do the implementation in the
> > > > > > coming
> > > > releases.
> > > > > > Does this sound ok for you?
> > > > > >
> > > > > > Not sure I understand your suggestion here...
> > > > > > Could you explain it a bit more - how new API will look like and
> > > > > > what would be left for the future.
> > > > > For this patch, I suggest we do not add any more complexity. If
> > > > > someone wants a lock-free/block-free mechanism, it is available by
> > > > > creating
> > > > per thread defer queues.
> > > > >
> > > > > We push the following to the future:
> > > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > > lock-free/block-free reclamation will not be available (memory
> > > > > allocation
> > > > requires locking). The memory for the defer queue will be
> > > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> > > >
> > > > That one is fine by me.
> > > > In fact I don't know whether there would be a real use-case for a
> > > > dynamic defer queue for an rcu var...
> > > > But I suppose that's subject for another discussion.
> > > Currently, the defer queue size is equal to the number of resources in
> > > the data structure. This is unnecessary as the reclamation is done regularly.
> > > If a smaller queue size is used, the queue might get full (even after
> > reclamation), in which case, the queue size should be increased.
> >
> > I understand the intention.
> > Though I am not very happy with an approach where, to free one resource,
> > we first have to allocate another one.
> > Sounds like a source of deadlocks and, for that case, probably an
> > unnecessary complication.
> It depends on the use case. For some use cases lock-free reader-writer concurrency is enough (in which case there is no need to have a
> queue large enough to hold all the resources) and some would require lock-free reader-writer and writer-writer concurrency (where,
> theoretically, a queue large enough to hold all the resources would be required).
> 
> > But again, as it is not for 19.11 we don't have to discuss it now.
> >
> > > >
> > > > >
> > > > > 2) Constant size defer queue with lock-free and block-free
> > > > > reclamation (single option). The defer queue will be of fixed
> > > > > length 'size'. If the queue gets full an error is returned. The
> > > > > user could provide a 'size' equal
> > > > to the number of elements in a data structure to ensure queue never gets
> > full.
> > > >
> > > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > - MP/MC
> > > > - MP/SC
> > > > - SP/SC
> > > Just SP/SC
> >
> > Ok, just to confirm we are on the same page:
> > there would be a possibility for one thread to do dq_enqueue() and a
> > second one to do dq_reclaim() simultaneously (of course, if the actual
> > reclamation function is thread safe)?
> Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.

Ok, and that is probably due to the nature of ring_sc_peek(), right?
But the user can set the reclaim threshold higher than the number of elems in the defer queue,
and that should help to prevent dq_reclaim() from inside dq_enqueue(), correct?
If so, I have no objections in general to the proposed plan.
Konstantin
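
For completeness, the per-thread defer queue workaround discussed above could look roughly like this (sketch only; assumes one writer per lcore and the dq API from this series):

#include <rte_lcore.h>

static struct rte_rcu_qsbr_dq *dq_per_lcore[RTE_MAX_LCORE];

static inline int
deferred_free(void *e)
{
	/* Each writer lcore owns its own queue, so the SP/SC assumptions
	 * of dq_enqueue()/dq_reclaim() hold without any locking.
	 */
	return rte_rcu_qsbr_dq_enqueue(dq_per_lcore[rte_lcore_id()], e);
}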

> 
> >
> > > > - non MT at all (only same single thread can do enqueue and dequeue)
> > > If MT safe is required, one should use 1 defer queue per thread for now.
> > >
> > > >
> > > > And related question:
> > > > What additional rte_ring API you plan to introduce in that case?
> > > > - None
> > > > - rte_ring_sc_peek()
> > > rte_ring_peek will be changed to rte_ring_sc_peek
> > >
> > > > - rte_ring_serial_dequeue()
> > > >
> > > > >
> > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > > provide
> > > > > 2 #defines, one for dynamically variable size defer queue and the
> > > > > other for
> > > > constant size defer queue.
> > > > >
> > > > > However, IMO, using per thread defer queue is a much simpler way
> > > > > to
> > > > achieve 2. It does not add any significant burden to the user either.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > +{
> > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > +	if (count) {
> > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > > +		return 0;
> > > > > > > > > +	}
> > > > > > > > > +	return -ENOENT;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  #ifdef __cplusplus
> > > > > > > > >  }
> > > > > > > > >  #endif
> > > > > > > > > --
> > > > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-13 20:09                       ` Ananyev, Konstantin
@ 2019-10-14  4:11                         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-14  4:11 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> > > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > > >
> > > > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > >
> > > > > > > > > > The peek API allows fetching the next available object
> > > > > > > > > > in the ring without dequeuing it. This helps in
> > > > > > > > > > scenarios where dequeuing of objects depend on their value.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Dharmik Thakkar
> > > > > > > > > > <dharmik.thakkar@arm.com>
> > > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > > > ---
> > > > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > > b/lib/librte_ring/rte_ring.h index
> > > > > > > > > > 2a9f768a1..d3d0d5e18
> > > > > > > > > > 100644
> > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > > > rte_ring *r, void
> > > > > > > > > **obj_table,
> > > > > > > > > >  				r->cons.single, available);  }
> > > > > > > > > >
> > > > > > > > > > +/**
> > > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > > + *
> > > > > > > > > > + * The peek API allows fetching the next available
> > > > > > > > > > +object in the ring
> > > > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > > > +safe with respect
> > > > > > > > > > + * to other consumer threads.
> > > > > > > > > > + *
> > > > > > > > > > + * @param r
> > > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > > + * @param obj_p
> > > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > > + * @return
> > > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > > + */
> > > > > > > > > > +__rte_experimental
> > > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > > > >
> > > > > > > > > As it is not MT safe, then I think we need _sc_ in the
> > > > > > > > > name, to follow other rte_ring functions naming
> > > > > > > > > conventions
> > > > > > > > > (rte_ring_sc_peek() or so).
> > > > > > > > Agree
> > > > > > > >
> > > > > > > > >
> > > > > > > > > As a better alternative what do you think about
> > > > > > > > > introducing serialized versions of the DPDK rte_ring dequeue
> functions?
> > > > > > > > > Something like that:
> > > > > > > > >
> > > > > > > > > /* same as original ring dequeue, but:
> > > > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > > > >   * 2) don't update cons.tail
> > > > > > > > >   */
> > > > > > > > > unsigned int
> > > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > > > **obj_table, unsigned int n,
> > > > > > > > >                 unsigned int *available);
> > > > > > > > >
> > > > > > > > > /* sets both cons.head and cons.tail to cons.head + num
> > > > > > > > > */ void rte_ring_serial_dequeue_finish(struct rte_ring
> > > > > > > > > *r, uint32_t num);
> > > > > > > > >
> > > > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > > >
> > > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > > >
> > > > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t
> > > > > > > > > avl, n; uintptr_t elt[nb_elt]; ...
> > > > > > > > >
> > > > > > > > > do {
> > > > > > > > >
> > > > > > > > >   /* read next elem from the queue */
> > > > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > > >   if (n == 0)
> > > > > > > > >       break;
> > > > > > > > >
> > > > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > > > elt[0]) != 1) {
> > > > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > > >      break;
> > > > > > > > >   }
> > > > > > > > >
> > > > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > > > >
> > > > > > > > >    /*call reclaim function */
> > > > > > > > >   dr->f(dr->p, elt);
> > > > > > > > >
> > > > > > > > > } while (avl >= nb_elt);
> > > > > > > > >
> > > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT
> safe.
> > > > > > > > > As long as actual reclamation callback itself is MT safe of
> course.
> > > > > > > >
> > > > > > > > I think it is a great idea. The other writers would still
> > > > > > > > be polling for the current writer to update the tail or
> > > > > > > > update the head. This makes it a blocking solution.
> > > > > > >
> > > > > > > Yep, it is a blocking one.
> > > > > > >
> > > > > > > > We can make the other threads not poll, i.e. they will quit
> > > > > > > > reclaiming if they see that other writers are dequeuing from
> > > > > > > > the queue.
> > > > > > >
> > > > > > > Actually I didn't think about that possibility, but yes, it
> > > > > > > should be possible to have _try_ semantics too.
> > > > > > >
> > > > > > > > The other way is to use per-thread queues.
> > > > > > > >
> > > > > > > > The other requirement I see is to support unbounded-size
> > > > > > > > data structures, wherein the data structures do not have a
> > > > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > > > queue size is equal to the total number of entries in a
> > > > > > > > given data structure. There are plans to support a
> > > > > > > > dynamically resizable defer queue. This means memory
> > > > > > > > allocation, which will affect the lock-free-ness of the solution.
> > > > > > > >
> > > > > > > > So, IMO:
> > > > > > > > 1) The API should provide the capability to support
> > > > > > > > different algorithms - maybe through some flags?
> > > > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > > > cons-tail is also the same, skip polling). So, we should
> > > > > > > > probably implement a ring within the RCU library?
> > > > > > >
> > > > > > > Personally, I think such a serialization ring API would be
> > > > > > > useful for other cases too.
> > > > > > > There are a few cases when the user needs to read the contents
> > > > > > > of the queue without removing elements from it.
> > > > > > > Let's say we use a similar approach inside TLDK to implement
> > > > > > > the TCP transmit queue.
> > > > > > > If such an API existed in DPDK, we could just use it
> > > > > > > straight away, without maintaining a separate one.
> > > > > > ok
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > From the timeline perspective, adding all these
> > > > > > > > capabilities would be difficult to get done within the
> > > > > > > > 19.11 timeline. What I have here satisfies my current needs.
> > > > > > > > I suggest that we make provisions in the APIs now to
> > > > > > > > support all these features, but do the implementation in
> > > > > > > > the coming releases.
> > > > > > > > Does this sound ok to you?
> > > > > > >
> > > > > > > Not sure I understand your suggestion here...
> > > > > > > Could you explain it a bit more - what would the new API look
> > > > > > > like and what would be left for the future?
> > > > > > For this patch, I suggest we do not add any more complexity.
> > > > > > If someone wants a lock-free/block-free mechanism, it is
> > > > > > available by creating per-thread defer queues.
> > > > > >
> > > > > > We push the following to the future:
> > > > > > 1) Dynamically size-adjustable defer queue. IMO, with this,
> > > > > > the lock-free/block-free reclamation will not be available
> > > > > > (memory allocation requires locking). The memory for the defer
> > > > > > queue will be allocated/freed in chunks of 'size' elements as
> > > > > > the queue grows/shrinks.
> > > > >
> > > > > That one is fine by me.
> > > > > In fact I don't know whether there would be a real use-case for
> > > > > a dynamic defer queue for an rcu var...
> > > > > But I suppose that's a subject for another discussion.
> > > > Currently, the defer queue size is equal to the number of
> > > > resources in the data structure. This is unnecessary as the
> > > > reclamation is done regularly.
> > > > If a smaller queue size is used, the queue might get full (even
> > > > after reclamation), in which case the queue size should be
> > > > increased.
> > >
> > > I understand the intention.
> > > Though I am not very happy with an approach where, to free one
> > > resource, we first have to allocate another one.
> > > Sounds like a source of deadlocks and, for that case, probably an
> > > unnecessary complication.
> > It depends on the use case. For some use cases lock-free reader-writer
> > concurrency is enough (in which case there is no need to have a queue
> > large enough to hold all the resources) and some would require
> > lock-free reader-writer and writer-writer concurrency (where,
> > theoretically, a queue large enough to hold all the resources would
> > be required).
> >
> > > But again, as it is not for 19.11 we don't have to discuss it now.
> > >
> > > > >
> > > > > >
> > > > > > 2) Constant size defer queue with lock-free and block-free
> > > > > > reclamation (single option). The defer queue will be of fixed
> > > > > > length 'size'. If the queue gets full an error is returned.
> > > > > > The user could provide a 'size' equal
> > > > > to the number of elements in a data structure to ensure queue
> > > > > never gets
> > > full.
> > > > >
> > > > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > > - MP/MC
> > > > > - MP/SC
> > > > > - SP/SC
> > > > Just SP/SC
> > >
> > > Ok, just to confirm we are on the same page:
> > > there would be a possibility for one thread to do dq_enqueue() and a
> > > second one to do dq_reclaim() simultaneously (of course, only if the
> > > actual reclamation function is thread safe)?
> > Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.
This is not completely correct (as you have pointed out below), as dq_enqueue will end up calling dq_reclaim.
> 
> Ok, and that is probably due to the nature of ring_sc_peek(), right?
> But the user can set the reclaim threshold higher than the number of elems in the defer
> queue, and that should help to prevent dq_reclaim() from inside
> dq_enqueue(), correct?
Yes, that is possible.
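
To illustrate (a sketch only; 'trigger_thresh' below is a hypothetical
user-settable value, while the v3 patch hard-codes the 1/8th limit):

	/* enqueue-side auto-reclaim gate, simplified from the patch */
	cur_size = rte_ring_count(dq->r) / (dq->esize / 8 + 1);
	if (cur_size > trigger_thresh)
		rte_rcu_qsbr_dq_reclaim(dq);

With trigger_thresh >= dq->size the branch above never fires, so
dq_enqueue() itself never calls dq_reclaim().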

> If so, I have no objections in general to the proposed plan.
> Konstantin
> 
> >
> > >
> > > > > - non MT at all (only the same single thread can do enqueue and
> > > > > dequeue)
> > > > If MT safety is required, one should use one defer queue per thread for now.
> > > >
> > > > >
> > > > > And related question:
> > > > > What additional rte_ring API you plan to introduce in that case?
> > > > > - None
> > > > > - rte_ring_sc_peek()
> > > > rte_ring_peek will be changed to rte_ring_sc_peek
> > > >
> > > > > - rte_ring_serial_dequeue()
> > > > >
> > > > > >
> > > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > > > provide 2 #defines, one for a dynamically variable size defer
> > > > > > queue and the other for a constant size defer queue.
> > > > > >
> > > > > > However, IMO, using a per-thread defer queue is a much simpler
> > > > > > way to achieve 2. It does not add any significant burden to
> > > > > > the user either.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +{
> > > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > > +	if (count) {
> > > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > > > +		return 0;
> > > > > > > > > > +	}
> > > > > > > > > > +	return -ENOENT;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  #ifdef __cplusplus
> > > > > > > > > >  }
> > > > > > > > > >  #endif
> > > > > > > > > > --
> > > > > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-13  4:36         ` Honnappa Nagarahalli
@ 2019-10-15 11:15           ` Ananyev, Konstantin
  2019-10-18  3:32             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-15 11:15 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	nd, Ruifeng Wang (Arm Technology China),
	nd


> <snip>
> 
> > Hi guys,
> I have tried to consolidate design-related questions here. If I have missed anything, please add.
> 
> >
> > >
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > >
> > > Currently, the tbl8 group is freed even though the readers might be
> > > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > > quickly. This results in incorrect lookup results.
> > >
> > > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > > Refer to RCU documentation to understand various aspects of
> > > integrating RCU library into other libraries.
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> > >  lib/librte_lpm/Makefile            |   3 +-
> > >  lib/librte_lpm/meson.build         |   2 +
> > >  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> > >  lib/librte_lpm/rte_lpm.h           |  21 ++++++
> > >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> > >  5 files changed, 122 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> > > index a7946a1c5..ca9e16312 100644
> > > --- a/lib/librte_lpm/Makefile
> > > +++ b/lib/librte_lpm/Makefile
> > > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > >  # library name
> > >  LIB = librte_lpm.a
> > >
> > > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> > >  CFLAGS += -O3
> > >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > > -LDLIBS += -lrte_eal -lrte_hash
> > > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> > >
> > >  EXPORT_MAP := rte_lpm_version.map
> > >
> > > diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> > > index a5176d8ae..19a35107f 100644
> > > --- a/lib/librte_lpm/meson.build
> > > +++ b/lib/librte_lpm/meson.build
> > > @@ -2,9 +2,11 @@
> > >  # Copyright(c) 2017 Intel Corporation
> > >
> > >  version = 2
> > > +allow_experimental_apis = true
> > >  sources = files('rte_lpm.c', 'rte_lpm6.c')
> > >  headers = files('rte_lpm.h', 'rte_lpm6.h')
> > >  # since header files have different names, we can install all vector headers
> > >  # without worrying about which architecture we actually need
> > >  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> > >  deps += ['hash']
> > > +deps += ['rcu']
> > > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > > index 3a929a1b1..ca58d4b35 100644
> > > --- a/lib/librte_lpm/rte_lpm.c
> > > +++ b/lib/librte_lpm/rte_lpm.c
> > > @@ -1,5 +1,6 @@
> > >  /* SPDX-License-Identifier: BSD-3-Clause
> > >   * Copyright(c) 2010-2014 Intel Corporation
> > > + * Copyright(c) 2019 Arm Limited
> > >   */
> > >
> > >  #include <string.h>
> > > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> > >
> > >  	rte_mcfg_tailq_write_unlock();
> > >
> > > +	if (lpm->dq)
> > > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> > >  	rte_free(lpm->tbl8);
> > >  	rte_free(lpm->rules_tbl);
> > >  	rte_free(lpm);
> > > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604,
> > 16.04);
> > > MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> > >  		rte_lpm_free_v1604);
> > >
> > > +struct __rte_lpm_rcu_dq_entry {
> > > +	uint32_t tbl8_group_index;
> > > +	uint32_t pad;
> > > +};
> > > +
> > > +static void
> > > +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> > > +{
> > > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > +	struct __rte_lpm_rcu_dq_entry *e =
> > > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > > +
> > > +	/* Set tbl8 group invalid */
> > > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > > +		__ATOMIC_RELAXED);
> > > +}
> > > +
> > > +/* Associate QSBR variable with an LPM object.
> > > + */
> > > +int
> > > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> > > +{
> > > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > > +	struct rte_rcu_qsbr_dq_parameters params;
> > > +
> > > +	if ((lpm == NULL) || (v == NULL)) {
> > > +		rte_errno = EINVAL;
> > > +		return 1;
> > > +	}
> > > +
> > > +	if (lpm->dq) {
> > > +		rte_errno = EEXIST;
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Init QSBR defer queue. */
> > > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> > > +	params.name = rcu_dq_name;
> > > +	params.size = lpm->number_tbl8s;
> > > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > > +	params.f = __lpm_rcu_qsbr_free_resource;
> > > +	params.p = lpm->tbl8;
> > > +	params.v = v;
> > > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > > +	if (lpm->dq == NULL) {
> > > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > > +		return 1;
> > > +	}
> >
> > Few thoughts about that function:
> A few things to keep in mind: the goal of the design is to make it easy for applications to adopt lock-free algorithms. The reclamation
> process in the writer is a major portion of the code one has to write when using lock-free algorithms. The current design is such that the writer
> does not have to change any code or write additional code other than calling 'rte_lpm_rcu_qsbr_add'.
> 
> > It is named rcu_qsbr_add() but in fact it allocates a defer queue for the given rcu var.
> > So first thought - is it always necessary?
> This is part of the design. If the application does not want to use this integrated logic then it does not have to call this API. It can use the
> RCU defer APIs to implement its own logic. But if I ask the question, does this integrated logic address most of the use cases of the LPM
> library, I think the answer is yes.
> 
> > For some use-cases I suppose the user might be ok to wait for the
> > quiescent state change inside tbl8_free()?
> Yes, that is a possibility (for ex: no frequent route changes). But, I think that is very trivial for the application to implement. Though, the LPM
> library has to separate the 'delete' and 'free' operations. 

Exactly.
That's why it is not trivial with the current LPM library.
In fact, to do that himself right now, the user would have to implement and support his own version of the LPM code.

Honestly, I don't understand why you consider it a drawback.
From my perspective only a few things need to be changed:

1. Add 2 parameters to rte_lpm_rcu_qsbr_add():
    number of elems in defer_queue
    reclaim() threshold value.
If the user doesn't want to provide any values, that's fine, we can use default ones here
(as you do right now).
2. Make rte_lpm_rcu_qsbr_add() return a pointer to the defer_queue.
Again, if the user doesn't want to call reclaim() himself, he can just ignore the return value.

These 2 changes will provide us with the necessary flexibility that would help to cover more use-cases
(a possible prototype is sketched below):
- user can decide how big the defer queue should be
- user can decide when/how he wants to do reclaim()
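
As a sketch only (the parameter names below are made up for
illustration, they are not in any posted patch), the prototype could
then become:

	/* Hypothetical extension: 0 for dq_size/reclaim_thresh selects
	 * the current built-in defaults; the returned defer queue lets
	 * the application call rte_rcu_qsbr_dq_reclaim() at a time of
	 * its choosing.
	 */
	struct rte_rcu_qsbr_dq *
	rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
			uint32_t dq_size, uint32_t reclaim_thresh);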

Konstantin

> Similar operations are provided in the rte_hash library. IMO, we should follow
> a consistent approach.
> 
> > Another thing: you do allocate a defer queue, but it is internal, so the user can't call
> > reclaim() manually, which looks strange.
> > Why not return the defer_queue pointer to the user, so he can call reclaim()
> > himself at an appropriate time?
> The intention of the design is to take the complexity away from the user of the LPM library. IMO, the current design will address most use
> cases of the LPM library. If we expose the 2 parameters (when to trigger reclamation and how much to reclaim) in the 'rte_lpm_rcu_qsbr_add'
> API, it should provide enough flexibility to the application.
> 
> > Third thing - you always allocate the defer queue with size equal to the number of
> > tbl8 groups.
> > Though I understand there could be up to 16M tbl8 groups inside the LPM.
> > Do we really need a defer queue that long?
> No, we do not need it to be this long. It is this long today to avoid returning a no-space error on the defer queue.
> 
> > Especially considering that the current rcu_defer_queue will start reclamation
> > when 1/8 of the defer_queue becomes full and wouldn't reclaim more than
> > 1/16 of it.
> > Probably better to let the user decide himself how long a defer_queue he needs
> > for that LPM?
> It makes sense to expose it to the user if the writer-writer concurrency is lock-free (no memory allocation allowed to expand the defer
> queue size when the queue is full). However, LPM is not lock-free on the writer side. If we think the writer could be lock-free in the future, it
> has to be exposed to the user.
> 
> >
> > Konstantin
> Pulling questions/comments from other threads:
> Can we leave reclamation to some other house-keeping thread (a sort of garbage collector)? Or is such a mode not supported/planned?
> 
> [Honnappa] If the reclamation cost is small, the current method provides advantages over having a separate thread to do reclamation. I did
> not plan to provide such an option. But maybe it makes sense to keep the options open (especially from an ABI perspective). Maybe we
> should add a flags field which will allow us to implement different methods in the future?
> 
> >
> >
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * Adds a rule to the rule table.
> > >   *
> > > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
> > >  }
> > >
> > >  static int32_t
> > > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > > number_tbl8s)
> > > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> > >  {
> > >  	uint32_t group_idx; /* tbl8 group index. */
> > >  	struct rte_lpm_tbl_entry *tbl8_entry;
> > >
> > >  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > > -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > > +		tbl8_entry = &lpm->tbl8[group_idx *
> > > +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > >  		/* If a free tbl8 group is found clean it and set as VALID. */
> > >  		if (!tbl8_entry->valid_group) {
> > >  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> > > @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> > >  	return -ENOSPC;
> > >  }
> > >
> > > +static int32_t
> > > +tbl8_alloc_v1604(struct rte_lpm *lpm)
> > > +{
> > > +	int32_t group_idx; /* tbl8 group index. */
> > > +
> > > +	group_idx = __tbl8_alloc_v1604(lpm);
> > > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > > +		/* If there are no tbl8 groups try to reclaim some. */
> > > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > > +			group_idx = __tbl8_alloc_v1604(lpm);
> > > +	}
> > > +
> > > +	return group_idx;
> > > +}
> > > +
> > >  static void
> > >  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t
> > > tbl8_group_start)  { @@ -728,13 +800,21 @@ tbl8_free_v20(struct
> > > rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)  }
> > >
> > >  static void
> > > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> > > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> > >  {
> > > -	/* Set tbl8 group invalid*/
> > >  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > +	struct __rte_lpm_rcu_dq_entry e;
> > >
> > > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > > -			__ATOMIC_RELAXED);
> > > +	if (lpm->dq != NULL) {
> > > +		e.tbl8_group_index = tbl8_group_start;
> > > +		e.pad = 0;
> > > +		/* Push into QSBR defer queue. */
> > > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > > +	} else {
> > > +		/* Set tbl8 group invalid*/
> > > +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> > > +				__ATOMIC_RELAXED);
> > > +	}
> > >  }
> > >
> > >  static __rte_noinline int32_t
> > > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> > >
> > >  	if (!lpm->tbl24[tbl24_index].valid) {
> > >  		/* Search for a free tbl8 group. */
> > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > >
> > >  		/* Check tbl8 allocation was successful. */
> > >  		if (tbl8_group_index < 0) {
> > > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> > >  	} /* If valid entry but not extended calculate the index into Table8. */
> > >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> > >  		/* Search for free tbl8 group. */
> > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > >
> > >  		if (tbl8_group_index < 0) {
> > >  			return tbl8_group_index;
> > > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> > >  		 */
> > >  		lpm->tbl24[tbl24_index].valid = 0;
> > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > >  	} else if (tbl8_recycle_index > -1) {
> > >  		/* Update tbl24 entry. */
> > >  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> > > @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> > >  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> > >  				__ATOMIC_RELAXED);
> > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > >  	}
> > >  #undef group_idx
> > >  	return 0;
> > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > > index 906ec4483..49c12a68d 100644
> > > --- a/lib/librte_lpm/rte_lpm.h
> > > +++ b/lib/librte_lpm/rte_lpm.h
> > > @@ -1,5 +1,6 @@
> > >  /* SPDX-License-Identifier: BSD-3-Clause
> > >   * Copyright(c) 2010-2014 Intel Corporation
> > > + * Copyright(c) 2019 Arm Limited
> > >   */
> > >
> > >  #ifndef _RTE_LPM_H_
> > > @@ -21,6 +22,7 @@
> > >  #include <rte_common.h>
> > >  #include <rte_vect.h>
> > >  #include <rte_compat.h>
> > > +#include <rte_rcu_qsbr.h>
> > >
> > >  #ifdef __cplusplus
> > >  extern "C" {
> > > @@ -186,6 +188,7 @@ struct rte_lpm {
> > >  			__rte_cache_aligned; /**< LPM tbl24 table. */
> > >  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> > >  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> > >  };
> > >
> > >  /**
> > > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> > >  void
> > >  rte_lpm_free_v1604(struct rte_lpm *lpm);
> > >
> > > +/**
> > > + * Associate RCU QSBR variable with an LPM object.
> > > + *
> > > + * @param lpm
> > > + *   the lpm object to add RCU QSBR
> > > + * @param v
> > > + *   RCU QSBR variable
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with error code set in rte_errno.
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - invalid pointer
> > > + *   - EEXIST - already added QSBR
> > > + *   - ENOMEM - memory allocation failure
> > > + */
> > > +__rte_experimental
> > > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> > > +
> > >  /**
> > >   * Add a rule to the LPM table.
> > >   *
> > > diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> > > index 90beac853..b353aabd2 100644
> > > --- a/lib/librte_lpm/rte_lpm_version.map
> > > +++ b/lib/librte_lpm/rte_lpm_version.map
> > > @@ -44,3 +44,9 @@ DPDK_17.05 {
> > >  	rte_lpm6_lookup_bulk_func;
> > >
> > >  } DPDK_16.04;
> > > +
> > > +EXPERIMENTAL {
> > > +	global:
> > > +
> > > +	rte_lpm_rcu_qsbr_add;
> > > +};
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-13  3:02         ` Honnappa Nagarahalli
@ 2019-10-15 16:48           ` Medvedkin, Vladimir
  2019-10-18  3:47             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-15 16:48 UTC (permalink / raw)
  To: Honnappa Nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd

Hi Honnappa,

On 13/10/2019 04:02, Honnappa Nagarahalli wrote:
> Hi Vladimir,
> 	Apologies for the delayed response, I had to run a few experiments.
>
> <snip>
>
>> Hi Honnappa,
>>
>> On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
>>> Add resource reclamation APIs to make it simple for applications and
>>> libraries to integrate rte_rcu library.
>>>
>>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> ---
>>>    app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>>>    lib/librte_rcu/meson.build         |   2 +
>>>    lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>>>    lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>>>    lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>>>    lib/librte_rcu/rte_rcu_version.map |   4 +
>>>    lib/meson.build                    |   6 +-
>>>    7 files changed, 700 insertions(+), 3 deletions(-)
>>>    create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>>
>>> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
>>> index d1b9e46a2..3a6815243 100644
>>> --- a/app/test/test_rcu_qsbr.c
>>> +++ b/app/test/test_rcu_qsbr.c
>>> @@ -1,8 +1,9 @@
>>>    /* SPDX-License-Identifier: BSD-3-Clause
>>> - * Copyright (c) 2018 Arm Limited
>>> + * Copyright (c) 2019 Arm Limited
>>>     */
>>>
>>>    #include <stdio.h>
>>> +#include <string.h>
>>>    #include <rte_pause.h>
>>>    #include <rte_rcu_qsbr.h>
>>>    #include <rte_hash.h>
>>> @@ -33,6 +34,7 @@ static uint32_t *keys;
>>>    #define COUNTER_VALUE 4096
>>>    static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
>>>    static uint8_t writer_done;
>>> +static uint8_t cb_failed;
>>>
>>>    static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>>>    struct rte_hash *h[RTE_MAX_LCORE];
>>> @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
>>>    	return 0;
>>>    }
>>>
>>> +static void
>>> +rte_rcu_qsbr_test_free_resource(void *p, void *e)
>>> +{
>>> +	if (p != NULL && e != NULL) {
>>> +		printf("%s: Test failed\n", __func__);
>>> +		cb_failed = 1;
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
>>> + * elements that can be freed later. This queue is referred to as 'defer queue'.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_create(void)
>>> +{
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	dq = rte_rcu_qsbr_dq_create(NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +	params.v = t[0];
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	params.size = 1;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	params.esize = 3;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	/* Pass all valid parameters */
>>> +	params.esize = 16;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>> params");
>>> +	rte_rcu_qsbr_dq_delete(dq);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
>>> + * to be freed later after at least one grace period is over.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_enqueue(void)
>>> +{
>>> +	int ret;
>>> +	uint64_t r;
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
>>> +
>>> +	/* Create a queue with simple parameters */
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +	params.v = t[0];
>>> +	params.size = 1;
>>> +	params.esize = 16;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>>> +params");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
>>> +params");
>>> +
>>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
>>> +params");
>>> +
>>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
>>> +params");
>>> +
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid
>> params");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_reclaim(void)
>>> +{
>>> +	int ret;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid
>>> +params");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_delete(void)
>>> +{
>>> +	int ret;
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	ret = rte_rcu_qsbr_dq_delete(NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid
>>> +params");
>>> +
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +	params.v = t[0];
>>> +	params.size = 1;
>>> +	params.esize = 16;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>> params");
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
>> params");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
>>> + * to be freed later after at least one grace period is over.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
>>> +{
>>> +	int i, j, ret;
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +	uint64_t *e;
>>> +	uint64_t sc = 200;
>>> +	int max_entries;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
>>> +	printf("Size = %d, esize = %d\n", size, esize);
>>> +
>>> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
>>> +	if (e == NULL)
>>> +		return 0;
>>> +	cb_failed = 0;
>>> +
>>> +	/* Initialize the RCU variable. No threads are registered */
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +
>>> +	/* Create a queue with simple parameters */
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	params.v = t[0];
>>> +	params.size = size;
>>> +	params.esize = esize;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>>> +params");
>>> +
>>> +	/* Given the size and esize, calculate the maximum number of entries
>>> +	 * that can be stored on the defer queue (look at the logic used
>>> +	 * in capacity calculation of rte_ring).
>>> +	 */
>>> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
>>> +	max_entries = (max_entries - 1)/(esize/8 + 1);
>>> +
>>> +	/* Enqueue a few counters starting with the value 'sc' */
>>> +	/* The queue size will be rounded up to 2. The enqueue API also
>>> +	 * reclaims if the queue size is above a certain limit. Since there
>>> +	 * are no threads registered, reclamation succeeds. Hence, it should
>>> +	 * be possible to enqueue more than the provided queue size.
>>> +	 */
>>> +	for (i = 0; i < 10; i++) {
>>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
>>> +			"dq enqueue functional");
>>> +		for (j = 0; j < esize/8; j++)
>>> +			e[j] = sc++;
>>> +	}
>>> +
>>> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
>>> +	 * succeed. It should not be possible to enqueue more than the size
>>> +	 * number of resources.
>>> +	 */
>>> +	rte_rcu_qsbr_thread_register(t[0], 1);
>>> +	rte_rcu_qsbr_thread_online(t[0], 1);
>>> +
>>> +	for (i = 0; i < max_entries; i++) {
>>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
>>> +			"dq enqueue functional");
>>> +		for (j = 0; j < esize/8; j++)
>>> +			e[j] = sc++;
>>> +	}
>>> +
>>> +	/* Enqueue fails as queue is full */
>>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
>> functional");
>>> +
>>> +	/* Delete should fail as there are elements in defer queue which
>>> +	 * cannot be reclaimed.
>>> +	 */
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid
>> params");
>>> +
>>> +	/* Report quiescent state, enqueue should succeed */
>>> +	rte_rcu_qsbr_quiescent(t[0], 1);
>>> +	for (i = 0; i < max_entries; i++) {
>>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
>>> +			"dq enqueue functional");
>>> +		for (j = 0; j < esize/8; j++)
>>> +			e[j] = sc++;
>>> +	}
>>> +
>>> +	/* Queue is full */
>>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
>> functional");
>>> +
>>> +	/* Report quiescent state, delete should succeed */
>>> +	rte_rcu_qsbr_quiescent(t[0], 1);
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
>> params");
>>> +
>>> +	/* Validate that call back function did not return any error */
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
>>> +
>>> +	rte_free(e);
>>> +	return 0;
>>> +}
>>> +
>>>    /*
>>>     * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
>>>     */
>>> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
>>>    	if (test_rcu_qsbr_thread_offline() < 0)
>>>    		goto test_fail;
>>>
>>> +	if (test_rcu_qsbr_dq_create() < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_reclaim() < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_delete() < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_enqueue() < 0)
>>> +		goto test_fail;
>>> +
>>>    	printf("\nFunctional tests\n");
>>>
>>>    	if (test_rcu_qsbr_sw_sv_3qs() < 0)
>>> @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
>>>    	if (test_rcu_qsbr_mw_mv_mqs() < 0)
>>>    		goto test_fail;
>>>
>>> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
>>> +		goto test_fail;
>>> +
>>>    	free_rcu();
>>>
>>>    	printf("\n");
>>> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
>>> index 62920ba02..e280b29c1 100644
>>> --- a/lib/librte_rcu/meson.build
>>> +++ b/lib/librte_rcu/meson.build
>>> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
>>>    if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
>>>    	ext_deps += cc.find_library('atomic')
>>>    endif
>>> +
>>> +deps += ['ring']
>>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
>>> index ce7f93dd3..76814f50b 100644
>>> --- a/lib/librte_rcu/rte_rcu_qsbr.c
>>> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
>>> @@ -21,6 +21,7 @@
>>>    #include <rte_errno.h>
>>>
>>>    #include "rte_rcu_qsbr.h"
>>> +#include "rte_rcu_qsbr_pvt.h"
>>>
>>>    /* Get the memory size of QSBR variable */
>>>    size_t
>>> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr
>> *v)
>>>    	return 0;
>>>    }
>>>
>>> +/* Create a queue used to store the data structure elements that can
>>> + * be freed later. This queue is referred to as 'defer queue'.
>>> + */
>>> +struct rte_rcu_qsbr_dq *
>>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
>>> +{
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +	uint32_t qs_fifo_size;
>>> +
>>> +	if (params == NULL || params->f == NULL ||
>>> +		params->v == NULL || params->name == NULL ||
>>> +		params->size == 0 || params->esize == 0 ||
>>> +		(params->esize % 8 != 0)) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return NULL;
>>> +	}
>>> +
>>> +	dq = rte_zmalloc(NULL,
>>> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
>>> +		RTE_CACHE_LINE_SIZE);
>>> +	if (dq == NULL) {
>>> +		rte_errno = ENOMEM;
>>> +
>>> +		return NULL;
>>> +	}
>>> +
>>> +	/* round up qs_fifo_size to next power of two that is not less than
>>> +	 * max_size.
>>> +	 */
>>> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
>>> +					* params->size) + 1);
>>> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
>>> +					SOCKET_ID_ANY, 0);
>>> +	if (dq->r == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): defer queue create failed\n", __func__);
>>> +		rte_free(dq);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	dq->v = params->v;
>>> +	dq->size = params->size;
>>> +	dq->esize = params->esize;
>>> +	dq->f = params->f;
>>> +	dq->p = params->p;
>>> +
>>> +	return dq;
>>> +}
>>> +
>>> +/* Enqueue one resource to the defer queue to free after the grace
>>> + * period is over.
>>> + */
>>> +int
>>> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
>>> +{
>>> +	uint64_t token;
>>> +	uint64_t *tmp;
>>> +	uint32_t i;
>>> +	uint32_t cur_size, free_size;
>>> +
>>> +	if (dq == NULL || e == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Start the grace period */
>>> +	token = rte_rcu_qsbr_start(dq->v);
>>> +
>>> +	/* Reclaim resources if the queue is 1/8th full. This keeps
>>> +	 * the queue from growing too large and allows time for reader
>>> +	 * threads to report their quiescent state.
>>> +	 */
>>> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
>>> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
>>> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
>>> +			"%s(): Triggering reclamation\n", __func__);
>>> +		rte_rcu_qsbr_dq_reclaim(dq);
>>> +	}
>> There are two problems I see:
>>
>> 1. rte_rcu_qsbr_dq_reclaim() reclaims only 1/16 of the defer queue while it
>> triggers at 1/8. This means that there will always be 1/16 of non-reclaimed
>> entries in the queue.
> There will be 'at least' 1/16 non-reclaimed entries.
Correct, that's what I meant :)
>   It could be more depending on the length of the grace period and the rate of deletion.

Right, the number of entries to reclaim depends on:

- grace period which is application specific

- cost of delete operation which is library (algorithm) specific

- rate of deletion which depends on runtime.

So it is very hard to predict how big the threshold to trigger
reclamation should be and how many entries it should reclaim.

> The trigger of 1/8 is used to give sufficient time for the readers to report their quiescent state. 1/16 is used to spread the load of reclamation across multiple calls and provide an upper bound on the cycles consumed.

1/16 of max entries to reclaim within a single call can cost a lot.
Moreover, it could have an impact on the readers through massive cache
evictions.

Consider a set of routes from test_lpm_perf.c. To install all routes you 
need to have at least 65k tbl8 entries (now it has 2k). So when 
reclaiming, besides the costs of rte_rcu_qsbr_check(), you'll need to 
rewrite 4k cache lines.

So 1/16 of max entries is relatively big and it's better to spread this 
load across multiple calls.
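
To put rough numbers on it (using the figures above, and assuming one
64 B cache line is touched per reclaimed group): with 65536 tbl8 groups,
a 1/16 batch is 65536/16 = 4096 groups reclaimed in one call, i.e.
roughly 4096 cache lines (~256 KB) written by a single rte_lpm_delete(),
on top of the rte_rcu_qsbr_check() cost.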

>
>> 2. The number of entries to reclaim depends on dq->size. So,
>> rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For the LPM library this
> That is true. It depends on dq->size (the number of tbl8 groups). However, note that there is a patch [1] which provides batch-reclamation-like behavior, which reduces the cycles consumed by reclamation significantly.
>
> [1] https://patches.dpdk.org/patch/58960/
>
>> means that rte_lpm_delete() sometimes takes a long time.
> Agree, it sometimes takes additional time. It is good to spread it over multiple calls.
Right, with batch reclamation we have the classic throughput vs latency
problem here. Either reclaim a big number of entries relatively
infrequently, spreading the cost of the readers' quiescent state check,
or reclaim a small amount of entries more often, spending more cycles on
average. I'd prefer latency here because, as I mentioned earlier, huge
batches could have an impact on readers and lead to a big difference in
the cost of delete().
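
For reference, the per-call bound in the current patch is just the loop
guard in rte_rcu_qsbr_dq_reclaim() (a simplified sketch of the quoted
code; making 'max_cnt' a creation-time parameter instead of the fixed
1/16th is exactly the change being discussed):

	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT; /* 1/16th */
	cnt = 0;
	while (cnt < max_cnt && rte_ring_peek(dq->r, &token) == 0 &&
		rte_rcu_qsbr_check(dq->v,
			(uint64_t)(uintptr_t)token, false) == 1) {
		/* dequeue the element, then dq->f(dq->p, dq->e) frees it */
		cnt++;
	}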
>
>> So, my suggestions here would be
>>
>> - trigger rte_rcu_qsbr_dq_reclaim() with every enqueue
> Given that the LPM APIs are mainly for the control plane, I would think that by the next time an LPM API is called, the readers would have completed the grace period. But if there are frequent updates, we might end up with empty reclaims which will waste cycles. IMO, this trigger should happen only after at least a few entries are in the queue.
>
>> - reclaim small amount of entries (could be configurable at creation time)
> Agree. I would keep it smaller than the trigger amount, knowing that the elements added right before the trigger might not have completed the grace period.
>
>> - provide API to trigger reclaim from the application manually.
> IMO, this will add additional complexity to the application. I agree that there will be special needs for some applications. I think those applications might have to implement their own methods using the base RCU APIs.
> Instead, as agreed in other threads, I suggest we expose the parameters (when to trigger and how much to reclaim) to the application as optional configurable parameters, i.e. if the application does not provide them, we can use default values. I think this should provide enough flexibility to the application.

Agree.

Regarding default values, one strategy could be (a sketch follows the list):

- if the reclaim threshold isn't set (i.e. equals 0) then call reclaim
with every enqueue (i.e. threshold == 1)

- if max_entries_to_reclaim isn't set then reclaim as much as we can
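
A minimal sketch of that defaulting (the field names 'reclaim_thresh'
and 'max_reclaim_size' are assumptions for illustration, they are not
from the posted patch):

	/* inside a hypothetical rte_rcu_qsbr_dq_create() */
	dq->reclaim_thresh = params->reclaim_thresh != 0 ?
		params->reclaim_thresh : 1; /* reclaim on every enqueue */
	dq->max_reclaim_size = params->max_reclaim_size != 0 ?
		params->max_reclaim_size : dq->size; /* as much as we can */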


>>> +
>>> +	/* Check if there is space for at least 1 resource */
>>> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
>>> +	if (!free_size) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Defer queue is full\n", __func__);
>>> +		rte_errno = ENOSPC;
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Enqueue the resource */
>>> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
>>> +
>>> +	/* The resource to enqueue needs to be a multiple of 64b
>>> +	 * due to the limitation of the rte_ring implementation.
>>> +	 */
>>> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
>>> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/* Reclaim resources from the defer queue. */
>>> +int
>>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
>>> +{
>>> +	uint32_t max_cnt;
>>> +	uint32_t cnt;
>>> +	void *token;
>>> +	uint64_t *tmp;
>>> +	uint32_t i;
>>> +
>>> +	if (dq == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Anything to reclaim? */
>>> +	if (rte_ring_count(dq->r) == 0)
>>> +		return 0;
>>> +
>>> +	/* Reclaim at most 1/16th of the total number of entries. */
>>> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
>>> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
>>> +	cnt = 0;
>>> +
>>> +	/* Check reader threads quiescent state and reclaim resources */
>>> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
>>> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
>>> +			== 1)) {
>>> +		(void)rte_ring_sc_dequeue(dq->r, &token);
>>> +		/* The resource to dequeue needs to be a multiple of 64b
>>> +		 * due to the limitation of the rte_ring implementation.
>>> +		 */
>>> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
>>> +			i++, tmp++)
>>> +			(void)rte_ring_sc_dequeue(dq->r,
>>> +					(void *)(uintptr_t)tmp);
>>> +		dq->f(dq->p, dq->e);
>>> +
>>> +		cnt++;
>>> +	}
>>> +
>>> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
>>> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
>>> +
>>> +	if (cnt == 0) {
>>> +		/* No resources were reclaimed */
>>> +		rte_errno = EAGAIN;
>>> +		return 1;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/* Delete a defer queue. */
>>> +int
>>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
>>> +{
>>> +	if (dq == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Reclaim all the resources */
>>> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
>>> +		/* Error number is already set by the reclaim API */
>>> +		return 1;
>>> +
>>> +	rte_ring_free(dq->r);
>>> +	rte_free(dq);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>    int rte_rcu_log_type;
>>>
>>>    RTE_INIT(rte_rcu_register)
>>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
>>> index c80f15c00..185d4b50a 100644
>>> --- a/lib/librte_rcu/rte_rcu_qsbr.h
>>> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
>>> @@ -34,6 +34,7 @@ extern "C" {
>>>    #include <rte_lcore.h>
>>>    #include <rte_debug.h>
>>>    #include <rte_atomic.h>
>>> +#include <rte_ring.h>
>>>
>>>    extern int rte_rcu_log_type;
>>>
>>> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>>>    	 */
>>>    } __rte_cache_aligned;
>>>
>>> +/**
>>> + * Call back function called to free the resources.
>>> + *
>>> + * @param p
>>> + *   Pointer provided while creating the defer queue
>>> + * @param e
>>> + *   Pointer to the resource data stored on the defer queue
>>> + *
>>> + * @return
>>> + *   None
>>> + */
>>> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
>>> +
>>> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
>>> +
>>> +/**
>>> + *  Trigger automatic reclamation when the defer queue is 1/8th full.
>>> + */
>>> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
>>> +
>>> +/**
>>> + *  Reclaim at most 1/16th of the total number of resources.
>>> + */
>>> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
>>> +
>>> +/**
>>> + * Parameters used when creating the defer queue.
>>> + */
>>> +struct rte_rcu_qsbr_dq_parameters {
>>> +	const char *name;
>>> +	/**< Name of the queue. */
>>> +	uint32_t size;
>>> +	/**< Number of entries in queue. Typically, this will be
>>> +	 *   the same as the maximum number of entries supported in the
>>> +	 *   lock free data structure.
>>> +	 *   Data structures with an unbounded number of entries are not
>>> +	 *   supported currently.
>>> +	 */
>>> +	uint32_t esize;
>>> +	/**< Size (in bytes) of each element in the defer queue.
>>> +	 *   This has to be multiple of 8B as the rte_ring APIs
>>> +	 *   support 8B element sizes only.
>>> +	 */
>>> +	rte_rcu_qsbr_free_resource f;
>>> +	/**< Function to call to free the resource. */
>>> +	void *p;
>>> +	/**< Pointer passed to the free function. Typically, this is the
>>> +	 *   pointer to the data structure to which the resource to free
>>> +	 *   belongs. This can be NULL.
>>> +	 */
>>> +	struct rte_rcu_qsbr *v;
>>> +	/**< RCU QSBR variable to use for this defer queue */
>>> +};
>>> +
>>> +/* RTE defer queue structure.
>>> + * This structure holds the defer queue. The defer queue is used to
>>> + * hold the deleted entries from the data structure that are not
>>> + * yet freed.
>>> + */
>>> +struct rte_rcu_qsbr_dq;
>>> +
>>>    /**
>>>     * @warning
>>>     * @b EXPERIMENTAL: this API may change without prior notice @@
>>> -648,6 +710,113 @@ __rte_experimental
>>>    int
>>>    rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Create a queue used to store the data structure elements that can
>>> + * be freed later. This queue is referred to as 'defer queue'.
>>> + *
>>> + * @param params
>>> + *   Parameters to create a defer queue.
>>> + * @return
>>> + *   On success - Valid pointer to defer queue
>>> + *   On error - NULL
>>> + *   Possible rte_errno codes are:
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - ENOMEM - Not enough memory
>>> + */
>>> +__rte_experimental
>>> +struct rte_rcu_qsbr_dq *
>>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
>>> +*params);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Enqueue one resource to the defer queue and start the grace period.
>>> + * The resource will be freed later after at least one grace period
>>> + * is over.
>>> + *
>>> + * If the defer queue is full, it will attempt to reclaim resources.
>>> + * It will also reclaim resources at regular intervals to keep
>>> + * the defer queue from growing too big.
>>> + *
>>> + * This API is not multi-thread safe. It is expected that the caller
>>> + * provides multi-thread safety by locking a mutex or some other means.
>>> + *
>>> + * A lock free multi-thread writer algorithm could achieve multi-thread
>>> + * safety by creating and using one defer queue per thread.
>>> + *
>>> + * @param dq
>>> + *   Defer queue to allocate an entry from.
>>> + * @param e
>>> + *   Pointer to resource data to copy to the defer queue. The size of
>>> + *   the data to copy is equal to the element size provided when the
>>> + *   defer queue was created.
>>> + * @return
>>> + *   On success - 0
>>> + *   On error - 1 with rte_errno set to
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - ENOSPC - Defer queue is full. This condition cannot happen
>>> + *		if the defer queue size is equal to (or larger than) the
>>> + *		number of elements in the data structure.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Reclaim resources from the defer queue.
>>> + *
>>> + * This API is not multi-thread safe. It is expected that the caller
>>> + * provides multi-thread safety by locking a mutex or some other means.
>>> + *
>>> + * A lock free multi-thread writer algorithm could achieve multi-thread
>>> + * safety by creating and using one defer queue per thread.
>>> + *
>>> + * @param dq
>>> + *   Defer queue to reclaim an entry from.
>>> + * @return
>>> + *   On successful reclamation of at least 1 resource - 0
>>> + *   On error - 1 with rte_errno set to
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - EAGAIN - None of the resources have completed at least 1 grace
>>> + *		period, try again.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Delete a defer queue.
>>> + *
>>> + * It tries to reclaim all the resources on the defer queue.
>>> + * If any of the resources have not completed the grace period
>>> + * the reclamation stops and returns immediately. The rest of
>>> + * the resources are not reclaimed and the defer queue is not
>>> + * freed.
>>> + *
>>> + * @param dq
>>> + *   Defer queue to delete.
>>> + * @return
>>> + *   On success - 0
>>> + *   On error - 1
>>> + *   Possible rte_errno codes are:
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
>>> + *		period, try again.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
>>> +
>>>    #ifdef __cplusplus
>>>    }
>>>    #endif
>>> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>> b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>> new file mode 100644
>>> index 000000000..2122bc36a
>>> --- /dev/null
>>> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>> @@ -0,0 +1,46 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright (c) 2019 Arm Limited
>>> + */
>>> +
>>> +#ifndef _RTE_RCU_QSBR_PVT_H_
>>> +#define _RTE_RCU_QSBR_PVT_H_
>>> +
>>> +/**
>>> + * This file is private to the RCU library. It should not be included
>>> + * by the user of this library.
>>> + */
>>> +
>>> +#ifdef __cplusplus
>>> +extern "C" {
>>> +#endif
>>> +
>>> +#include "rte_rcu_qsbr.h"
>>> +
>>> +/* RTE defer queue structure.
>>> + * This structure holds the defer queue. The defer queue is used to
>>> + * hold the deleted entries from the data structure that are not
>>> + * yet freed.
>>> + */
>>> +struct rte_rcu_qsbr_dq {
>>> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
>>> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
>>> +	uint32_t size;
>>> +	/**< Number of elements in the defer queue */
>>> +	uint32_t esize;
>>> +	/**< Size (in bytes) of data stored on the defer queue */
>>> +	rte_rcu_qsbr_free_resource f;
>>> +	/**< Function to call to free the resource. */
>>> +	void *p;
>>> +	/**< Pointer passed to the free function. Typically, this is the
>>> +	 *   pointer to the data structure to which the resource to free
>>> +	 *   belongs.
>>> +	 */
>>> +	char e[0];
>>> +	/**< Temporary storage to copy the defer queue element. */
>>> +};
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>> +
>>> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
>>> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
>>> index f8b9ef2ab..dfac88a37 100644
>>> --- a/lib/librte_rcu/rte_rcu_version.map
>>> +++ b/lib/librte_rcu/rte_rcu_version.map
>>> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>>>    	rte_rcu_qsbr_synchronize;
>>>    	rte_rcu_qsbr_thread_register;
>>>    	rte_rcu_qsbr_thread_unregister;
>>> +	rte_rcu_qsbr_dq_create;
>>> +	rte_rcu_qsbr_dq_enqueue;
>>> +	rte_rcu_qsbr_dq_reclaim;
>>> +	rte_rcu_qsbr_dq_delete;
>>>
>>>    	local: *;
>>>    };
>>> diff --git a/lib/meson.build b/lib/meson.build
>>> index e5ff83893..0e1be8407 100644
>>> --- a/lib/meson.build
>>> +++ b/lib/meson.build
>>> @@ -11,7 +11,9 @@
>>>    libraries = [
>>>    	'kvargs', # eal depends on kvargs
>>>    	'eal', # everything depends on eal
>>> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>>> +	'ring',
>>> +	'rcu', # rcu depends on ring
>>> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>>>    	'cmdline',
>>>    	'metrics', # bitrate/latency stats depends on this
>>>    	'hash',    # efd depends on this
>>> @@ -22,7 +24,7 @@ libraries = [
>>>    	'gro', 'gso', 'ip_frag', 'jobstats',
>>>    	'kni', 'latencystats', 'lpm', 'member',
>>>    	'power', 'pdump', 'rawdev',
>>> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
>>> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>>>    	# ipsec lib depends on net, crypto and security
>>>    	'ipsec',
>>>    	# add pkt framework libs which use other libs from above
>> --
>> Regards,
>> Vladimir

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-15 11:15           ` Ananyev, Konstantin
@ 2019-10-18  3:32             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18  3:32 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

<snip>

> >
> > > Hi guys,
> > I have tried to consolidate design-related questions here. If I have
> > missed anything, please add.
> >
> > >
> > > >
> > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >
> > > > Currently, the tbl8 group is freed even though the readers might
> > > > be using the tbl8 group entries. The freed tbl8 group can be
> > > > reallocated quickly. This results in incorrect lookup results.
> > > >
> > > > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > > > Refer to RCU documentation to understand various aspects of
> > > > integrating RCU library into other libraries.
> > > >
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > ---
> > > >  lib/librte_lpm/Makefile            |   3 +-
> > > >  lib/librte_lpm/meson.build         |   2 +
> > > >  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> > > >  lib/librte_lpm/rte_lpm.h           |  21 ++++++
> > > >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> > > >  5 files changed, 122 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> > > > index
> > > > a7946a1c5..ca9e16312 100644
> > > > --- a/lib/librte_lpm/Makefile
> > > > +++ b/lib/librte_lpm/Makefile
> > > > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk  # library name
> > > > LIB = librte_lpm.a
> > > >
> > > > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> > > >  CFLAGS += -O3
> > > >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -LDLIBS += -lrte_eal
> > > > -lrte_hash
> > > > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> > > >
> > > >  EXPORT_MAP := rte_lpm_version.map
> > > >
> > > > diff --git a/lib/librte_lpm/meson.build
> > > > b/lib/librte_lpm/meson.build index a5176d8ae..19a35107f 100644
> > > > --- a/lib/librte_lpm/meson.build
> > > > +++ b/lib/librte_lpm/meson.build
> > > > @@ -2,9 +2,11 @@
> > > >  # Copyright(c) 2017 Intel Corporation
> > > >
> > > >  version = 2
> > > > +allow_experimental_apis = true
> > > >  sources = files('rte_lpm.c', 'rte_lpm6.c')  headers =
> > > > files('rte_lpm.h', 'rte_lpm6.h')  # since header files have
> > > > different names, we can install all vector headers  # without
> > > > worrying about which architecture we actually need  headers +=
> > > > files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> > > > deps += ['hash']
> > > > +deps += ['rcu']
> > > > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > > > index
> > > > 3a929a1b1..ca58d4b35 100644
> > > > --- a/lib/librte_lpm/rte_lpm.c
> > > > +++ b/lib/librte_lpm/rte_lpm.c
> > > > @@ -1,5 +1,6 @@
> > > >  /* SPDX-License-Identifier: BSD-3-Clause
> > > >   * Copyright(c) 2010-2014 Intel Corporation
> > > > + * Copyright(c) 2019 Arm Limited
> > > >   */
> > > >
> > > >  #include <string.h>
> > > > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> > > >
> > > >  	rte_mcfg_tailq_write_unlock();
> > > >
> > > > +	if (lpm->dq)
> > > > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> > > >  	rte_free(lpm->tbl8);
> > > >  	rte_free(lpm->rules_tbl);
> > > >  	rte_free(lpm);
> > > > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604,
> > > 16.04);
> > > > MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> > > >  		rte_lpm_free_v1604);
> > > >
> > > > +struct __rte_lpm_rcu_dq_entry {
> > > > +	uint32_t tbl8_group_index;
> > > > +	uint32_t pad;
> > > > +};
> > > > +
> > > > +static void
> > > > +__lpm_rcu_qsbr_free_resource(void *p, void *data) {
> > > > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > > +	struct __rte_lpm_rcu_dq_entry *e =
> > > > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > > > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > > > +
> > > > +	/* Set tbl8 group invalid */
> > > > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > > > +		__ATOMIC_RELAXED);
> > > > +}
> > > > +
> > > > +/* Associate QSBR variable with an LPM object.
> > > > + */
> > > > +int
> > > > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v) {
> > > > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > > > +	struct rte_rcu_qsbr_dq_parameters params;
> > > > +
> > > > +	if ((lpm == NULL) || (v == NULL)) {
> > > > +		rte_errno = EINVAL;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	if (lpm->dq) {
> > > > +		rte_errno = EEXIST;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Init QSBR defer queue. */
> > > > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm-
> > > >name);
> > > > +	params.name = rcu_dq_name;
> > > > +	params.size = lpm->number_tbl8s;
> > > > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > > > +	params.f = __lpm_rcu_qsbr_free_resource;
> > > > +	params.p = lpm->tbl8;
> > > > +	params.v = v;
> > > > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > > > +	if (lpm->dq == NULL) {
> > > > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > > > +		return 1;
> > > > +	}
> > >
> > > Few thoughts about that function:
> > Few things to keep in mind: the goal of the design is to make it easy
> > for applications to adopt lock-free algorithms. The reclamation
> > process in the writer is a major portion of the code one has to write
> > for using lock-free algorithms. The current design is such that the
> > writer does not have to change or write any additional code other than
> > calling 'rte_lpm_rcu_qsbr_add'.
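> >
> > For reference, a minimal sketch of the writer-side flow using the API
> > from this patch (QSBR variable setup and reader registration elided;
> > handle_error() is a placeholder):
> >
> > 	/* One-time registration of the RCU variable with the LPM object. */
> > 	if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
> > 		handle_error(); /* rte_errno: EINVAL, EEXIST or ENOMEM */
> >
> > 	/* Subsequent rte_lpm_delete() calls now push retired tbl8 groups
> > 	 * onto the defer queue instead of reusing them immediately.
> > 	 */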
> >
> > > It is named rcu_qsbr_add() but in fact it allocates a defer queue for a given rcu var.
> > > So first thought - is it always necessary?
> > This is part of the design. If the application does not want to use
> > this integrated logic, it does not have to call this API. It can use
> > the RCU defer APIs to implement its own logic. But if I ask the
> > question, does this integrated logic address most of the use cases of
> > the LPM library, I think the answer is yes.
> >
> > > For some use-cases I suppose the user might be OK waiting for the
> > > quiescent state change inside tbl8_free()?
> > Yes, that is a possibility (for example, no frequent route changes). But
> > I think that is trivial for the application to implement. Though, the
> > LPM library has to separate the 'delete' and 'free' operations.
> 
> Exactly.
> That's why it is not trivial with the current LPM library.
> In fact, to do that himself right now, the user would have to implement
> and support his own version of the LPM code.
😊, well, we definitely don't want them to write their own library (if the DPDK LPM is enough).
IMO, we need to be consistent with other libraries in terms of APIs, but that's another topic.
I do not see any problem with implementing this now, or with shaping the APIs now so that it can be implemented in the future. We can add a 'flags' field which will allow for other methods of reclamation, as sketched below.
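
For illustration only, a minimal sketch of such an extension (the 'flags'
field and the flag names are assumptions, not the current API):

	/* Hypothetical reclamation-mode flags for the defer queue. */
	#define RTE_RCU_QSBR_DQ_RECLAIM_AUTO   0x0 /* from enqueue (default) */
	#define RTE_RCU_QSBR_DQ_RECLAIM_MANUAL 0x1 /* application triggers it */

	struct rte_rcu_qsbr_dq_parameters {
		const char *name;	/**< Name of the queue. */
		uint32_t flags;		/**< Reclamation method (hypothetical). */
		uint32_t size;		/**< Number of entries in the queue. */
		uint32_t esize;		/**< Size (in bytes) of each element. */
		rte_rcu_qsbr_free_resource f;	/**< Free function to call. */
		void *p;	/**< Pointer passed to the free function. */
		struct rte_rcu_qsbr *v;	/**< RCU QSBR variable for this queue. */
	};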

> 
> Honestly, I don't understand why you consider it a drawback.
> From my perspective only a few things need to be changed:
> 
> 1. Add 2 parameters to rte_lpm_rcu_qsbr_add():
>     number of elems in the defer_queue
>     reclaim() threshold value.
> If the user doesn't want to provide any values, that's fine; we can use
> default ones here (as you do right now).
I think we have agreed on this; I see the value in doing it. A possible shape is sketched below.
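
A rough sketch of what the extended prototype could look like (the new
parameter names are assumptions; 0 would select the built-in defaults):

	/* Hypothetical extension of rte_lpm_rcu_qsbr_add(), not the
	 * version in this patch: the application can size the defer
	 * queue and set the reclaim trigger threshold.
	 */
	__rte_experimental
	int
	rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
			uint32_t dq_size, uint32_t reclaim_thd);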

> 2. Make rte_lpm_rcu_qsbr_add() return a pointer to the defer_queue.
> Again, if the user doesn't want to call reclaim() himself, he can just
> ignore the return value.
Given the goal of reducing the burden on the user, this goes in the other direction. But if you see a use case for it, I don't have any issues; Vladimir asked for it as well in the other thread. A usage sketch is below.
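
A usage sketch under that assumption (the changed return type is
hypothetical; rte_rcu_qsbr_dq_reclaim() is the API from this series):

	/* Hypothetical variant: registration hands back the defer queue. */
	struct rte_rcu_qsbr_dq *dq = rte_lpm_rcu_qsbr_add(lpm, v);

	/* The application may ignore 'dq', or reclaim at a time of its
	 * choosing, e.g. from a control-plane housekeeping loop.
	 */
	if (dq != NULL)
		rte_rcu_qsbr_dq_reclaim(dq);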

> 
> These 2 changes will provide us with the necessary flexibility that would
> help to cover more use-cases:
> - the user can decide how big the defer queue should be
> - the user can decide when/how he wants to do reclaim()
> 
> Konstantin
> 
> > Similar operations are provided in the rte_hash library. IMO, we should
> > follow a consistent approach.
> >
> > > Another thing: you do allocate the defer queue, but it is internal, so
> > > the user can't call reclaim() manually, which looks strange.
> > > Why not return the defer_queue pointer to the user, so he can call
> > > reclaim() himself at an appropriate time?
> > The intention of the design is to take the complexity away from the
> > user of the LPM library. IMO, the current design will address most use
> > cases of the LPM library. If we expose the 2 parameters (when to
> > trigger reclamation and how much to reclaim) in the
> > 'rte_lpm_rcu_qsbr_add' API, it should provide enough flexibility to
> > the application.
> >
> > > Third thing - you always allocate the defer queue with a size equal
> > > to the number of tbl8 groups.
> > > Though I understand it could be up to 16M tbl8 groups inside the LPM.
> > > Do we really need a defer queue that long?
> > No, we do not need it to be this long. It is this long today to avoid
> > returning a no-space error from the defer queue.
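> > (For scale, from the sizing logic in this patch: the ring stores
> > (esize/8 + 1) 64b slots per element - one token plus the data - so
> > with esize == 8 a defer queue of 'size' elements is created with
> > rte_align32pow2(2 * size + 1) ring slots.)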
> >
> > > Especially considering that the current rcu_defer_queue will start
> > > reclamation when 1/8 of the defer_queue becomes full and wouldn't
> > > reclaim more than 1/16 of it.
> > > Probably better to let the user decide himself how long a defer_queue
> > > he needs for that LPM?
> > It makes sense to expose it to the user if the writer-writer
> > concurrency is lock-free (no memory allocation allowed to expand the
> > defer queue size when the queue is full). However, LPM is not
> > lock-free on the writer side. If we think the writer could be
> > lock-free in the future, it has to be exposed to the user.
> >
> > >
> > > Konstantin
> > Pulling questions/comments from other threads:
> > Can we leave reclamation to some other house-keeping thread (a sort of
> > garbage collector)? Or is such a mode not supported/planned?
> >
> > [Honnappa] If the reclamation cost is small, the current method
> > provides advantages over having a separate thread to do reclamation. I
> > did not plan to provide such an option. But maybe it makes sense to
> > keep the options open (especially from an ABI perspective). Maybe we
> > should add a flags field which will allow us to implement different
> > methods in the future?
> >
> > >
> > >
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Adds a rule to the rule table.
> > > >   *
> > > > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20
> > > > *tbl8)  }
> > > >
> > > >  static int32_t
> > > > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > > > number_tbl8s)
> > > > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> > > >  {
> > > >  	uint32_t group_idx; /* tbl8 group index. */
> > > >  	struct rte_lpm_tbl_entry *tbl8_entry;
> > > >
> > > >  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > > > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > > > -		tbl8_entry = &tbl8[group_idx *
> > > RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > > > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > > > +		tbl8_entry = &lpm->tbl8[group_idx *
> > > > +
> > > 	RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > > >  		/* If a free tbl8 group is found clean it and set as VALID. */
> > > >  		if (!tbl8_entry->valid_group) {
> > > >  			struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -
> > > 712,6 +769,21 @@
> > > > tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> > > >  	return -ENOSPC;
> > > >  }
> > > >
> > > > +static int32_t
> > > > +tbl8_alloc_v1604(struct rte_lpm *lpm) {
> > > > +	int32_t group_idx; /* tbl8 group index. */
> > > > +
> > > > +	group_idx = __tbl8_alloc_v1604(lpm);
> > > > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > > > +		/* If there are no tbl8 groups try to reclaim some. */
> > > > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > > > +			group_idx = __tbl8_alloc_v1604(lpm);
> > > > +	}
> > > > +
> > > > +	return group_idx;
> > > > +}
> > > > +
> > > >  static void
> > > >  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t
> > > > tbl8_group_start)  { @@ -728,13 +800,21 @@ tbl8_free_v20(struct
> > > > rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)  }
> > > >
> > > >  static void
> > > > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > > > tbl8_group_start)
> > > > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> > > >  {
> > > > -	/* Set tbl8 group invalid*/
> > > >  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > > +	struct __rte_lpm_rcu_dq_entry e;
> > > >
> > > > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > > > -			__ATOMIC_RELAXED);
> > > > +	if (lpm->dq != NULL) {
> > > > +		e.tbl8_group_index = tbl8_group_start;
> > > > +		e.pad = 0;
> > > > +		/* Push into QSBR defer queue. */
> > > > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > > > +	} else {
> > > > +		/* Set tbl8 group invalid*/
> > > > +		__atomic_store(&lpm->tbl8[tbl8_group_start],
> > > &zero_tbl8_entry,
> > > > +				__ATOMIC_RELAXED);
> > > > +	}
> > > >  }
> > > >
> > > >  static __rte_noinline int32_t
> > > > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > > > uint32_t ip_masked, uint8_t depth,
> > > >
> > > >  	if (!lpm->tbl24[tbl24_index].valid) {
> > > >  		/* Search for a free tbl8 group. */
> > > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm-
> > > >number_tbl8s);
> > > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > > >
> > > >  		/* Check tbl8 allocation was successful. */
> > > >  		if (tbl8_group_index < 0) {
> > > > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > > uint32_t ip_masked, uint8_t depth,
> > > >  	} /* If valid entry but not extended calculate the index into Table8. */
> > > >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> > > >  		/* Search for free tbl8 group. */
> > > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm-
> > > >number_tbl8s);
> > > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > > >
> > > >  		if (tbl8_group_index < 0) {
> > > >  			return tbl8_group_index;
> > > > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
> > > uint32_t ip_masked,
> > > >  		 */
> > > >  		lpm->tbl24[tbl24_index].valid = 0;
> > > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > > >  	} else if (tbl8_recycle_index > -1) {
> > > >  		/* Update tbl24 entry. */
> > > >  		struct rte_lpm_tbl_entry new_tbl24_entry = { @@ -1834,7
> > > +1914,7 @@
> > > > delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> > > >  		__atomic_store(&lpm->tbl24[tbl24_index],
> > > &new_tbl24_entry,
> > > >  				__ATOMIC_RELAXED);
> > > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > > >  	}
> > > >  #undef group_idx
> > > >  	return 0;
> > > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > > > index 906ec4483..49c12a68d 100644
> > > > --- a/lib/librte_lpm/rte_lpm.h
> > > > +++ b/lib/librte_lpm/rte_lpm.h
> > > > @@ -1,5 +1,6 @@
> > > >  /* SPDX-License-Identifier: BSD-3-Clause
> > > >   * Copyright(c) 2010-2014 Intel Corporation
> > > > + * Copyright(c) 2019 Arm Limited
> > > >   */
> > > >
> > > >  #ifndef _RTE_LPM_H_
> > > > @@ -21,6 +22,7 @@
> > > >  #include <rte_common.h>
> > > >  #include <rte_vect.h>
> > > >  #include <rte_compat.h>
> > > > +#include <rte_rcu_qsbr.h>
> > > >
> > > >  #ifdef __cplusplus
> > > >  extern "C" {
> > > > @@ -186,6 +188,7 @@ struct rte_lpm {
> > > >  			__rte_cache_aligned; /**< LPM tbl24 table. */
> > > >  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> > > >  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > > > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> > > >  };
> > > >
> > > >  /**
> > > > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> > > void
> > > > rte_lpm_free_v1604(struct rte_lpm *lpm);
> > > >
> > > > +/**
> > > > + * Associate RCU QSBR variable with an LPM object.
> > > > + *
> > > > + * @param lpm
> > > > + *   the lpm object to add RCU QSBR
> > > > + * @param v
> > > > + *   RCU QSBR variable
> > > > + * @return
> > > > + *   On success - 0
> > > > + *   On error - 1 with error code set in rte_errno.
> > > > + *   Possible rte_errno codes are:
> > > > + *   - EINVAL - invalid pointer
> > > > + *   - EEXIST - already added QSBR
> > > > + *   - ENOMEM - memory allocation failure
> > > > + */
> > > > +__rte_experimental
> > > > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr
> > > > +*v);
> > > > +
> > > >  /**
> > > >   * Add a rule to the LPM table.
> > > >   *
> > > > diff --git a/lib/librte_lpm/rte_lpm_version.map
> > > > b/lib/librte_lpm/rte_lpm_version.map
> > > > index 90beac853..b353aabd2 100644
> > > > --- a/lib/librte_lpm/rte_lpm_version.map
> > > > +++ b/lib/librte_lpm/rte_lpm_version.map
> > > > @@ -44,3 +44,9 @@ DPDK_17.05 {
> > > >  	rte_lpm6_lookup_bulk_func;
> > > >
> > > >  } DPDK_16.04;
> > > > +
> > > > +EXPERIMENTAL {
> > > > +	global:
> > > > +
> > > > +	rte_lpm_rcu_qsbr_add;
> > > > +};
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-15 16:48           ` Medvedkin, Vladimir
@ 2019-10-18  3:47             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18  3:47 UTC (permalink / raw)
  To: Medvedkin, Vladimir, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> 
> Hi Honnappa,
> 
> On 13/10/2019 04:02, Honnappa Nagarahalli wrote:
> > Hi Vladimir,
> > 	Apologies for the delayed response, I had to run a few experiments.
> >
> > <snip>
> >
> >> Hi Honnappa,
> >>
> >> On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> >>> Add resource reclamation APIs to make it simple for applications and
> >>> libraries to integrate rte_rcu library.
> >>>
> >>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >>> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> ---
> >>>    app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> >>>    lib/librte_rcu/meson.build         |   2 +
> >>>    lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> >>>    lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> >>>    lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> >>>    lib/librte_rcu/rte_rcu_version.map |   4 +
> >>>    lib/meson.build                    |   6 +-
> >>>    7 files changed, 700 insertions(+), 3 deletions(-)
> >>>    create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>>
> >>> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> >>> index
> >>> d1b9e46a2..3a6815243 100644
> >>> --- a/app/test/test_rcu_qsbr.c
> >>> +++ b/app/test/test_rcu_qsbr.c
> >>> @@ -1,8 +1,9 @@
> >>>    /* SPDX-License-Identifier: BSD-3-Clause
> >>> - * Copyright (c) 2018 Arm Limited
> >>> + * Copyright (c) 2019 Arm Limited
> >>>     */
> >>>
> >>>    #include <stdio.h>
> >>> +#include <string.h>
> >>>    #include <rte_pause.h>
> >>>    #include <rte_rcu_qsbr.h>
> >>>    #include <rte_hash.h>
> >>> @@ -33,6 +34,7 @@ static uint32_t *keys;
> >>>    #define COUNTER_VALUE 4096
> >>>    static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
> >>>    static uint8_t writer_done;
> >>> +static uint8_t cb_failed;
> >>>
> >>>    static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> >>>    struct rte_hash *h[RTE_MAX_LCORE]; @@ -582,6 +584,269 @@
> >>> test_rcu_qsbr_thread_offline(void)
> >>>    	return 0;
> >>>    }
> >>>
> >>> +static void
> >>> +rte_rcu_qsbr_test_free_resource(void *p, void *e) {
> >>> +	if (p != NULL && e != NULL) {
> >>> +		printf("%s: Test failed\n", __func__);
> >>> +		cb_failed = 1;
> >>> +	}
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_create: create a queue used to store the data
> >>> +structure
> >>> + * elements that can be freed later. This queue is referred to as
> >>> +'defer
> >> queue'.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_create(void)
> >>> +{
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	dq = rte_rcu_qsbr_dq_create(NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +	params.v = t[0];
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	params.size = 1;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	params.esize = 3;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	/* Pass all valid parameters */
> >>> +	params.esize = 16;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >> params");
> >>> +	rte_rcu_qsbr_dq_delete(dq);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer
> >>> +queue,
> >>> + * to be freed later after atleast one grace period is over.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_enqueue(void)
> >>> +{
> >>> +	int ret;
> >>> +	uint64_t r;
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> >>> +
> >>> +	/* Create a queue with simple parameters */
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +	params.v = t[0];
> >>> +	params.size = 1;
> >>> +	params.esize = 16;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >>> +params");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> >>> +params");
> >>> +
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> >>> +params");
> >>> +
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> >>> +params");
> >>> +
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid
> >> params");
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_reclaim(void)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid
> >>> +params");
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_delete(void)
> >>> +{
> >>> +	int ret;
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	ret = rte_rcu_qsbr_dq_delete(NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid
> >>> +params");
> >>> +
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +	params.v = t[0];
> >>> +	params.size = 1;
> >>> +	params.esize = 16;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >> params");
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> >> params");
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer
> >>> +queue,
> >>> + * to be freed later after atleast one grace period is over.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize) {
> >>> +	int i, j, ret;
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +	uint64_t *e;
> >>> +	uint64_t sc = 200;
> >>> +	int max_entries;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> >>> +	printf("Size = %d, esize = %d\n", size, esize);
> >>> +
> >>> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> >>> +	if (e == NULL)
> >>> +		return 0;
> >>> +	cb_failed = 0;
> >>> +
> >>> +	/* Initialize the RCU variable. No threads are registered */
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +
> >>> +	/* Create a queue with simple parameters */
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	params.v = t[0];
> >>> +	params.size = size;
> >>> +	params.esize = esize;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >>> +params");
> >>> +
> >>> +	/* Given the size and esize, calculate the maximum number of entries
> >>> +	 * that can be stored on the defer queue (look at the logic used
> >>> +	 * in capacity calculation of rte_ring).
> >>> +	 */
> >>> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> >>> +	max_entries = (max_entries - 1)/(esize/8 + 1);
> >>> +
> >>> +	/* Enqueue few counters starting with the value 'sc' */
> >>> +	/* The queue size will be rounded up to 2. The enqueue API also
> >>> +	 * reclaims if the queue size is above certain limit. Since, there
> >>> +	 * are no threads registered, reclamation succedes. Hence, it should
> >>> +	 * be possible to enqueue more than the provided queue size.
> >>> +	 */
> >>> +	for (i = 0; i < 10; i++) {
> >>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> >>> +			"dq enqueue functional");
> >>> +		for (j = 0; j < esize/8; j++)
> >>> +			e[j] = sc++;
> >>> +	}
> >>> +
> >>> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> >>> +	 * succeed. It should not be possible to enqueue more than the size
> >>> +	 * number of resources.
> >>> +	 */
> >>> +	rte_rcu_qsbr_thread_register(t[0], 1);
> >>> +	rte_rcu_qsbr_thread_online(t[0], 1);
> >>> +
> >>> +	for (i = 0; i < max_entries; i++) {
> >>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> >>> +			"dq enqueue functional");
> >>> +		for (j = 0; j < esize/8; j++)
> >>> +			e[j] = sc++;
> >>> +	}
> >>> +
> >>> +	/* Enqueue fails as queue is full */
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> >> functional");
> >>> +
> >>> +	/* Delete should fail as there are elements in defer queue which
> >>> +	 * cannot be reclaimed.
> >>> +	 */
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid
> >> params");
> >>> +
> >>> +	/* Report quiescent state, enqueue should succeed */
> >>> +	rte_rcu_qsbr_quiescent(t[0], 1);
> >>> +	for (i = 0; i < max_entries; i++) {
> >>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> >>> +			"dq enqueue functional");
> >>> +		for (j = 0; j < esize/8; j++)
> >>> +			e[j] = sc++;
> >>> +	}
> >>> +
> >>> +	/* Queue is full */
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> >> functional");
> >>> +
> >>> +	/* Report quiescent state, delete should succeed */
> >>> +	rte_rcu_qsbr_quiescent(t[0], 1);
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> >> params");
> >>> +
> >>> +	/* Validate that call back function did not return any error */
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> >>> +
> >>> +	rte_free(e);
> >>> +	return 0;
> >>> +}
> >>> +
> >>>    /*
> >>>     * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
> >>>     */
> >>> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
> >>>    	if (test_rcu_qsbr_thread_offline() < 0)
> >>>    		goto test_fail;
> >>>
> >>> +	if (test_rcu_qsbr_dq_create() < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_reclaim() < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_delete() < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_enqueue() < 0)
> >>> +		goto test_fail;
> >>> +
> >>>    	printf("\nFunctional tests\n");
> >>>
> >>>    	if (test_rcu_qsbr_sw_sv_3qs() < 0) @@ -1033,6 +1310,18 @@
> >>> test_rcu_qsbr_main(void)
> >>>    	if (test_rcu_qsbr_mw_mv_mqs() < 0)
> >>>    		goto test_fail;
> >>>
> >>> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> >>> +		goto test_fail;
> >>> +
> >>>    	free_rcu();
> >>>
> >>>    	printf("\n");
> >>> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> >>> index 62920ba02..e280b29c1 100644
> >>> --- a/lib/librte_rcu/meson.build
> >>> +++ b/lib/librte_rcu/meson.build
> >>> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
> >>>    if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
> >>>    	ext_deps += cc.find_library('atomic')
> >>>    endif
> >>> +
> >>> +deps += ['ring']
> >>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> >>> b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> >>> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> >>> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> >>> @@ -21,6 +21,7 @@
> >>>    #include <rte_errno.h>
> >>>
> >>>    #include "rte_rcu_qsbr.h"
> >>> +#include "rte_rcu_qsbr_pvt.h"
> >>>
> >>>    /* Get the memory size of QSBR variable */
> >>>    size_t
> >>> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct
> rte_rcu_qsbr
> >> *v)
> >>>    	return 0;
> >>>    }
> >>>
> >>> +/* Create a queue used to store the data structure elements that
> >>> +can
> >>> + * be freed later. This queue is referred to as 'defer queue'.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq *
> >>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> >>> +*params) {
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +	uint32_t qs_fifo_size;
> >>> +
> >>> +	if (params == NULL || params->f == NULL ||
> >>> +		params->v == NULL || params->name == NULL ||
> >>> +		params->size == 0 || params->esize == 0 ||
> >>> +		(params->esize % 8 != 0)) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	dq = rte_zmalloc(NULL,
> >>> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> >>> +		RTE_CACHE_LINE_SIZE);
> >>> +	if (dq == NULL) {
> >>> +		rte_errno = ENOMEM;
> >>> +
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	/* round up qs_fifo_size to next power of two that is not less than
> >>> +	 * max_size.
> >>> +	 */
> >>> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> >>> +					* params->size) + 1);
> >>> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> >>> +					SOCKET_ID_ANY, 0);
> >>> +	if (dq->r == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): defer queue create failed\n", __func__);
> >>> +		rte_free(dq);
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	dq->v = params->v;
> >>> +	dq->size = params->size;
> >>> +	dq->esize = params->esize;
> >>> +	dq->f = params->f;
> >>> +	dq->p = params->p;
> >>> +
> >>> +	return dq;
> >>> +}
> >>> +
> >>> +/* Enqueue one resource to the defer queue to free after the grace
> >>> + * period is over.
> >>> + */
> >>> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> >>> +	uint64_t token;
> >>> +	uint64_t *tmp;
> >>> +	uint32_t i;
> >>> +	uint32_t cur_size, free_size;
> >>> +
> >>> +	if (dq == NULL || e == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Start the grace period */
> >>> +	token = rte_rcu_qsbr_start(dq->v);
> >>> +
> >>> +	/* Reclaim resources if the queue is 1/8th full. This helps
> >>> +	 * the queue from growing too large and allows time for reader
> >>> +	 * threads to report their quiescent state.
> >>> +	 */
> >>> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> >>> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> >>> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> >>> +			"%s(): Triggering reclamation\n", __func__);
> >>> +		rte_rcu_qsbr_dq_reclaim(dq);
> >>> +	}
> >> There are two problems I see:
> >>
> >> 1. rte_rcu_qsbr_dq_reclaim() reclaims only 1/16 of the defer queue
> >> while it triggers on 1/8. This means that there will always be 1/16
> >> of non-reclaimed entries in the queue.
> > There will be 'at least' 1/16 non-reclaimed entries.
> Correct, that's what I meant :)
> > It could be more depending on the length of the grace period and the
> > rate of deletion.
> 
> Right, the number of entries to reclaim depends on:
> 
> - the grace period, which is application specific
> 
> - the cost of the delete operation, which is library (algorithm) specific
> 
> - the rate of deletion, which depends on the runtime.
> 
> So it is very hard to predict how big the threshold to trigger
> reclamation should be and how many entries it should reclaim.
> 
> > The trigger of 1/8 is used to give sufficient time for the readers to
> > report their quiescent state. 1/16 is used to spread the load of
> > reclamation across multiple calls and provide an upper bound on the
> > cycles consumed.
> 
> 1/16 of max entries to reclaim within a single call can cost a lot.
> Moreover, it could have an impact on the readers through massive cache
> evictions.
> 
> Consider a set of routes from test_lpm_perf.c. To install all routes you need
> to have at least 65k tbl8 entries (now it has 2k). So when reclaiming, besides
> the costs of rte_rcu_qsbr_check(), you'll need to rewrite 4k cache lines.
> 
> So 1/16 of max entries is relatively big and it's better to spread this load
> across multiple calls.
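> 
> To put rough numbers on it (size assumed for illustration; the shifts
> come from the constants in this patch): with dq->size == 65536 tbl8
> groups, enqueue triggers reclamation once the ring holds more than
> 65536 >> 3 == 8192 deferred groups, and a single reclaim pass may free
> up to 65536 >> 4 == 4096 of them. Freeing a group writes one tbl8 entry
> to invalidate it, and those entries generally sit on distinct cache
> lines, which is where the ~4k cache lines above come from.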
> 
> >
> >> 2. The number of entries to reclaim depends on dq->size. So,
> >> rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For the LPM
> >> library this
> > That is true. It depends on dq->size (number of tbl8 groups). However,
> > note that there is patch [1] which provides a kind of batch-reclamation
> > behavior that reduces the cycles consumed by reclamation significantly.
> >
> > [1] https://patches.dpdk.org/patch/58960/
> >
> >> means that rte_lpm_delete() sometimes takes a long time.
> > Agree, it sometimes takes additional time. It is good to spread it over
> > multiple calls.
> Right, with batch reclamation we have the classic throughput vs. latency
> problem here: either reclaim a big number of entries relatively
> infrequently, amortizing the cost of the readers' quiescent state check,
> or reclaim a small number of entries more often, spending more cycles on
> average. I'd prefer latency here because, as I mentioned earlier, huge
> batches could have an impact on readers and lead to a big difference in
> the cost of delete().
> >
> >> So, my suggestions here would be:
> >>
> >> - trigger rte_rcu_qsbr_dq_reclaim() with every enqueue
> > Given that the LPM APIs are mainly for the control plane, I would
> > think that, by the next time an LPM API is called, the readers have
> > completed the grace period. But if there are frequent updates, we
> > might end up with empty reclaims which will waste cycles. IMO, this
> > trigger should happen only after at least a few entries are in the
> > queue.
> >
> >> - reclaim a small amount of entries (could be configurable at creation
> >> time)
> > Agree. I would keep it smaller than the trigger amount, knowing that
> > the elements added right before the trigger might not have completed
> > the grace period.
> >
> >> - provide an API to trigger reclaim from the application manually.
> > IMO, this will add additional complexity to the application. I agree
> > that there will be special needs for some applications. I think those
> > applications might have to implement their own methods using the base
> > RCU APIs.
> > Instead, as agreed in other threads, I suggest we expose the
> > parameters (when to trigger and how much to reclaim) to the
> > application as optional configurable parameters, i.e. if the
> > application does not provide them, we can use default values. I think
> > this should provide enough flexibility to the application.
> 
> Agree.
> 
> Regarding default values, one strategy could be:
> 
> - if the reclaim-trigger threshold isn't set (i.e. is equal to 0), then call
> reclaim with every enqueue (i.e. threshold == 1)
> 
> - if max_entries_to_reclaim isn't set, then reclaim as much as we can
> 
Ok, sounds good. That defaulting logic is sketched below.
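
A minimal sketch of that defaulting logic, assuming two new fields
'trigger_reclaim_limit' and 'max_reclaim_size' in
struct rte_rcu_qsbr_dq_parameters (both names are assumptions):

	/* Hypothetical: inside rte_rcu_qsbr_dq_create(), 0 selects the default. */
	dq->trigger_reclaim_limit = params->trigger_reclaim_limit;
	if (dq->trigger_reclaim_limit == 0)
		dq->trigger_reclaim_limit = 1;  /* reclaim on every enqueue */

	dq->max_reclaim_size = params->max_reclaim_size;
	if (dq->max_reclaim_size == 0)
		dq->max_reclaim_size = dq->size;  /* reclaim as much as we can */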

> 
> >>> +
> >>> +	/* Check if there is space for atleast for 1 resource */
> >>> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> >>> +	if (!free_size) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Defer queue is full\n", __func__);
> >>> +		rte_errno = ENOSPC;
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Enqueue the resource */
> >>> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> >>> +
> >>> +	/* The resource to enqueue needs to be a multiple of 64b
> >>> +	 * due to the limitation of the rte_ring implementation.
> >>> +	 */
> >>> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> >>> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/* Reclaim resources from the defer queue. */ int
> >>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> >>> +	uint32_t max_cnt;
> >>> +	uint32_t cnt;
> >>> +	void *token;
> >>> +	uint64_t *tmp;
> >>> +	uint32_t i;
> >>> +
> >>> +	if (dq == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Anything to reclaim? */
> >>> +	if (rte_ring_count(dq->r) == 0)
> >>> +		return 0;
> >>> +
> >>> +	/* Reclaim at the max 1/16th the total number of entries. */
> >>> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> >>> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> >>> +	cnt = 0;
> >>> +
> >>> +	/* Check reader threads quiescent state and reclaim resources */
> >>> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> >>> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> >>> +			== 1)) {
> >>> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> >>> +		/* The resource to dequeue needs to be a multiple of 64b
> >>> +		 * due to the limitation of the rte_ring implementation.
> >>> +		 */
> >>> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> >>> +			i++, tmp++)
> >>> +			(void)rte_ring_sc_dequeue(dq->r,
> >>> +					(void *)(uintptr_t)tmp);
> >>> +		dq->f(dq->p, dq->e);
> >>> +
> >>> +		cnt++;
> >>> +	}
> >>> +
> >>> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> >>> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> >>> +
> >>> +	if (cnt == 0) {
> >>> +		/* No resources were reclaimed */
> >>> +		rte_errno = EAGAIN;
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/* Delete a defer queue. */
> >>> +int
> >>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> >>> +	if (dq == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Reclaim all the resources */
> >>> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> >>> +		/* Error number is already set by the reclaim API */
> >>> +		return 1;
> >>> +
> >>> +	rte_ring_free(dq->r);
> >>> +	rte_free(dq);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>    int rte_rcu_log_type;
> >>>
> >>>    RTE_INIT(rte_rcu_register)
> >>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> >>> b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> >>> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> >>> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> >>> @@ -34,6 +34,7 @@ extern "C" {
> >>>    #include <rte_lcore.h>
> >>>    #include <rte_debug.h>
> >>>    #include <rte_atomic.h>
> >>> +#include <rte_ring.h>
> >>>
> >>>    extern int rte_rcu_log_type;
> >>>
> >>> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >>>    	 */
> >>>    } __rte_cache_aligned;
> >>>
> >>> +/**
> >>> + * Call back function called to free the resources.
> >>> + *
> >>> + * @param p
> >>> + *   Pointer provided while creating the defer queue
> >>> + * @param e
> >>> + *   Pointer to the resource data stored on the defer queue
> >>> + *
> >>> + * @return
> >>> + *   None
> >>> + */
> >>> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> >>> +
> >>> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> >>> +
> >>> +/**
> >>> + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> >>> + */
> >>> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> >>> +
> >>> +/**
> >>> + *  Reclaim at the max 1/16th the total number of resources.
> >>> + */
> >>> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> >>> +
> >>> +/**
> >>> + * Parameters used when creating the defer queue.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq_parameters {
> >>> +	const char *name;
> >>> +	/**< Name of the queue. */
> >>> +	uint32_t size;
> >>> +	/**< Number of entries in queue. Typically, this will be
> >>> +	 *   the same as the maximum number of entries supported in the
> >>> +	 *   lock free data structure.
> >>> +	 *   Data structures with unbounded number of entries is not
> >>> +	 *   supported currently.
> >>> +	 */
> >>> +	uint32_t esize;
> >>> +	/**< Size (in bytes) of each element in the defer queue.
> >>> +	 *   This has to be multiple of 8B as the rte_ring APIs
> >>> +	 *   support 8B element sizes only.
> >>> +	 */
> >>> +	rte_rcu_qsbr_free_resource f;
> >>> +	/**< Function to call to free the resource. */
> >>> +	void *p;
> >>> +	/**< Pointer passed to the free function. Typically, this is the
> >>> +	 *   pointer to the data structure to which the resource to free
> >>> +	 *   belongs. This can be NULL.
> >>> +	 */
> >>> +	struct rte_rcu_qsbr *v;
> >>> +	/**< RCU QSBR variable to use for this defer queue */ };
> >>> +
> >>> +/* RTE defer queue structure.
> >>> + * This structure holds the defer queue. The defer queue is used to
> >>> + * hold the deleted entries from the data structure that are not
> >>> + * yet freed.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq;
> >>> +
> >>>    /**
> >>>     * @warning
> >>>     * @b EXPERIMENTAL: this API may change without prior notice @@
> >>> -648,6 +710,113 @@ __rte_experimental
> >>>    int
> >>>    rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> >>>
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Create a queue used to store the data structure elements that
> >>> +can
> >>> + * be freed later. This queue is referred to as 'defer queue'.
> >>> + *
> >>> + * @param params
> >>> + *   Parameters to create a defer queue.
> >>> + * @return
> >>> + *   On success - Valid pointer to defer queue
> >>> + *   On error - NULL
> >>> + *   Possible rte_errno codes are:
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - ENOMEM - Not enough memory
> >>> + */
> >>> +__rte_experimental
> >>> +struct rte_rcu_qsbr_dq *
> >>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> >>> +*params);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Enqueue one resource to the defer queue and start the grace period.
> >>> + * The resource will be freed later after at least one grace period
> >>> + * is over.
> >>> + *
> >>> + * If the defer queue is full, it will attempt to reclaim resources.
> >>> + * It will also reclaim resources at regular intervals to avoid
> >>> + * the defer queue from growing too big.
> >>> + *
> >>> + * This API is not multi-thread safe. It is expected that the
> >>> +caller
> >>> + * provides multi-thread safety by locking a mutex or some other means.
> >>> + *
> >>> + * A lock free multi-thread writer algorithm could achieve
> >>> +multi-thread
> >>> + * safety by creating and using one defer queue per thread.
> >>> + *
> >>> + * @param dq
> >>> + *   Defer queue to allocate an entry from.
> >>> + * @param e
> >>> + *   Pointer to resource data to copy to the defer queue. The size of
> >>> + *   the data to copy is equal to the element size provided when the
> >>> + *   defer queue was created.
> >>> + * @return
> >>> + *   On success - 0
> >>> + *   On error - 1 with rte_errno set to
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - ENOSPC - Defer queue is full. This condition can not happen
> >>> + *		if the defer queue size is equal (or larger) than the
> >>> + *		number of elements in the data structure.
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Reclaim resources from the defer queue.
> >>> + *
> >>> + * This API is not multi-thread safe. It is expected that the
> >>> +caller
> >>> + * provides multi-thread safety by locking a mutex or some other means.
> >>> + *
> >>> + * A lock free multi-thread writer algorithm could achieve
> >>> +multi-thread
> >>> + * safety by creating and using one defer queue per thread.
> >>> + *
> >>> + * @param dq
> >>> + *   Defer queue to reclaim an entry from.
> >>> + * @return
> >>> + *   On successful reclamation of at least 1 resource - 0
> >>> + *   On error - 1 with rte_errno set to
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - EAGAIN - None of the resources have completed at least 1 grace
> >> period,
> >>> + *		try again.
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Delete a defer queue.
> >>> + *
> >>> + * It tries to reclaim all the resources on the defer queue.
> >>> + * If any of the resources have not completed the grace period
> >>> + * the reclamation stops and returns immediately. The rest of
> >>> + * the resources are not reclaimed and the defer queue is not
> >>> + * freed.
> >>> + *
> >>> + * @param dq
> >>> + *   Defer queue to delete.
> >>> + * @return
> >>> + *   On success - 0
> >>> + *   On error - 1
> >>> + *   Possible rte_errno codes are:
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - EAGAIN - Some of the resources have not completed at least 1
> grace
> >>> + *		period, try again.
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> >>> +
> >>>    #ifdef __cplusplus
> >>>    }
> >>>    #endif
> >>> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>> b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>> new file mode 100644
> >>> index 000000000..2122bc36a
> >>> --- /dev/null
> >>> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>> @@ -0,0 +1,46 @@
> >>> +/* SPDX-License-Identifier: BSD-3-Clause
> >>> + * Copyright (c) 2019 Arm Limited
> >>> + */
> >>> +
> >>> +#ifndef _RTE_RCU_QSBR_PVT_H_
> >>> +#define _RTE_RCU_QSBR_PVT_H_
> >>> +
> >>> +/**
> >>> + * This file is private to the RCU library. It should not be
> >>> +included
> >>> + * by the user of this library.
> >>> + */
> >>> +
> >>> +#ifdef __cplusplus
> >>> +extern "C" {
> >>> +#endif
> >>> +
> >>> +#include "rte_rcu_qsbr.h"
> >>> +
> >>> +/* RTE defer queue structure.
> >>> + * This structure holds the defer queue. The defer queue is used to
> >>> + * hold the deleted entries from the data structure that are not
> >>> + * yet freed.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq {
> >>> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> >>> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> >>> +	uint32_t size;
> >>> +	/**< Number of elements in the defer queue */
> >>> +	uint32_t esize;
> >>> +	/**< Size (in bytes) of data stored on the defer queue */
> >>> +	rte_rcu_qsbr_free_resource f;
> >>> +	/**< Function to call to free the resource. */
> >>> +	void *p;
> >>> +	/**< Pointer passed to the free function. Typically, this is the
> >>> +	 *   pointer to the data structure to which the resource to free
> >>> +	 *   belongs.
> >>> +	 */
> >>> +	char e[0];
> >>> +	/**< Temporary storage to copy the defer queue element. */ };
> >>> +
> >>> +#ifdef __cplusplus
> >>> +}
> >>> +#endif
> >>> +
> >>> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> >>> diff --git a/lib/librte_rcu/rte_rcu_version.map
> >>> b/lib/librte_rcu/rte_rcu_version.map
> >>> index f8b9ef2ab..dfac88a37 100644
> >>> --- a/lib/librte_rcu/rte_rcu_version.map
> >>> +++ b/lib/librte_rcu/rte_rcu_version.map
> >>> @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >>>    	rte_rcu_qsbr_synchronize;
> >>>    	rte_rcu_qsbr_thread_register;
> >>>    	rte_rcu_qsbr_thread_unregister;
> >>> +	rte_rcu_qsbr_dq_create;
> >>> +	rte_rcu_qsbr_dq_enqueue;
> >>> +	rte_rcu_qsbr_dq_reclaim;
> >>> +	rte_rcu_qsbr_dq_delete;
> >>>
> >>>    	local: *;
> >>>    };
> >>> diff --git a/lib/meson.build b/lib/meson.build index
> >>> e5ff83893..0e1be8407 100644
> >>> --- a/lib/meson.build
> >>> +++ b/lib/meson.build
> >>> @@ -11,7 +11,9 @@
> >>>    libraries = [
> >>>    	'kvargs', # eal depends on kvargs
> >>>    	'eal', # everything depends on eal
> >>> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >>> +	'ring',
> >>> +	'rcu', # rcu depends on ring
> >>> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >>>    	'cmdline',
> >>>    	'metrics', # bitrate/latency stats depends on this
> >>>    	'hash',    # efd depends on this
> >>> @@ -22,7 +24,7 @@ libraries = [
> >>>    	'gro', 'gso', 'ip_frag', 'jobstats',
> >>>    	'kni', 'latencystats', 'lpm', 'member',
> >>>    	'power', 'pdump', 'rawdev',
> >>> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> >>> +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >>>    	# ipsec lib depends on net, crypto and security
> >>>    	'ipsec',
> >>>    	# add pkt framework libs which use other libs from above
> >> --
> >> Regards,
> >> Vladimir
> 
> --
> Regards,
> Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 3/3] doc/rcu: add RCU integration design details Honnappa Nagarahalli
@ 2020-03-29 20:57     ` Thomas Monjalon
  2020-03-30 17:37       ` Honnappa Nagarahalli
  2020-04-03 18:41     ` [dpdk-dev] [PATCH v4 0/4] " Honnappa Nagarahalli
  2020-04-22  3:30     ` [dpdk-dev] [PATCH v5 0/4] Add RCU reclamation APIs Honnappa Nagarahalli
  5 siblings, 1 reply; 137+ messages in thread
From: Thomas Monjalon @ 2020-03-29 20:57 UTC (permalink / raw)
  To: honnappa.nagarahalli
  Cc: konstantin.ananyev, stephen, paulmck, dev, yipeng1.wang,
	vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, nd

01/10/2019 08:29, Honnappa Nagarahalli:
> This is not a new patch. This patch set is separated from the LPM
> changes as the size of the changes in RCU library has grown due
> to comments from community. These APIs will help reduce the changes
> in LPM and hash libraries that are getting integrated with RCU
> library.
> 
> This adds 4 new APIs to RCU library to create a defer queue, enqueue
> deleted resources, reclaim resources and delete the defer queue.

It is on the roadmap for 20.05.
What is the status of this patchset?

> The patches to LPM and HASH libraries to integrate RCU will depend on
> this patch.

I guess lpm and hash integrations are planned for 20.08?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs
  2020-03-29 20:57     ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Thomas Monjalon
@ 2020-03-30 17:37       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2020-03-30 17:37 UTC (permalink / raw)
  To: thomas
  Cc: konstantin.ananyev, stephen, paulmck, dev, yipeng1.wang,
	vladimir.medvedkin, Ruifeng Wang, Dharmik Thakkar, nd,
	Honnappa Nagarahalli, nd

<snip>

> 
> 01/10/2019 08:29, Honnappa Nagarahalli:
> > This is not a new patch. This patch set is separated from the LPM
> > changes as the size of the changes in RCU library has grown due to
> > comments from community. These APIs will help reduce the changes in
> > LPM and hash libraries that are getting integrated with RCU library.
> >
> > This adds 4 new APIs to RCU library to create a defer queue, enqueue
> > deleted resources, reclaim resources and delete the defer queue.
> 
> It is in the roadmap for 20.05.
> What is the status of this patchset?
It has a dependency on changes to the rte_ring APIs, where Konstantin's patch [1] and mine [2] clash. Konstantin is working through his patch to address the comments.

I am currently incorporating the review comments I received on the RCU defer APIs. We should see the next version soon.

[1] http://mails.dpdk.org/archives/dev/2020-March/160828.html
[2] http://mails.dpdk.org/archives/dev/2020-March/160787.html

> 
> > The patches to LPM and HASH libraries to integrate RCU will depend on
> > this patch.
> 
> I guess lpm and hash integrations are planned for