DPDK patches and discussions
* [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library
@ 2019-08-22  6:34 Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details Ruifeng Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

A document is added with the suggested design for integrating the
RCU library with other libraries in DPDK.
As an example, the LPM library adds the integration. RCU is used
to safely free tbl8 groups that can be recycled. A table will not
be reclaimed or reused until the readers have finished referencing it.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application
to register an RCU variable that the LPM library will use.

A new API, rte_ring_peek, is introduced to help manage the
reclaiming FIFO queue.
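
A minimal writer-side sketch (a single reader thread with id 0 is
assumed; headers and error handling are elided):

	struct rte_rcu_qsbr *v;
	size_t sz;

	/* Create and initialize a QSBR variable for one reader thread. */
	sz = rte_rcu_qsbr_get_memsize(1);
	v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
	rte_rcu_qsbr_init(v, 1);

	/* Register the variable with the LPM object. tbl8 groups freed
	 * by rte_lpm_delete() will then be recycled only after the
	 * readers report a quiescent state.
	 */
	if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
		rte_panic("cannot attach RCU QSBR variable to LPM\n");

Reader threads register with and report their quiescent state to the
same variable as usual.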


Honnappa Nagarahalli (1):
  doc/rcu: add RCU integration design details

Ruifeng Wang (2):
  lib/ring: add peek API
  lib/lpm: integrate RCU QSBR

 doc/guides/prog_guide/rcu_lib.rst  |  51 +++++++
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 218 +++++++++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/librte_ring/rte_ring.h         |  30 ++++
 lib/meson.build                    |   3 +-
 8 files changed, 320 insertions(+), 15 deletions(-)

-- 
2.17.1



* [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
@ 2019-08-22  6:34 ` Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API Ruifeng Wang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a section to describe a design for integrating the QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 51 +++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..2869441ca 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,54 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Integrating QSBR RCU with other libraries
+-----------------------------------------
+
+Lock-free algorithms place an additional burden on the application to reclaim
+memory. Integrating memory reclamation mechanisms in the libraries helps
+remove some of the burden. Though the QSBR method provides the flexibility to
+achieve performance, it presents challenges when integrating with libraries.
+
+The memory reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here requires the application to handle 'Initialization'
+and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the library.
+* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section/how frequently it wants to report the quiescent state.
+
+The library will handle the 'Reclaiming Resources' part of the process. The
+libraries will make use of the writer thread context to execute the memory
+reclamation algorithm. So,
+
+* library should provide an API to register an RCU variable that it will use.
+* library should trigger the readers to report quiescent state status upon deleting the resources by calling ``rte_rcu_qsbr_start``.
+
+* library should store the token and deleted resources for later use to free them after the readers have reported their quiescent state. Since the readers will report the quiescent state status in the order of deletion, the library must store the tokens/resources in the order in which the resources were deleted. A FIFO data structure would achieve the desired results. The length of the FIFO would depend on the rate of deletion and the rate at which the readers report their quiescent state. In the worst case the length of FIFO would be equal to the maximum number of resources the data structure supports. However, in most cases, the length will be much smaller. But, the library should not take the length of FIFO as an input from the application. Instead, it should implement a data structure which should be able to grow/shrink dynamically. Overhead introduced by such a data structure on delete operations should be considered as well.
+
+* library should query the quiescent state and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. This allows the application to do useful work while the readers report their quiescent state. If there are tokens/resources present in the FIFO already, the delete API should peek the head of the FIFO and check the quiescent state status. If the status is success, the token/resource should be dequeued and the resource should be freed. This process can be repeated till the quiescent state status for a token returns failure indicating that subsequent tokens will also fail quiescent state status query. The same process can be incorporated while adding new entries in the data structure if the library runs out of resources.
+
+The 'Shutdown' process needs to be shared between the application and the
+library.
+
+* library should check the quiescent state status of all the tokens that may be present in the FIFO and free the resources. It should make use of the non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the tokens do not pass the quiescent state check, the library should print an error and stop the memory reclamation process.
+
+* the application should make sure that the reader threads are not using the shared data structure and unregister the reader threads from the QSBR variable before calling the library's shutdown function.
+
+Integrating the resource reclamation with libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. Application does not need a dedicated thread to reclaim resources. Memory
+   reclamation happens as part of the writer thread with little impact on
+   performance.
+#. The library has better control over the resources. For example, the
+   library can attempt to reclaim when it has run out of resources.
-- 
2.17.1



* [dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details Ruifeng Wang
@ 2019-08-22  6:34 ` Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Ruifeng Wang

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.
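
A minimal usage sketch (r is a ring created elsewhere; the consumer
decides, based on the peeked value, whether it is safe to actually
dequeue):

	void *obj;

	/* Observe the head of the ring without consuming it. */
	if (rte_ring_peek(r, &obj) == 0) {
		/* Dequeue only when the observed value allows it. */
		if (object_can_be_processed(obj))
			(void)rte_ring_sc_dequeue(r, &obj);
	}

Here object_can_be_processed() stands in for an application-specific
check, e.g. an RCU quiescent state query on a stored token.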

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1



* [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details Ruifeng Wang
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API Ruifeng Wang
@ 2019-08-22  6:34 ` Ruifeng Wang
  2019-08-23  1:23   ` Stephen Hemminger
  2019-08-22 15:52 ` [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Honnappa Nagarahalli
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  4 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-08-22  6:34 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, honnappa.nagarahalli, dharmik.thakkar, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.
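
To illustrate the race being fixed, consider a hypothetical interleaving:

	reader: reads a tbl24 entry pointing to tbl8 group G and starts
	        walking G's entries
	writer: deletes the route and frees G
	writer: adds an unrelated route, reallocates G and rewrites
	        its entries
	reader: completes the lookup against G's new contents and
	        returns a wrong next hop

With RCU, G is recycled only after all readers have reported a
quiescent state, so in-flight lookups always see consistent data.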

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 218 +++++++++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/meson.build                    |   3 +-
 6 files changed, 239 insertions(+), 15 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..1efdef22d 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <string.h>
@@ -22,6 +23,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_tailq.h>
+#include <rte_ring.h>
 
 #include "rte_lpm.h"
 
@@ -39,6 +41,11 @@ enum valid_flag {
 	VALID
 };
 
+struct __rte_lpm_qs_item {
+	uint64_t token;	/**< QSBR token.*/
+	uint32_t index;	/**< tbl8 group index.*/
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -381,6 +388,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	if (lpm->qsv)
+		rte_ring_free(lpm->qs_fifo);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
@@ -390,6 +399,145 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
 		rte_lpm_free_v1604);
 
+/* Add an item into FIFO.
+ * return: 0 - success, 1 - enqueue failed (rte_errno set to ENOSPC).
+ */
+static int
+__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token) != 0) {
+		rte_errno = ENOSPC;
+		return 1;
+	}
+	if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index) != 0) {
+		void *obj;
+		/* token needs to be dequeued when index enqueue fails */
+		rte_ring_sc_dequeue(fifo, &obj);
+		rte_errno = ENOSPC;
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Remove item from FIFO.
+ * Used after data has been observed by rte_ring_peek.
+ */
+static void
+__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	void *obj_token = NULL;
+	void *obj_index = NULL;
+
+	(void)rte_ring_sc_dequeue(fifo, &obj_token);
+	(void)rte_ring_sc_dequeue(fifo, &obj_index);
+
+	if (item) {
+		item->token = (uint64_t)((uintptr_t)obj_token);
+		item->index = (uint32_t)((uintptr_t)obj_index);
+	}
+}
+
+/* Max number of tbl8 groups to reclaim at one time. */
+#define RCU_QSBR_RECLAIM_SIZE	8
+
+/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
+ * reclaim will be triggered by tbl8_free.
+ */
+#define RCU_QSBR_RECLAIM_LEVEL	3
+
+/* Reclaim some tbl8 groups based on quiescent state check.
+ * At most RCU_QSBR_RECLAIM_SIZE groups will be reclaimed.
+ * return: 0 - success, 1 - no group reclaimed.
+ */
+static uint32_t
+__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
+{
+	struct __rte_lpm_qs_item qs_item;
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	void *obj_token;
+	uint32_t cnt = 0;
+
+	/* Check the reader threads' quiescent state and
+	 * reclaim as many tbl8 groups as possible.
+	 */
+	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
+		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
+		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
+					false) == 1)) {
+		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
+
+		tbl8_entry = &lpm->tbl8[qs_item.index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+		memset(&tbl8_entry[0], 0,
+				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
+				sizeof(tbl8_entry[0]));
+		cnt++;
+	}
+
+	if (cnt) {
+		if (index)
+			*index = qs_item.index;
+		return 0;
+	}
+	return 1;
+}
+
+/* Trigger tbl8 group reclaim when necessary.
+ * Reclaim happens when RCU QSBR queue usage is over 12.5%.
+ */
+static void
+__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
+{
+	if (lpm->qsv == NULL)
+		return;
+
+	if (rte_ring_count(lpm->qs_fifo) <
+		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
+		return;
+
+	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	uint32_t qs_fifo_size;
+	char rcu_ring_name[RTE_RING_NAMESIZE];
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->qsv) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Size the FIFO to store a 'token' and an 'index' for every
+	 * tbl8 group, rounded up to the next power of two.
+	 */
+	qs_fifo_size = 2 * rte_align32pow2(lpm->number_tbl8s);
+
+	/* Init QSBR reclaiming FIFO. */
+	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
+	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
+					SOCKET_ID_ANY, 0);
+	if (lpm->qs_fifo == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
+		rte_errno = ENOMEM;
+		return 1;
+	}
+	lpm->qsv = v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -640,6 +788,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
 	return -EINVAL;
 }
 
+static int32_t
+tbl8_alloc_reclaimed(struct rte_lpm *lpm)
+{
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	uint32_t index;
+
+	if (lpm->qsv != NULL) {
+		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
+			/* Set the last reclaimed tbl8 group as VALID. */
+			struct rte_lpm_tbl_entry new_tbl8_entry = {
+				.next_hop = 0,
+				.valid = INVALID,
+				.depth = 0,
+				.valid_group = VALID,
+			};
+
+			tbl8_entry = &lpm->tbl8[index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+			__atomic_store(tbl8_entry, &new_tbl8_entry,
+					__ATOMIC_RELAXED);
+
+			/* Return group index for reclaimed tbl8 group. */
+			return index;
+		}
+	}
+
+	return -ENOSPC;
+}
+
 /*
  * Find, clean and allocate a tbl8.
  */
@@ -679,14 +856,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -708,8 +886,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 		}
 	}
 
-	/* If there are no tbl8 groups free then return error. */
-	return -ENOSPC;
+	/* If there are no tbl8 groups free then check reclaim queue. */
+	return tbl8_alloc_reclaimed(lpm);
 }
 
 static void
@@ -728,13 +906,27 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 }
 
 static void
-tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm_qs_item qs_item;
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (lpm->qsv != NULL) {
+		/* Push into QSBR FIFO. */
+		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
+		qs_item.index = tbl8_group_start;
+		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
+			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
+
+		/* Speculatively reclaim tbl8 groups.
+		 * Help spread the reclaim workload across multiple calls.
+		 */
+		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
+	} else {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	}
 }
 
 static __rte_noinline int32_t
@@ -1037,7 +1229,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -1083,7 +1275,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -1818,7 +2010,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -1834,7 +2026,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 906ec4483..5079fb262 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -21,6 +22,7 @@
 #include <rte_common.h>
 #include <rte_vect.h>
 #include <rte_compat.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -186,6 +188,8 @@ struct rte_lpm {
 			__rte_cache_aligned; /**< LPM tbl24 table. */
 	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
+	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group.*/
+	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
 };
 
 /**
@@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
 void
 rte_lpm_free_v1604(struct rte_lpm *lpm);
 
+/**
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the LPM object to attach the RCU QSBR variable to
+ * @param v
+ *   RCU QSBR variable
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 90beac853..b353aabd2 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -44,3 +44,9 @@ DPDK_17.05 {
 	rte_lpm6_lookup_bulk_func;
 
 } DPDK_16.04;
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
diff --git a/lib/meson.build b/lib/meson.build
index e5ff83893..3a96f005d 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,6 +11,7 @@
 libraries = [
 	'kvargs', # eal depends on kvargs
 	'eal', # everything depends on eal
+	'rcu', # hash and lpm depend on this
 	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
 	'cmdline',
 	'metrics', # bitrate/latency stats depends on this
@@ -22,7 +23,7 @@ libraries = [
 	'gro', 'gso', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member',
 	'power', 'pdump', 'rawdev',
-	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
+	'reorder', 'sched', 'security', 'stack', 'vhost',
 	# ipsec lib depends on net, crypto and security
 	'ipsec',
 	# add pkt framework libs which use other libs from above
-- 
2.17.1



* Re: [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (2 preceding siblings ...)
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-08-22 15:52 ` Honnappa Nagarahalli
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  4 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-22 15:52 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, Dharmik Thakkar, nd, Ruifeng Wang (Arm Technology China),
	stephen, Konstantin Ananyev, nd

+ Stephen, Konstantin - for your feedback on the RCU integration design.

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Thursday, August 22, 2019 1:35 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> Dharmik Thakkar <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Ruifeng
> Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
> Subject: [RFC PATCH 0/3] RCU integration with LPM library
> 
> This patchset integrates RCU QSBR support with the LPM library.
> 
> A document is added with the suggested design for integrating the RCU
> library with other libraries in DPDK.
> As an example, the LPM library adds the integration. RCU is used to safely
> free tbl8 groups that can be recycled. A table will not be reclaimed or
> reused until the readers have finished referencing it.
> 
> A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
> register an RCU variable that the LPM library will use.
> 
> A new API, rte_ring_peek, is introduced to help manage the reclaiming
> FIFO queue.
> 
> 
> Honnappa Nagarahalli (1):
>   doc/rcu: add RCU integration design details
> 
> Ruifeng Wang (2):
>   lib/ring: add peek API
>   lib/lpm: integrate RCU QSBR
> 
>  doc/guides/prog_guide/rcu_lib.rst  |  51 +++++++
>  lib/librte_lpm/Makefile            |   3 +-
>  lib/librte_lpm/meson.build         |   2 +
>  lib/librte_lpm/rte_lpm.c           | 218 +++++++++++++++++++++++++++--
>  lib/librte_lpm/rte_lpm.h           |  22 +++
>  lib/librte_lpm/rte_lpm_version.map |   6 +
>  lib/librte_ring/rte_ring.h         |  30 ++++
>  lib/meson.build                    |   3 +-
>  8 files changed, 320 insertions(+), 15 deletions(-)
> 
> --
> 2.17.1



* Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-22  6:34 ` [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-08-23  1:23   ` Stephen Hemminger
  2019-08-26  3:11     ` Ruifeng Wang (Arm Technology China)
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen Hemminger @ 2019-08-23  1:23 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	honnappa.nagarahalli, dharmik.thakkar, nd

On Thu, 22 Aug 2019 14:34:57 +0800
Ruifeng Wang <ruifeng.wang@arm.com> wrote:

> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
> 
> The RCU QSBR process is integrated for safe tbl8 group reclamation.
> Refer to the RCU documentation to understand various aspects of
> integrating the RCU library into other libraries.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>

Having RCU in LPM is a good idea, but it is difficult to work out how to
do it in DPDK. Not everyone wants to use RCU, so making it a required part
of how LPM is used will impact users.

Also, it looks like DPDK RCU lacks a good generic way to handle deferred
free. Having to introduce a ring to handle it adds more complexity when
a generic solution would be better (see the userspace RCU library for an
example). Other parts of DPDK would benefit if deferred free were done better.
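
For illustration only, the kind of generic interface meant here could
look roughly like the sketch below, loosely modeled on liburcu's
call_rcu(); the names are hypothetical, not an existing DPDK API:

	/* Hypothetical sketch -- not an existing DPDK API. */
	typedef void (*rte_rcu_free_fn)(void *ptr);

	/* Record ptr together with the current QSBR token; free_fn(ptr)
	 * is invoked once all registered readers have reported a
	 * quiescent state past that token.
	 */
	int rte_rcu_qsbr_defer_free(struct rte_rcu_qsbr *v, void *ptr,
			rte_rcu_free_fn free_fn);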


* Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-23  1:23   ` Stephen Hemminger
@ 2019-08-26  3:11     ` Ruifeng Wang (Arm Technology China)
  2019-08-26  5:32       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-08-26  3:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	Honnappa Nagarahalli, Dharmik Thakkar, nd, nd


> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, August 23, 2019 09:23
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
> Cc: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com; dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
> 
> On Thu, 22 Aug 2019 14:34:57 +0800
> Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> 
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> 
> Having RCU in LPM is a good idea, but it is difficult to work out how to
> do it in DPDK. Not everyone wants to use RCU, so making it a required
> part of how LPM is used will impact users.

LPM users will not be forced to use RCU. A new API is provided to enable the
RCU functionality in the LPM library. For users not using RCU, the code path
is intact, and there will be no performance drop.

> 
> Also, it looks like DPDK RCU lacks a good generic way to handle deferred free.
> Having to introduce a ring to handle it adds more complexity when a
> generic solution would be better (see the userspace RCU library for an
> example). Other parts of DPDK would benefit if deferred free were done better.

This requires support from the RCU library.
Honnappa's comments are needed.


* Re: [dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
  2019-08-26  3:11     ` Ruifeng Wang (Arm Technology China)
@ 2019-08-26  5:32       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-08-26  5:32 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China), Stephen Hemminger
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	Dharmik Thakkar, Honnappa Nagarahalli, nd, nd

<snip>
Thank you Stephen for your comments, appreciate your inputs.

> > On Thu, 22 Aug 2019 14:34:57 +0800
> > Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> >
> > > Currently, the tbl8 group is freed even though the readers might be
> > > using the tbl8 group entries. The freed tbl8 group can be
> > > reallocated quickly. This results in incorrect lookup results.
> > >
> > > The RCU QSBR process is integrated for safe tbl8 group reclamation.
> > > Refer to the RCU documentation to understand various aspects of
> > > integrating the RCU library into other libraries.
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> >
> > Having RCU in LPM is a good idea, but it is difficult to work out how to
> > do it in DPDK.
> > Not everyone wants to use RCU, so making it a required part of how LPM
> > is used will impact users.
> 
> LPM users will not be forced to use RCU. A new API is provided to enable
> the RCU functionality in the LPM library. For users not using RCU, the
> code path is intact, and there will be no performance drop.
> 
> >
> > Also, it looks like DPDK RCU lacks a good generic way to handle deferred
> > free.
Both rcu_defer and call_rcu from the 'userspace RCU library' are wrappers on top of the underlying basic mechanisms. Such wrappers can be added. However, I would prefer to integrate RCU into a couple of libraries to clearly show the need for wrappers. Integrating RCU in the libraries also removes some burden from the application.

> > Having to introduce a ring to handle it adds more complexity when a
> > generic solution would be better (see the userspace RCU library for an example).
A ring is required in rcu_defer as well as call_rcu since the pointer needs to be stored while waiting for quiescent state updates. The ring is used in the proposed solution for the same purpose.
I briefly looked through rcu_defer. The solution proposed here seems to be similar to rcu_defer. However, there are several differences.
1) rcu_defer uses a single queue for each updater thread; the proposed solution uses a ring per data structure. IMO, this provides better control over the resources to reclaim. Note that currently the ring usage itself is intentionally not optimized, to keep the patches focused on understanding the design.
2) rcu_defer also launches another thread which wakes up periodically and reclaims the resources in the ring (along with the updater thread calling synchronize_rcu, which blocks, when the queue is full). This requires additional synchronization between the updater thread and the reclaimer thread. The solution proposed here does not need another thread, as the DPDK RCU library provides a non-blocking reclamation mechanism; reclamation is done in the context of the updater thread.

> > Other parts of DPDK would benefit if deferred free were done better.
Which other parts are you talking about? The design proposed in 1/3 is a common solution that should apply to other libraries as well.

> 
> This requires support from the RCU library.
> Honnappa's comments are needed.



* [dpdk-dev] [PATCH v2 0/6] RCU integration with LPM library
  2019-08-22  6:34 [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (3 preceding siblings ...)
  2019-08-22 15:52 ` [dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library Honnappa Nagarahalli
@ 2019-09-06  9:45 ` Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
                     ` (14 more replies)
  4 siblings, 15 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

A document is added with the suggested design for integrating the
RCU library with other libraries in DPDK.
As an example, the LPM library adds the integration. As an option,
RCU is used to safely free tbl8 groups that can be recycled.
A table will not be reclaimed or reused until the readers have
finished referencing it.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application
to register an RCU variable that the LPM library will use. This
gives the user a handle to enable the RCU functionality integrated
in the LPM library.

A new API, rte_ring_peek, is introduced to help manage the
reclaiming FIFO queue.
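
A minimal reader-side sketch (one lcore per reader is assumed; the RCU
variable v is initialized and registered with the LPM object via
rte_lpm_rcu_qsbr_add beforehand; variable setup and error handling are
elided):

	/* Register this reader thread once and mark it online. */
	rte_rcu_qsbr_thread_register(v, lcore_id);
	rte_rcu_qsbr_thread_online(v, lcore_id);

	while (!quit) {
		uint32_t next_hop;

		if (rte_lpm_lookup(lpm, ip, &next_hop) == 0) {
			/* forward the packet using next_hop */
		}

		/* Report quiescent state once per loop iteration. */
		rte_rcu_qsbr_quiescent(v, lcore_id);
	}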


Honnappa Nagarahalli (3):
  doc/rcu: add RCU integration design details
  test/lpm: reset total time
  test/lpm: add RCU integration performance tests

Ruifeng Wang (3):
  lib/ring: add peek API
  lib/lpm: integrate RCU QSBR
  app/test: add test case for LPM RCU integration

 app/test/test_lpm.c                | 153 +++++++++++++++-
 app/test/test_lpm_perf.c           | 278 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/rcu_lib.rst  |  52 ++++++
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/librte_ring/rte_ring.h         |  30 ++++
 lib/meson.build                    |   3 +-
 10 files changed, 751 insertions(+), 21 deletions(-)

-- 
2.17.1



* [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:44     ` Honnappa Nagarahalli
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API Ruifeng Wang
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add a section to describe a design for integrating the QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 52 +++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..211948530 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,55 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Integrating QSBR RCU with other libraries
+-----------------------------------------
+
+Lock-free algorithms place an additional burden on the application to reclaim
+memory. Integrating memory reclamation mechanisms in the libraries helps
+remove some of the burden. Though the QSBR method provides the flexibility to
+achieve performance, it presents challenges when integrating with libraries.
+
+The memory reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here assigns different parts of this process to client libraries and applications. The term 'client library' refers to data structure libraries such as rte_hash, rte_lpm, etc. in DPDK or similar libraries outside of DPDK. The term 'application' refers to the packet processing application that makes use of DPDK, such as the L3 Forwarding example application, OVS, VPP, etc.
+
+The application has to handle 'Initialization' and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the client library.
+* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section/how frequently it wants to report the quiescent state.
+
+The client library will handle the 'Reclaiming Resources' part of the process. The
+client libraries will make use of the writer thread context to execute the memory
+reclamation algorithm. So,
+
+* client library should provide an API to register an RCU variable that it will use.
+* client library should trigger the readers to report quiescent state status upon deleting the resources by calling ``rte_rcu_qsbr_start``.
+
+* client library should store the token and deleted resources for later use to free them after the readers have reported their quiescent state. Since the readers will report the quiescent state status in the order of deletion, the library must store the tokens/resources in the order in which the resources were deleted. A FIFO data structure would achieve the desired results. The length of the FIFO would depend on the rate of deletion and the rate at which the readers report their quiescent state. In the worst case the length of FIFO would be equal to the maximum number of resources the data structure supports. However, in most cases, the length will be much smaller. But, the client library should not take the length of FIFO as an input from the application. Instead, it should implement a data structure which should be able to grow/shrink dynamically. Overhead introduced by such a data structure on delete operations should be considered as well.
+
+* client library should query the quiescent state and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. This allows the application to do useful work while the readers report their quiescent state. If there are tokens/resources present in the FIFO already, the delete API should peek the head of the FIFO and check the quiescent state status. If the status is success, the token/resource should be dequeued and the resource should be freed. This process can be repeated till the quiescent state status for a token returns failure indicating that subsequent tokens will also fail quiescent state status query. The same process can be incorporated while adding new entries in the data structure if the client library runs out of resources.
+
+The 'Shutdown' process needs to be shared between the application and the
+client library.
+
+* the application should make sure that the reader threads are not using the shared data structure and unregister the reader threads from the QSBR variable before calling the client library's shutdown function.
+
+* client library should check the quiescent state status of all the tokens that may be present in the FIFO and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the tokens do not pass the quiescent state check, the client library should print an error and stop the memory reclamation process.
+
+Integrating the resource reclamation with client libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. Application does not need a dedicated thread to reclaim resources. Memory
+   reclamation happens as part of the writer thread with little impact on
+   performance.
+#. The client library has better control over the resources. For example, the
+   client library can attempt to reclaim when it has run out of resources.
-- 
2.17.1



* [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1



* [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 2/6] lib/ring: add peek API Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:44     ` Honnappa Nagarahalli
  2019-09-18 16:15     ` Medvedkin, Vladimir
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration Ruifeng Wang
                     ` (11 subsequent siblings)
  14 siblings, 2 replies; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++++++++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/meson.build                    |   3 +-
 6 files changed, 244 insertions(+), 15 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..9764b8de6 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <string.h>
@@ -22,6 +23,7 @@
 #include <rte_rwlock.h>
 #include <rte_spinlock.h>
 #include <rte_tailq.h>
+#include <rte_ring.h>
 
 #include "rte_lpm.h"
 
@@ -39,6 +41,11 @@ enum valid_flag {
 	VALID
 };
 
+struct __rte_lpm_qs_item {
+	uint64_t token;	/**< QSBR token.*/
+	uint32_t index;	/**< tbl8 group index.*/
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	rte_ring_free(lpm->qs_fifo);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
@@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
 		rte_lpm_free_v1604);
 
+/* Add an item into FIFO.
+ * return: 0 - success, 1 - FIFO full (rte_errno set to ENOSPC).
+ */
+static int
+__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	if (rte_ring_free_count(fifo) < 2) {
+		RTE_LOG(ERR, LPM, "QS FIFO full\n");
+		rte_errno = ENOSPC;
+		return 1;
+	}
+
+	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token);
+	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index);
+
+	return 0;
+}
+
+/* Remove item from FIFO.
+ * Used after data has been observed by rte_ring_peek.
+ */
+static void
+__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
+	struct __rte_lpm_qs_item *item)
+{
+	void *obj_token = NULL;
+	void *obj_index = NULL;
+
+	(void)rte_ring_sc_dequeue(fifo, &obj_token);
+	(void)rte_ring_sc_dequeue(fifo, &obj_index);
+
+	if (item) {
+		item->token = (uint64_t)((uintptr_t)obj_token);
+		item->index = (uint32_t)((uintptr_t)obj_index);
+	}
+}
+
+/* Max number of tbl8 groups to reclaim at one time. */
+#define RCU_QSBR_RECLAIM_SIZE	8
+
+/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
+ * reclaim will be triggered by tbl8_free.
+ */
+#define RCU_QSBR_RECLAIM_LEVEL	3
+
+/* Reclaim some tbl8 groups based on quiescent state check.
+ * At most RCU_QSBR_RECLAIM_SIZE groups will be reclaimed.
+ * Params: lpm   - lpm object handle
+ *         index - (output) one of the successfully reclaimed tbl8 groups
+ * return: 0 - success, 1 - no group reclaimed.
+ */
+static uint32_t
+__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
+{
+	struct __rte_lpm_qs_item qs_item;
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	void *obj_token;
+	uint32_t cnt = 0;
+
+	RTE_LOG(DEBUG, LPM, "RCU QSBR reclamation triggered.\n");
+	/* Check the reader threads' quiescent state and
+	 * reclaim as many tbl8 groups as possible.
+	 */
+	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
+		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
+		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
+					false) == 1)) {
+		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
+
+		tbl8_entry = &lpm->tbl8[qs_item.index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+		memset(&tbl8_entry[0], 0,
+				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
+				sizeof(tbl8_entry[0]));
+		cnt++;
+	}
+
+	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
+	if (cnt) {
+		if (index)
+			*index = qs_item.index;
+		return 0;
+	}
+	return 1;
+}
+
+/* Trigger tbl8 group reclaim when necessary.
+ * Reclaim happens when RCU QSBR queue usage
+ * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
+ */
+static void
+__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
+{
+	if (lpm->qsv == NULL)
+		return;
+
+	if (rte_ring_count(lpm->qs_fifo) <
+		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
+		return;
+
+	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	uint32_t qs_fifo_size;
+	char rcu_ring_name[RTE_RING_NAMESIZE];
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->qsv) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Size the FIFO to store a 'token' and an 'index' for every
+	 * tbl8 group, rounded up to the next power of two (the usable
+	 * ring capacity is its size minus one, hence the '+ 1').
+	 */
+	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
+
+	/* Init QSBR reclaiming FIFO. */
+	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
+	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
+					SOCKET_ID_ANY, 0);
+	if (lpm->qs_fifo == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
+		rte_errno = ENOMEM;
+		return 1;
+	}
+	lpm->qsv = v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
 	return -EINVAL;
 }
 
+static int32_t
+tbl8_alloc_reclaimed(struct rte_lpm *lpm)
+{
+	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+	uint32_t index;
+
+	if (lpm->qsv != NULL) {
+		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
+			/* Set the last reclaimed tbl8 group as VALID. */
+			struct rte_lpm_tbl_entry new_tbl8_entry = {
+				.next_hop = 0,
+				.valid = INVALID,
+				.depth = 0,
+				.valid_group = VALID,
+			};
+
+			tbl8_entry = &lpm->tbl8[index *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+			__atomic_store(tbl8_entry, &new_tbl8_entry,
+					__ATOMIC_RELAXED);
+
+			/* Return group index for reclaimed tbl8 group. */
+			return index;
+		}
+	}
+
+	return -ENOSPC;
+}
+
 /*
  * Find, clean and allocate a tbl8.
  */
@@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 		}
 	}
 
-	/* If there are no tbl8 groups free then return error. */
-	return -ENOSPC;
+	/* If there are no tbl8 groups free then check reclaim queue. */
+	return tbl8_alloc_reclaimed(lpm);
 }
 
 static void
@@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 }
 
 static void
-tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm_qs_item qs_item;
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (lpm->qsv != NULL) {
+		/* Push into QSBR FIFO. */
+		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
+		qs_item.index =
+			tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
+		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
+			/* This should never happen as FIFO size is big enough
+			 * to hold all tbl8 groups.
+			 */
+			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
+
+		/* Speculatively reclaim tbl8 groups.
+		 * Help spread the reclaim workload across multiple calls.
+		 */
+		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
+	} else {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	}
 }
 
 static __rte_noinline int32_t
@@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 906ec4483..5079fb262 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -21,6 +22,7 @@
 #include <rte_common.h>
 #include <rte_vect.h>
 #include <rte_compat.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -186,6 +188,8 @@ struct rte_lpm {
 			__rte_cache_aligned; /**< LPM tbl24 table. */
 	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
+	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group.*/
+	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
 };
 
 /**
@@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
 void
 rte_lpm_free_v1604(struct rte_lpm *lpm);
 
+/**
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   The LPM object to attach the RCU QSBR variable to
+ * @param v
+ *   The RCU QSBR variable to use for reclaiming tbl8 groups
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 90beac853..b353aabd2 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -44,3 +44,9 @@ DPDK_17.05 {
 	rte_lpm6_lookup_bulk_func;
 
 } DPDK_16.04;
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
diff --git a/lib/meson.build b/lib/meson.build
index e5ff83893..3a96f005d 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,6 +11,7 @@
 libraries = [
 	'kvargs', # eal depends on kvargs
 	'eal', # everything depends on eal
+	'rcu', # hash and lpm depend on this
 	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
 	'cmdline',
 	'metrics', # bitrate/latency stats depends on this
@@ -22,7 +23,7 @@ libraries = [
 	'gro', 'gso', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member',
 	'power', 'pdump', 'rawdev',
-	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
+	'reorder', 'sched', 'security', 'stack', 'vhost',
 	# ipsec lib depends on net, crypto and security
 	'ipsec',
 	# add pkt framework libs which use other libs from above
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
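
A minimal sketch of the application-side flow this patch implies, using only
the APIs introduced by the patchset and exercised in its tests; `lpm`,
`thread_id`, `ip`, `next_hop` and the `running` flag stand in for application
state, and error handling is elided:

	/* Writer setup: create a QSBR variable and attach it to the LPM
	 * table, so freed tbl8 groups are reclaimed through the RCU FIFO.
	 */
	size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
	struct rte_rcu_qsbr *qsv = rte_zmalloc_socket(NULL, sz,
					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
	rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
	rte_lpm_rcu_qsbr_add(lpm, qsv);

	/* Each reader thread: register once, then report a quiescent state
	 * between bursts of lookups.
	 */
	rte_rcu_qsbr_thread_register(qsv, thread_id);
	rte_rcu_qsbr_thread_online(qsv, thread_id);
	while (running) {
		rte_lpm_lookup(lpm, ip, &next_hop);
		rte_rcu_qsbr_quiescent(qsv, thread_id);
	}
	rte_rcu_qsbr_thread_offline(qsv, thread_id);
	rte_rcu_qsbr_thread_unregister(qsv, thread_id);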

* [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (2 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:45     ` Honnappa Nagarahalli
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time Ruifeng Wang
                     ` (10 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, Ruifeng Wang

Add positive and negative tests for API rte_lpm_rcu_qsbr_add.
Also test LPM library behavior when RCU QSBR is enabled.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_lpm.c | 153 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 152 insertions(+), 1 deletion(-)

diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
index e969fe051..cfd372395 100644
--- a/app/test/test_lpm.c
+++ b/app/test/test_lpm.c
@@ -8,6 +8,7 @@
 
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_malloc.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
@@ -40,6 +41,8 @@ static int32_t test15(void);
 static int32_t test16(void);
 static int32_t test17(void);
 static int32_t test18(void);
+static int32_t test19(void);
+static int32_t test20(void);
 
 rte_lpm_test tests[] = {
 /* Test Cases */
@@ -61,7 +64,9 @@ rte_lpm_test tests[] = {
 	test15,
 	test16,
 	test17,
-	test18
+	test18,
+	test19,
+	test20
 };
 
 #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
@@ -1266,6 +1271,152 @@ test18(void)
 	return PASS;
 }
 
+/*
+ * rte_lpm_rcu_qsbr_add positive and negative tests.
+ *  - Add RCU QSBR variable to LPM
+ *  - Add another RCU QSBR variable to LPM
+ *  - Check LPM attached RCU QSBR variable and FIFO queue
+ */
+int32_t
+test19(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	struct rte_rcu_qsbr *qsv2;
+	int32_t status;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Create and attach another RCU QSBR to LPM table */
+	qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv2 != NULL);
+
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv2);
+	TEST_LPM_ASSERT(status != 0);
+
+	TEST_LPM_ASSERT(lpm->qsv == qsv);
+	TEST_LPM_ASSERT(lpm->qs_fifo != NULL);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+	rte_free(qsv2);
+
+	return PASS;
+}
+
+/*
+ * rte_lpm_rcu_qsbr_add functional test.
+ *  - Create LPM which supports 1 tbl8 group at max
+ *  - Add RCU QSBR variable to LPM
+ *  - Add a rule with depth=28 (> 24)
+ *  - Register a reader thread (not a real thread)
+ *  - Reader looks up the existing rule
+ *  - Writer deletes the rule
+ *  - Reader looks up the rule
+ *  - Writer re-adds the rule (no tbl8 group available)
+ *  - Reader reports quiescent state and unregisters
+ *  - Writer re-adds the rule
+ *  - Reader looks up the rule
+ */
+int32_t
+test20(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	int32_t status;
+	uint32_t ip, next_hop, next_hop_return;
+	uint8_t depth;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = 1;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(1);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, 1);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	ip = RTE_IPV4(192, 18, 100, 100);
+	depth = 28;
+	next_hop = 1;
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group);
+
+	/* Register pseudo reader */
+	status = rte_rcu_qsbr_thread_register(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+	rte_rcu_qsbr_thread_online(qsv, 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	/* Writer update */
+	status = rte_lpm_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status != 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status != 0);
+
+	/* Reader quiescent */
+	rte_rcu_qsbr_quiescent(qsv, 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+
+	rte_rcu_qsbr_thread_offline(qsv, 0);
+	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+
+	return PASS;
+}
+
 /*
  * Do all unit tests.
  */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
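
The sequence test20 verifies, restated as a timeline (assuming a single
registered reader on thread id 0 and an LPM table with exactly one tbl8
group, as in the test):

	rte_lpm_delete(lpm, ip, depth);		/* tbl8 group queued for
						 * reclaim, not yet freed */
	rte_lpm_add(lpm, ip, depth, next_hop);	/* fails: the only tbl8
						 * group is still queued */
	rte_rcu_qsbr_quiescent(qsv, 0);		/* reader reports its
						 * quiescent state */
	rte_lpm_add(lpm, ip, depth, next_hop);	/* succeeds: the group has
						 * been reclaimed */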

* [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (3 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-18 16:17     ` Medvedkin, Vladimir
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests Ruifeng Wang
                     ` (9 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, stable

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

total_time needs to be reset to measure the cycles for the delete API.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_lpm_perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index 77eea66ad..a2578fe90 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -460,7 +460,7 @@ test_lpm_perf(void)
 			(double)total_time / ((double)ITERATIONS * BATCH_SIZE),
 			(count * 100.0) / (double)(ITERATIONS * BATCH_SIZE));
 
-	/* Delete */
+	/* Measure Delete */
 	status = 0;
 	begin = rte_rdtsc();
 
@@ -470,7 +470,7 @@ test_lpm_perf(void)
 				large_route_table[i].depth);
 	}
 
-	total_time += rte_rdtsc() - begin;
+	total_time = rte_rdtsc() - begin;
 
 	printf("Average LPM Delete: %g cycles\n",
 			(double)total_time / NUM_ROUTE_ENTRIES);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
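
The measurement pattern the fix restores, as a sketch:

	begin = rte_rdtsc();
	/* ... only the operations being measured ... */
	total_time = rte_rdtsc() - begin;	/* '=' starts a fresh
						 * measurement; the old '+='
						 * also counted the cycles of
						 * the previous block */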

* [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (4 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time Ruifeng Wang
@ 2019-09-06  9:45   ` Ruifeng Wang
  2019-09-06 19:46     ` Honnappa Nagarahalli
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
                     ` (8 subsequent siblings)
  14 siblings, 1 reply; 137+ messages in thread
From: Ruifeng Wang @ 2019-09-06  9:45 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd

From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Add performance tests for RCU integration. The performance
difference with and without RCU integration is very small
(~1% to ~2%) on both Arm and x86 platforms.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_lpm_perf.c | 274 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 271 insertions(+), 3 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index a2578fe90..475e5d488 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <stdio.h>
@@ -10,12 +11,23 @@
 #include <rte_cycles.h>
 #include <rte_random.h>
 #include <rte_branch_prediction.h>
+#include <rte_malloc.h>
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_rcu_qsbr.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
 
+struct rte_lpm *lpm;
+static struct rte_rcu_qsbr *rv;
+static volatile uint8_t writer_done;
+static volatile uint32_t thr_id;
+/* Report quiescent state every 8192 lookups. Larger critical
+ * sections in the reader will result in the writer polling multiple times.
+ */
+#define QSBR_REPORTING_INTERVAL 8192
+
 #define TEST_LPM_ASSERT(cond) do {                                            \
 	if (!(cond)) {                                                        \
 		printf("Error at line %d: \n", __LINE__);                     \
@@ -24,6 +36,7 @@
 } while(0)
 
 #define ITERATIONS (1 << 10)
+#define RCU_ITERATIONS 10
 #define BATCH_SIZE (1 << 12)
 #define BULK_SIZE 32
 
@@ -35,9 +48,13 @@ struct route_rule {
 };
 
 struct route_rule large_route_table[MAX_RULE_NUM];
+/* Route table for routes with depth > 24 */
+struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
 
 static uint32_t num_route_entries;
+static uint32_t num_ldepth_route_entries;
 #define NUM_ROUTE_ENTRIES num_route_entries
+#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
 
 enum {
 	IP_CLASS_A,
@@ -191,7 +208,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	uint32_t ip_head_mask;
 	uint32_t rule_num;
 	uint32_t k;
-	struct route_rule *ptr_rule;
+	struct route_rule *ptr_rule, *ptr_ldepth_rule;
 
 	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
 		fixed_bit_num = IP_HEAD_BIT_NUM_A;
@@ -236,10 +253,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	 */
 	start = lrand48() & mask;
 	ptr_rule = &large_route_table[num_route_entries];
+	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
 	for (k = 0; k < rule_num; k++) {
 		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
 			| ip_head_mask;
 		ptr_rule->depth = depth;
+		/* If the depth of the route is more than 24, store it
+		 * in another table as well.
+		 */
+		if (depth > 24) {
+			ptr_ldepth_rule->ip = ptr_rule->ip;
+			ptr_ldepth_rule->depth = ptr_rule->depth;
+			ptr_ldepth_rule++;
+			num_ldepth_route_entries++;
+		}
 		ptr_rule++;
 		start = (start + step) & mask;
 	}
@@ -273,6 +300,7 @@ static void generate_large_route_rule_table(void)
 	uint8_t  depth;
 
 	num_route_entries = 0;
+	num_ldepth_route_entries = 0;
 	memset(large_route_table, 0, sizeof(large_route_table));
 
 	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
@@ -316,10 +344,248 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
 	printf("\n");
 }
 
+/* Lcore ids of the enabled reader cores. */
+static uint16_t enabled_core_ids[RTE_MAX_LCORE];
+static unsigned int num_cores;
+
+/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
+static inline uint32_t
+alloc_thread_id(void)
+{
+	uint32_t tmp_thr_id;
+
+	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
+	if (tmp_thr_id >= RTE_MAX_LCORE)
+		printf("Invalid thread id %u\n", tmp_thr_id);
+
+	return tmp_thr_id;
+}
+
+/*
+ * Reader thread using rte_lpm data structure without RCU.
+ */
+static int
+test_lpm_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+	} while (!writer_done);
+
+	return 0;
+}
+
+/*
+ * Reader thread using rte_lpm data structure with RCU.
+ */
+static int
+test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t thread_id = alloc_thread_id();
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	/* Register this thread to report quiescent state */
+	rte_rcu_qsbr_thread_register(rv, thread_id);
+	rte_rcu_qsbr_thread_online(rv, thread_id);
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+		/* Update quiescent state */
+		rte_rcu_qsbr_quiescent(rv, thread_id);
+	} while (!writer_done);
+
+	rte_rcu_qsbr_thread_offline(rv, thread_id);
+	rte_rcu_qsbr_thread_unregister(rv, thread_id);
+
+	return 0;
+}
+
+/*
+ * Performance test:
+ * Single writer, Single QS variable, Single QSBR query,
+ * Non-blocking rcu_qsbr_check
+ */
+static int
+test_lpm_rcu_perf(void)
+{
+	struct rte_lpm_config config;
+	uint64_t begin, total_cycles;
+	size_t sz;
+	unsigned int i, j;
+	uint16_t core_id;
+	uint32_t next_hop_add = 0xAA;
+
+	if (rte_lcore_count() < 2) {
+		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n");
+		return TEST_SKIPPED;
+	}
+
+	num_cores = 0;
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		enabled_core_ids[num_cores] = core_id;
+		num_cores++;
+	}
+
+	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Init RCU variable */
+	sz = rte_rcu_qsbr_get_memsize(num_cores);
+	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
+						RTE_CACHE_LINE_SIZE);
+	rte_rcu_qsbr_init(rv, num_cores);
+
+	/* Assign the RCU variable to LPM */
+	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
+		printf("RCU variable assignment failed\n");
+		goto error;
+	}
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles /
+			(NUM_LDEPTH_ROUTE_ENTRIES * RCU_ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+	lpm = NULL;
+	rv = NULL;
+
+	/* Test without RCU integration */
+	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles /
+			(NUM_LDEPTH_ROUTE_ENTRIES * RCU_ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			printf("Warning: lcore %u not finished.\n",
+				enabled_core_ids[i]);
+
+	rte_lpm_free(lpm);
+
+	return 0;
+
+error:
+	writer_done = 1;
+	/* Wait until all readers have exited */
+	rte_eal_mp_wait_lcore();
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+
+	return -1;
+}
+
 static int
 test_lpm_perf(void)
 {
-	struct rte_lpm *lpm = NULL;
 	struct rte_lpm_config config;
 
 	config.max_rules = 2000000;
@@ -343,7 +609,7 @@ test_lpm_perf(void)
 	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
 	TEST_LPM_ASSERT(lpm != NULL);
 
-	/* Measue add. */
+	/* Measure add. */
 	begin = rte_rdtsc();
 
 	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
@@ -478,6 +744,8 @@ test_lpm_perf(void)
 	rte_lpm_delete_all(lpm);
 	rte_lpm_free(lpm);
 
+	test_lpm_rcu_perf();
+
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
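
The writer/reader synchronization in this test reduces to a volatile flag
plus the usual EAL launch/wait pair; a sketch using the names from the
patch:

	writer_done = 0;
	for (i = 0; i < num_cores; i++)
		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
					enabled_core_ids[i]);

	/* ... the timed add/delete loops run on the main lcore ... */

	writer_done = 1;	/* readers poll this flag and exit */
	for (i = 0; i < num_cores; i++)
		rte_eal_wait_lcore(enabled_core_ids[i]);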

* Re: [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 1/6] doc/rcu: add RCU integration design details Ruifeng Wang
@ 2019-09-06 19:44     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:44 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, paulmck, nd

Adding Paul for feedback on design

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:45 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: [PATCH v2 1/6] doc/rcu: add RCU integration design details
> 
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Add a section to describe a design to integrate QSBR RCU library with other
> libraries in DPDK.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  doc/guides/prog_guide/rcu_lib.rst | 52 +++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rcu_lib.rst
> b/doc/guides/prog_guide/rcu_lib.rst
> index 8fe5b1f73..211948530 100644
> --- a/doc/guides/prog_guide/rcu_lib.rst
> +++ b/doc/guides/prog_guide/rcu_lib.rst
> @@ -186,3 +186,55 @@ However, when
> ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid  in debugging
> issues. One can mark the access to shared data structures on the  reader side
> using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if  all the locks are
> unlocked.
> +
> +Integrating QSBR RCU with other libraries
> +-----------------------------------------
> +
> +Lock-free algorithms place additional burden on the application to
> +reclaim memory. Integrating memory reclamation mechanisms in the
> +libraries helps remove some of the burden. Though the QSBR method provides
> +flexibility to achieve performance, it presents challenges when integrating
> with libraries.
> +
> +The memory reclamation process using QSBR can be split into 4 parts:
> +
> +#. Initialization
> +#. Quiescent State Reporting
> +#. Reclaiming Resources
> +#. Shutdown
> +
> +The design proposed here assigns different parts of this process to client
> libraries and applications. The term 'client library' refers to data structure
> libraries such at rte_hash, rte_lpm etc. in DPDK or similar libraries outside of
> DPDK. The term 'application' refers to the packet processing application that
> makes use of DPDK such as L3 Forwarding example application, OVS, VPP etc..
> +
> +The application has to handle 'Initialization' and 'Quiescent State
> +Reporting'. So,
> +
> +* the application has to create the RCU variable and register the reader
> threads to report their quiescent state.
> +* the application has to register the same RCU variable with the client library.
> +* reader threads in the application have to report the quiescent state. This
> allows for the application to control the length of the critical section/how
> frequently the application wants to report the quiescent state.
> +
> +The client library will handle 'Reclaiming Resources' part of the
> +process. The client libraries will make use of the writer thread
> +context to execute the memory reclamation algorithm. So,
> +
> +* client library should provide an API to register a RCU variable that it will use.
> +* client library should trigger the readers to report quiescent state status
> upon deleting the resources by calling ``rte_rcu_qsbr_start``.
> +
> +* client library should store the token and deleted resources for later use to
> free them after the readers have reported their quiescent state. Since the
> readers will report the quiescent state status in the order of deletion, the
> library must store the tokens/resources in the order in which the resources
> were deleted. A FIFO data structure would achieve the desired results. The
> length of the FIFO would depend on the rate of deletion and the rate at which
> the readers report their quiescent state. In the worst case the length of FIFO
> would be equal to the maximum number of resources the data structure
> supports. However, in most cases, the length will be much smaller. But, the
> client library should not take the length of FIFO as an input from the
> application. Instead, it should implement a data structure which should be able
> to grow/shrink dynamically. Overhead introduced by such a data structure on
> delete operations should be considered as well.
> +
> +* client library should query the quiescent state and free the resources. It
> should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the
> quiescent state. This allows the application to do useful work while the readers
> report their quiescent state. If there are tokens/resources present in the FIFO
> already, the delete API should peek the head of the FIFO and check the
> quiescent state status. If the status is success, the token/resource should be
> dequeued and the resource should be freed. This process can be repeated till
> the quiescent state status for a token returns failure indicating that
> subsequent tokens will also fail quiescent state status query. The same process
> can be incorporated while adding new entries in the data structure if the client
> library runs out of resources.
> +
> +The 'Shutdown' process needs to be shared between the application and
> +the client library.
> +
> +* the application should make sure that the reader threads are not using the
> shared data structure, unregister the reader threads from the QSBR variable
> before calling the client library's shutdown function.
> +
> +* client library should check the quiescent state status of all the tokens that
> may be present in the FIFO and free the resources. It should make use of non-
> blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the
> tokens do not pass the quiescent state check, the client library should print an
> error and stop the memory reclamation process.
> +
> +Integrating the resource reclamation with client libraries removes the
> +burden from the application and makes it easy to use lock-free algorithms.
> +
> +This design has several advantages over currently known methods.
> +
> +#. Application does not need a dedicated thread to reclaim resources.
> Memory
> +   reclamation happens as part of the writer thread with little impact on
> +   performance.
> +#. The client library has better control over the resources. For ex: the client
> +   library can attempt to reclaim when it has run out of resources.
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
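
The delete-side algorithm the document above describes, reduced to a sketch.
`fifo_push`, `fifo_peek`, `fifo_pop` and `resource_free` are hypothetical
client-library helpers, not DPDK APIs; only the rte_rcu_qsbr_* calls are
real:

	/* On delete: remove the entry from the data structure, then store
	 * the token returned by rte_rcu_qsbr_start with the resource.
	 */
	token = rte_rcu_qsbr_start(v);
	fifo_push(fifo, token, resource);

	/* Later (on the next delete, or when out of resources): free the
	 * resource for every token that passes the non-blocking quiescent
	 * state check, oldest first.
	 */
	while (fifo_peek(fifo, &token, &resource) == 0 &&
			rte_rcu_qsbr_check(v, token, false) == 1) {
		fifo_pop(fifo);
		resource_free(resource);	/* no reader can still hold
						 * a reference */
	}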

* Re: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2019-09-06 19:44     ` Honnappa Nagarahalli
  2019-09-18 16:15     ` Medvedkin, Vladimir
  1 sibling, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:44 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, Ruifeng Wang (Arm Technology China),
	paulmck, nd

Adding Paul for feedback

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:46 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Ruifeng Wang (Arm
> Technology China) <Ruifeng.Wang@arm.com>
> Subject: [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
> 
> Currently, the tbl8 group is freed even though the readers might be using the
> tbl8 group entries. The freed tbl8 group can be reallocated quickly. This results
> in incorrect lookup results.
> 
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of integrating RCU
> library into other libraries.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_lpm/Makefile            |   3 +-
>  lib/librte_lpm/meson.build         |   2 +
>  lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++++++++--
>  lib/librte_lpm/rte_lpm.h           |  22 +++
>  lib/librte_lpm/rte_lpm_version.map |   6 +
>  lib/meson.build                    |   3 +-
>  6 files changed, 244 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_lpm.a
> 
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> 
>  EXPORT_MAP := rte_lpm_version.map
> 
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>  # Copyright(c) 2017 Intel Corporation
> 
>  version = 2
> +allow_experimental_apis = true
>  sources = files('rte_lpm.c', 'rte_lpm6.c')
>  headers = files('rte_lpm.h', 'rte_lpm6.h')
>  # since header files have different names, we can install all vector headers
>  # without worrying about which architecture we actually need
>  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>  deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..9764b8de6 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #include <string.h>
> @@ -22,6 +23,7 @@
>  #include <rte_rwlock.h>
>  #include <rte_spinlock.h>
>  #include <rte_tailq.h>
> +#include <rte_ring.h>
> 
>  #include "rte_lpm.h"
> 
> @@ -39,6 +41,11 @@ enum valid_flag {
>  	VALID
>  };
> 
> +struct __rte_lpm_qs_item {
> +	uint64_t token;	/**< QSBR token.*/
> +	uint32_t index;	/**< tbl8 group index.*/
> +};
> +
>  /* Macro to enable/disable run-time checks. */
>  #if defined(RTE_LIBRTE_LPM_DEBUG)
>  #include <rte_debug.h>
> @@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> 
>  	rte_mcfg_tailq_write_unlock();
> 
> +	rte_ring_free(lpm->qs_fifo);
>  	rte_free(lpm->tbl8);
>  	rte_free(lpm->rules_tbl);
>  	rte_free(lpm);
> @@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>  MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>  		rte_lpm_free_v1604);
> 
> +/* Add an item into FIFO.
> + * return: 0 - success
> + */
> +static int
> +__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
> +{
> +	if (rte_ring_free_count(fifo) < 2) {
> +		RTE_LOG(ERR, LPM, "QS FIFO full\n");
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token);
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index);
> +
> +	return 0;
> +}
> +
> +/* Remove item from FIFO.
> + * Used after the data has been observed by rte_ring_peek.
> + */
> +static void
> +__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
> +{
> +	void *obj_token = NULL;
> +	void *obj_index = NULL;
> +
> +	(void)rte_ring_sc_dequeue(fifo, &obj_token);
> +	(void)rte_ring_sc_dequeue(fifo, &obj_index);
> +
> +	if (item) {
> +		item->token = (uint64_t)((uintptr_t)obj_token);
> +		item->index = (uint32_t)((uintptr_t)obj_index);
> +	}
> +}
> +
> +/* Max number of tbl8 groups to reclaim at one time. */
> +#define RCU_QSBR_RECLAIM_SIZE	8
> +
> +/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
> + * reclaim will be triggered by tbl8_free.
> + */
> +#define RCU_QSBR_RECLAIM_LEVEL	3
> +
> +/* Reclaim some tbl8 groups based on quiescent state check.
> + * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max.
> + * Params: lpm   - lpm object handle
> + *         index - (output) one of the successfully reclaimed tbl8 groups
> + * return: 0 - success, 1 - no group reclaimed.
> + */
> +static uint32_t
> +__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
> +{
> +	struct __rte_lpm_qs_item qs_item;
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
> +	void *obj_token;
> +	uint32_t cnt = 0;
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclamation triggered.\n");
> +	/* Check reader threads quiescent state and
> +	 * reclaim as much tbl8 groups as possible.
> +	 */
> +	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
> +		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
> +		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
> +					false) == 1)) {
> +		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
> +
> +		tbl8_entry = &lpm->tbl8[qs_item.index *
> +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +		memset(&tbl8_entry[0], 0,
> +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
> +				sizeof(tbl8_entry[0]));
> +		cnt++;
> +	}
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
> +	if (cnt) {
> +		if (index)
> +			*index = qs_item.index;
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +/* Trigger tbl8 group reclaim when necessary.
> + * Reclaim happens when RCU QSBR queue usage
> + * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
> + */
> +static void
> +__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
> +{
> +	if (lpm->qsv == NULL)
> +		return;
> +
> +	if (rte_ring_count(lpm->qs_fifo) <
> +		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
> +		return;
> +
> +	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	uint32_t qs_fifo_size;
> +	char rcu_ring_name[RTE_RING_NAMESIZE];
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->qsv) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * number_tbl8s. Will store 'token' and 'index'.
> +	 */
> +	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
> +
> +	/* Init QSBR reclaiming FIFO. */
> +	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
> +	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (lpm->qs_fifo == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
> +		rte_errno = ENOMEM;
> +		return 1;
> +	}
> +	lpm->qsv = v;
> +
> +	return 0;
> +}
> +
>  /*
>   * Adds a rule to the rule table.
>   *
> @@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t
> ip_masked, uint8_t depth)
>  	return -EINVAL;
>  }
> 
> +static int32_t
> +tbl8_alloc_reclaimed(struct rte_lpm *lpm)
> +{
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
> +	uint32_t index;
> +
> +	if (lpm->qsv != NULL) {
> +		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
> +			/* Set the last reclaimed tbl8 group as VALID. */
> +			struct rte_lpm_tbl_entry new_tbl8_entry = {
> +				.next_hop = 0,
> +				.valid = INVALID,
> +				.depth = 0,
> +				.valid_group = VALID,
> +			};
> +
> +			tbl8_entry = &lpm->tbl8[index *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +			__atomic_store(tbl8_entry, &new_tbl8_entry,
> +					__ATOMIC_RELAXED);
> +
> +			/* Return group index for reclaimed tbl8 group. */
> +			return index;
> +		}
> +	}
> +
> +	return -ENOSPC;
> +}
> +
>  /*
>   * Find, clean and allocate a tbl8.
>   */
> @@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  }
> 
>  static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
>  	struct rte_lpm_tbl_entry *tbl8_entry;
> 
>  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx *
> RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>  		/* If a free tbl8 group is found clean it and set as VALID. */
>  		if (!tbl8_entry->valid_group) {
>  			struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -
> 708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> number_tbl8s)
>  		}
>  	}
> 
> -	/* If there are no tbl8 groups free then return error. */
> -	return -ENOSPC;
> +	/* If there are no tbl8 groups free then check reclaim queue. */
> +	return tbl8_alloc_reclaimed(lpm);
>  }
> 
>  static void
> @@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>  }
> 
>  static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>  {
> -	/* Set tbl8 group invalid*/
> +	struct __rte_lpm_qs_item qs_item;
>  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> 
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->qsv != NULL) {
> +		/* Push into QSBR FIFO. */
> +		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
> +		qs_item.index =
> +			tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
> +			/* This should never happen as FIFO size is big enough
> +			 * to hold all tbl8 groups.
> +			 */
> +			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
> +
> +		/* Speculatively reclaim tbl8 groups.
> +		 * Help spread the reclaim workload across multiple calls.
> +		 */
> +		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
> +	} else {
> +		/* Set tbl8 group invalid. */
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>  }
> 
>  static __rte_noinline int32_t
> @@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked, uint8_t depth,
> 
>  	if (!lpm->tbl24[tbl24_index].valid) {
>  		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		/* Check tbl8 allocation was successful. */
>  		if (tbl8_group_index < 0) {
> @@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked, uint8_t depth,
>  	} /* If valid entry but not extended calculate the index into Table8. */
>  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>  		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		if (tbl8_group_index < 0) {
>  			return tbl8_group_index;
> @@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked,
>  		 */
>  		lpm->tbl24[tbl24_index].valid = 0;
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	} else if (tbl8_recycle_index > -1) {
>  		/* Update tbl24 entry. */
>  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>  				__ATOMIC_RELAXED);
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	}
>  #undef group_idx
>  	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..5079fb262 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>  #include <rte_common.h>
>  #include <rte_vect.h>
>  #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -186,6 +188,8 @@ struct rte_lpm {
>  			__rte_cache_aligned; /**< LPM tbl24 table. */
>  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group. */
> +	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
>  };
> 
>  /**
> @@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>  void
>  rte_lpm_free_v1604(struct rte_lpm *lpm);
> 
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   The LPM object to attach the RCU QSBR variable to
> + * @param v
> + *   The RCU QSBR variable to use for reclaiming tbl8 groups
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>  /**
>   * Add a rule to the LPM table.
>   *
> diff --git a/lib/librte_lpm/rte_lpm_version.map
> b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>  	rte_lpm6_lookup_bulk_func;
> 
>  } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..3a96f005d 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,6 +11,7 @@
>  libraries = [
>  	'kvargs', # eal depends on kvargs
>  	'eal', # everything depends on eal
> +	'rcu', # hash and lpm depend on this
>  	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>  	'cmdline',
>  	'metrics', # bitrate/latency stats depends on this
> @@ -22,7 +23,7 @@ libraries = [
>  	'gro', 'gso', 'ip_frag', 'jobstats',
>  	'kni', 'latencystats', 'lpm', 'member',
>  	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>  	# ipsec lib depends on net, crypto and security
>  	'ipsec',
>  	# add pkt framework libs which use other libs from above
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread
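
A worked example of the FIFO sizing and reclaim trigger in
rte_lpm_rcu_qsbr_add above, assuming number_tbl8s = 256 (each queued item
occupies two ring slots, one for the token and one for the index):

	qs_fifo_size = rte_align32pow2((2 * 256) + 1);	/* = 1024 slots */
	/* With the default ring flags the usable capacity is 1024 - 1 =
	 * 1023 slots, i.e. 511 (token, index) pairs -- more than the 256
	 * tbl8 groups that can ever be queued.  Reclaim is attempted once
	 * usage exceeds capacity >> RCU_QSBR_RECLAIM_LEVEL, i.e.
	 * 1023 >> 3 = 127 slots, roughly 63 queued groups.
	 */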

* Re: [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 4/6] app/test: add test case for LPM RCU integration Ruifeng Wang
@ 2019-09-06 19:45     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:45 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, Ruifeng Wang (Arm Technology China),
	paulmck, nd

Adding Paul for feedback

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:46 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; Ruifeng Wang (Arm
> Technology China) <Ruifeng.Wang@arm.com>
> Subject: [PATCH v2 4/6] app/test: add test case for LPM RCU integration
> 
> Add positive and negative tests for API rte_lpm_rcu_qsbr_add.
> Also test LPM library behavior when RCU QSBR is enabled.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  app/test/test_lpm.c | 153 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 152 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
> index e969fe051..cfd372395 100644
> --- a/app/test/test_lpm.c
> +++ b/app/test/test_lpm.c
> @@ -8,6 +8,7 @@
> 
>  #include <rte_ip.h>
>  #include <rte_lpm.h>
> +#include <rte_malloc.h>
> 
>  #include "test.h"
>  #include "test_xmmt_ops.h"
> @@ -40,6 +41,8 @@ static int32_t test15(void);
>  static int32_t test16(void);
>  static int32_t test17(void);
>  static int32_t test18(void);
> +static int32_t test19(void);
> +static int32_t test20(void);
> 
>  rte_lpm_test tests[] = {
>  /* Test Cases */
> @@ -61,7 +64,9 @@ rte_lpm_test tests[] = {
>  	test15,
>  	test16,
>  	test17,
> -	test18
> +	test18,
> +	test19,
> +	test20
>  };
> 
>  #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
> @@ -1266,6 +1271,152 @@ test18(void)
>  	return PASS;
>  }
> 
> +/*
> + * rte_lpm_rcu_qsbr_add positive and negative tests.
> + *  - Add RCU QSBR variable to LPM
> + *  - Add another RCU QSBR variable to LPM
> + *  - Check LPM attached RCU QSBR variable and FIFO queue
> + */
> +int32_t
> +test19(void)
> +{
> +	struct rte_lpm *lpm = NULL;
> +	struct rte_lpm_config config;
> +	size_t sz;
> +	struct rte_rcu_qsbr *qsv;
> +	struct rte_rcu_qsbr *qsv2;
> +	int32_t status;
> +
> +	config.max_rules = MAX_RULES;
> +	config.number_tbl8s = NUMBER_TBL8S;
> +	config.flags = 0;
> +
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Create RCU QSBR variable */
> +	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
> +	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
> +					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
> +	TEST_LPM_ASSERT(qsv != NULL);
> +
> +	status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	/* Attach RCU QSBR to LPM table */
> +	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	/* Create and attach another RCU QSBR to LPM table */
> +	qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
> +					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
> +	TEST_LPM_ASSERT(qsv2 != NULL);
> +
> +	status = rte_lpm_rcu_qsbr_add(lpm, qsv2);
> +	TEST_LPM_ASSERT(status != 0);
> +
> +	TEST_LPM_ASSERT(lpm->qsv == qsv);
> +	TEST_LPM_ASSERT(lpm->qs_fifo != NULL);
> +
> +	rte_lpm_free(lpm);
> +	rte_free(qsv);
> +	rte_free(qsv2);
> +
> +	return PASS;
> +}
> +
> +/*
> + * rte_lpm_rcu_qsbr_add functional test.
> + *  - Create LPM which supports 1 tbl8 group at max
> + *  - Add RCU QSBR variable to LPM
> + *  - Add a rule with depth=28 (> 24)
> + *  - Register a reader thread (not a real thread)
> + *  - Reader looks up the existing rule
> + *  - Writer deletes the rule
> + *  - Reader looks up the rule
> + *  - Writer re-adds the rule (no tbl8 group available)
> + *  - Reader reports quiescent state and unregisters
> + *  - Writer re-adds the rule
> + *  - Reader looks up the rule
> + */
> +int32_t
> +test20(void)
> +{
> +	struct rte_lpm *lpm = NULL;
> +	struct rte_lpm_config config;
> +	size_t sz;
> +	struct rte_rcu_qsbr *qsv;
> +	int32_t status;
> +	uint32_t ip, next_hop, next_hop_return;
> +	uint8_t depth;
> +
> +	config.max_rules = MAX_RULES;
> +	config.number_tbl8s = 1;
> +	config.flags = 0;
> +
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Create RCU QSBR variable */
> +	sz = rte_rcu_qsbr_get_memsize(1);
> +	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
> +					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
> +	TEST_LPM_ASSERT(qsv != NULL);
> +
> +	status = rte_rcu_qsbr_init(qsv, 1);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	/* Attach RCU QSBR to LPM table */
> +	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	ip = RTE_IPV4(192, 18, 100, 100);
> +	depth = 28;
> +	next_hop = 1;
> +	status = rte_lpm_add(lpm, ip, depth, next_hop);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group);
> +
> +	/* Register pseudo reader */
> +	status = rte_rcu_qsbr_thread_register(qsv, 0);
> +	TEST_LPM_ASSERT(status == 0);
> +	rte_rcu_qsbr_thread_online(qsv, 0);
> +
> +	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(next_hop_return == next_hop);
> +
> +	/* Writer update */
> +	status = rte_lpm_delete(lpm, ip, depth);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid);
> +
> +	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
> +	TEST_LPM_ASSERT(status != 0);
> +
> +	status = rte_lpm_add(lpm, ip, depth, next_hop);
> +	TEST_LPM_ASSERT(status != 0);
> +
> +	/* Reader quiescent */
> +	rte_rcu_qsbr_quiescent(qsv, 0);
> +
> +	status = rte_lpm_add(lpm, ip, depth, next_hop);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	rte_rcu_qsbr_thread_offline(qsv, 0);
> +	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
> +	TEST_LPM_ASSERT(status == 0);
> +
> +	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
> +	TEST_LPM_ASSERT(status == 0);
> +	TEST_LPM_ASSERT(next_hop_return == next_hop);
> +
> +	rte_lpm_free(lpm);
> +	rte_free(qsv);
> +
> +	return PASS;
> +}
> +
>  /*
>   * Do all unit tests.
>   */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests Ruifeng Wang
@ 2019-09-06 19:46     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-09-06 19:46 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China),
	bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, nd, paulmck, nd

Adding Paul for feedback

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, September 6, 2019 4:46 AM
> To: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: [PATCH v2 6/6] test/lpm: add RCU integration performance tests
> 
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Add performance tests for RCU integration. The performance difference with
> and without RCU integration is very small (~1% to ~2%) on both Arm and x86
> platforms.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_lpm_perf.c | 274 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 271 insertions(+), 3 deletions(-)
> 
> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
> index a2578fe90..475e5d488 100644
> --- a/app/test/test_lpm_perf.c
> +++ b/app/test/test_lpm_perf.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #include <stdio.h>
> @@ -10,12 +11,23 @@
>  #include <rte_cycles.h>
>  #include <rte_random.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_malloc.h>
>  #include <rte_ip.h>
>  #include <rte_lpm.h>
> +#include <rte_rcu_qsbr.h>
> 
>  #include "test.h"
>  #include "test_xmmt_ops.h"
> 
> +struct rte_lpm *lpm;
> +static struct rte_rcu_qsbr *rv;
> +static volatile uint8_t writer_done;
> +static volatile uint32_t thr_id;
> +/* Report quiescent state every 8192 lookups. Larger critical
> + * sections in the reader will result in the writer polling multiple times.
> + */
> +#define QSBR_REPORTING_INTERVAL 8192
> +
>  #define TEST_LPM_ASSERT(cond) do {                                            \
>  	if (!(cond)) {                                                        \
>  		printf("Error at line %d: \n", __LINE__);                     \
> @@ -24,6 +36,7 @@
>  } while(0)
> 
>  #define ITERATIONS (1 << 10)
> +#define RCU_ITERATIONS 10
>  #define BATCH_SIZE (1 << 12)
>  #define BULK_SIZE 32
> 
> @@ -35,9 +48,13 @@ struct route_rule {
>  };
> 
>  struct route_rule large_route_table[MAX_RULE_NUM];
> +/* Route table for routes with depth > 24 */
> +struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
> 
>  static uint32_t num_route_entries;
> +static uint32_t num_ldepth_route_entries;
>  #define NUM_ROUTE_ENTRIES num_route_entries
> +#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
> 
>  enum {
>  	IP_CLASS_A,
> @@ -191,7 +208,7 @@ static void generate_random_rule_prefix(uint32_t
> ip_class, uint8_t depth)
>  	uint32_t ip_head_mask;
>  	uint32_t rule_num;
>  	uint32_t k;
> -	struct route_rule *ptr_rule;
> +	struct route_rule *ptr_rule, *ptr_ldepth_rule;
> 
>  	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
>  		fixed_bit_num = IP_HEAD_BIT_NUM_A;
> @@ -236,10 +253,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
>  	 */
>  	start = lrand48() & mask;
>  	ptr_rule = &large_route_table[num_route_entries];
> +	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
>  	for (k = 0; k < rule_num; k++) {
>  		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
>  			| ip_head_mask;
>  		ptr_rule->depth = depth;
> +		/* If the depth of the route is more than 24, store it
> +		 * in another table as well.
> +		 */
> +		if (depth > 24) {
> +			ptr_ldepth_rule->ip = ptr_rule->ip;
> +			ptr_ldepth_rule->depth = ptr_rule->depth;
> +			ptr_ldepth_rule++;
> +			num_ldepth_route_entries++;
> +		}
>  		ptr_rule++;
>  		start = (start + step) & mask;
>  	}
> @@ -273,6 +300,7 @@ static void generate_large_route_rule_table(void)
>  	uint8_t  depth;
> 
>  	num_route_entries = 0;
> +	num_ldepth_route_entries = 0;
>  	memset(large_route_table, 0, sizeof(large_route_table));
> 
>  	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
> @@ -316,10 +344,248 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
>  	printf("\n");
>  }
> 
> +/* Check condition and return an error if true. */
> +static uint16_t enabled_core_ids[RTE_MAX_LCORE];
> +static unsigned int num_cores;
> +
> +/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
> +static inline uint32_t
> +alloc_thread_id(void)
> +{
> +	uint32_t tmp_thr_id;
> +
> +	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
> +	if (tmp_thr_id >= RTE_MAX_LCORE)
> +		printf("Invalid thread id %u\n", tmp_thr_id);
> +
> +	return tmp_thr_id;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure without RCU.
> + */
> +static int
> +test_lpm_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +	} while (!writer_done);
> +
> +	return 0;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure with RCU.
> + */
> +static int
> +test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t thread_id = alloc_thread_id();
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	/* Register this thread to report quiescent state */
> +	rte_rcu_qsbr_thread_register(rv, thread_id);
> +	rte_rcu_qsbr_thread_online(rv, thread_id);
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +		/* Update quiescent state */
> +		rte_rcu_qsbr_quiescent(rv, thread_id);
> +	} while (!writer_done);
> +
> +	rte_rcu_qsbr_thread_offline(rv, thread_id);
> +	rte_rcu_qsbr_thread_unregister(rv, thread_id);
> +
> +	return 0;
> +}
> +
> +/*
> + * Perf test:
> + * Single writer, Single QS variable, Single QSBR query,
> + * Non-blocking rcu_qsbr_check
> + */
> +static int
> +test_lpm_rcu_perf(void)
> +{
> +	struct rte_lpm_config config;
> +	uint64_t begin, total_cycles;
> +	size_t sz;
> +	unsigned int i, j;
> +	uint16_t core_id;
> +	uint32_t next_hop_add = 0xAA;
> +
> +	if (rte_lcore_count() < 2) {
> +		printf("Not enough cores for lpm_rcu_perf_autotest,
> expecting at least 2\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_cores = 0;
> +	RTE_LCORE_FOREACH_SLAVE(core_id) {
> +		enabled_core_ids[num_cores] = core_id;
> +		num_cores++;
> +	}
> +
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Init RCU variable */
> +	sz = rte_rcu_qsbr_get_memsize(num_cores);
> +	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
> +						RTE_CACHE_LINE_SIZE);
> +	rte_rcu_qsbr_init(rv, num_cores);
> +
> +	/* Assign the RCU variable to LPM */
> +	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
> +		printf("RCU variable assignment failed\n");
> +		goto error;
> +	}
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS *
> NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES *
> ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +	lpm = NULL;
> +	rv = NULL;
> +
> +	/* Test without RCU integration */
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d,
> route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS *
> NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES *
> ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			printf("Warning: lcore %u not finished.\n",
> +				enabled_core_ids[i]);
> +
> +	rte_lpm_free(lpm);
> +
> +	return 0;
> +
> +error:
> +	writer_done = 1;
> +	/* Wait until all readers have exited */
> +	rte_eal_mp_wait_lcore();
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +
> +	return -1;
> +}
> +
>  static int
>  test_lpm_perf(void)
>  {
> -	struct rte_lpm *lpm = NULL;
>  	struct rte_lpm_config config;
> 
>  	config.max_rules = 2000000;
> @@ -343,7 +609,7 @@ test_lpm_perf(void)
>  	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
>  	TEST_LPM_ASSERT(lpm != NULL);
> 
> -	/* Measue add. */
> +	/* Measure add. */
>  	begin = rte_rdtsc();
> 
>  	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
> @@ -478,6 +744,8 @@ test_lpm_perf(void)
>  	rte_lpm_delete_all(lpm);
>  	rte_lpm_free(lpm);
> 
> +	test_lpm_rcu_perf();
> +
>  	return 0;
>  }
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR Ruifeng Wang
  2019-09-06 19:44     ` Honnappa Nagarahalli
@ 2019-09-18 16:15     ` Medvedkin, Vladimir
  2019-09-19  6:17       ` Ruifeng Wang (Arm Technology China)
  1 sibling, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-09-18 16:15 UTC (permalink / raw)
  To: Ruifeng Wang, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd

Hi Ruifeng,

Thanks for this patch series, see comments below.

On 06/09/2019 10:45, Ruifeng Wang wrote:
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
>
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>   lib/librte_lpm/Makefile            |   3 +-
>   lib/librte_lpm/meson.build         |   2 +
>   lib/librte_lpm/rte_lpm.c           | 223 +++++++++++++++++++++++++++--
>   lib/librte_lpm/rte_lpm.h           |  22 +++
>   lib/librte_lpm/rte_lpm_version.map |   6 +
>   lib/meson.build                    |   3 +-
>   6 files changed, 244 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>   # library name
>   LIB = librte_lpm.a
>   
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>   CFLAGS += -O3
>   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
>   
>   EXPORT_MAP := rte_lpm_version.map
>   
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>   # Copyright(c) 2017 Intel Corporation
>   
>   version = 2
> +allow_experimental_apis = true
>   sources = files('rte_lpm.c', 'rte_lpm6.c')
>   headers = files('rte_lpm.h', 'rte_lpm6.h')
>   # since header files have different names, we can install all vector headers
>   # without worrying about which architecture we actually need
>   headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>   deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..9764b8de6 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #include <string.h>
> @@ -22,6 +23,7 @@
>   #include <rte_rwlock.h>
>   #include <rte_spinlock.h>
>   #include <rte_tailq.h>
> +#include <rte_ring.h>
>   
>   #include "rte_lpm.h"
>   
> @@ -39,6 +41,11 @@ enum valid_flag {
>   	VALID
>   };
>   
> +struct __rte_lpm_qs_item {
> +	uint64_t token;	/**< QSBR token.*/
> +	uint32_t index;	/**< tbl8 group index.*/
> +};
> +
>   /* Macro to enable/disable run-time checks. */
>   #if defined(RTE_LIBRTE_LPM_DEBUG)
>   #include <rte_debug.h>
> @@ -381,6 +388,7 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
>   
>   	rte_mcfg_tailq_write_unlock();
>   
> +	rte_ring_free(lpm->qs_fifo);
>   	rte_free(lpm->tbl8);
>   	rte_free(lpm->rules_tbl);
>   	rte_free(lpm);
> @@ -390,6 +398,147 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>   MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>   		rte_lpm_free_v1604);
>   
> +/* Add an item into FIFO.
> + * return: 0 - success
> + */
> +static int
> +__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
> +{
> +	if (rte_ring_free_count(fifo) < 2) {
> +		RTE_LOG(ERR, LPM, "QS FIFO full\n");
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token);
> +	(void)rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index);
> +
> +	return 0;
> +}
> +
> +/* Remove item from FIFO.
> + * Used when data observed by rte_ring_peek.
> + */
> +static void
> +__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
> +	struct __rte_lpm_qs_item *item)
Is it necessary to pass the pointer for struct __rte_lpm_qs_item?
According to the code, only item.index is used after this call.
> +{
> +	void *obj_token = NULL;
> +	void *obj_index = NULL;
> +
> +	(void)rte_ring_sc_dequeue(fifo, &obj_token);
I think it is not necessary to cast here.
> +	(void)rte_ring_sc_dequeue(fifo, &obj_index);
> +
> +	if (item) {
I think it is redundant; it is never NULL here.
> +		item->token = (uint64_t)((uintptr_t)obj_token);
> +		item->index = (uint32_t)((uintptr_t)obj_index);
> +	}
> +}
> +
> +/* Max number of tbl8 groups to reclaim at one time. */
> +#define RCU_QSBR_RECLAIM_SIZE	8
> +
> +/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
> + * reclaim will be triggered by tbl8_free.
> + */
> +#define RCU_QSBR_RECLAIM_LEVEL	3
> +
> +/* Reclaim some tbl8 groups based on quiescent state check.
> + * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max.
> + * Params: lpm   - lpm object handle
> + *         index - (output) one of the successfully reclaimed tbl8 groups
> + * return: 0 - success, 1 - no group reclaimed.
> + */
> +static uint32_t
> +__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
> +{
> +	struct __rte_lpm_qs_item qs_item;
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
It is not necessary to init it with NULL.
> +	void *obj_token;
> +	uint32_t cnt = 0;
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimation triggered.\n");
> +	/* Check reader threads quiescent state and
> +	 * reclaim as much tbl8 groups as possible.
> +	 */
> +	while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
> +		(rte_ring_peek(lpm->qs_fifo, &obj_token) == 0) &&
> +		(rte_rcu_qsbr_check(lpm->qsv, (uint64_t)((uintptr_t)obj_token),
> +					false) == 1)) {
> +		__rte_lpm_rcu_qsbr_fifo_pop(lpm->qs_fifo, &qs_item);
> +
> +		tbl8_entry = &lpm->tbl8[qs_item.index *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +		memset(&tbl8_entry[0], 0,
> +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
> +				sizeof(tbl8_entry[0]));
> +		cnt++;
> +	}
> +
> +	RTE_LOG(DEBUG, LPM, "RCU QSBR reclaimed %u groups.\n", cnt);
> +	if (cnt) {
> +		if (index)
> +			*index = qs_item.index;
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +/* Trigger tbl8 group reclaim when necessary.
> + * Reclaim happens when RCU QSBR queue usage
> + * is over 1/(2^RCU_QSBR_RECLAIM_LEVEL).
> + */
> +static void
> +__rte_lpm_rcu_qsbr_try_reclaim(struct rte_lpm *lpm)
> +{
> +	if (lpm->qsv == NULL)
> +		return;
This check is redundant.
> +
> +	if (rte_ring_count(lpm->qs_fifo) <
> +		(rte_ring_get_capacity(lpm->qs_fifo) >> RCU_QSBR_RECLAIM_LEVEL))
> +		return;
> +
> +	(void)__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, NULL);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	uint32_t qs_fifo_size;
> +	char rcu_ring_name[RTE_RING_NAMESIZE];
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->qsv) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * number_tbl8s. Will store 'token' and 'index'.
> +	 */
> +	qs_fifo_size = rte_align32pow2((2 * lpm->number_tbl8s) + 1);
> +
> +	/* Init QSBR reclaiming FIFO. */
> +	snprintf(rcu_ring_name, sizeof(rcu_ring_name), "LPM_RCU_%s", lpm->name);
> +	lpm->qs_fifo = rte_ring_create(rcu_ring_name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (lpm->qs_fifo == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS FIFO memory allocation failed\n");
> +		rte_errno = ENOMEM;
rte_ring_create() sets rte_errno on error, I don't think we need to 
rewrite it here.
> +		return 1;
> +	}
> +	lpm->qsv = v;
> +
> +	return 0;
> +}
> +
>   /*
>    * Adds a rule to the rule table.
>    *
> @@ -640,6 +789,35 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>   	return -EINVAL;
>   }
>   
> +static int32_t
> +tbl8_alloc_reclaimed(struct rte_lpm *lpm)
> +{
> +	struct rte_lpm_tbl_entry *tbl8_entry = NULL;
> +	uint32_t index;
> +
> +	if (lpm->qsv != NULL) {
> +		if (__rte_lpm_rcu_qsbr_reclaim_chunk(lpm, &index) == 0) {
> +			/* Set the last reclaimed tbl8 group as VALID. */
> +			struct rte_lpm_tbl_entry new_tbl8_entry = {
> +				.next_hop = 0,
> +				.valid = INVALID,
> +				.depth = 0,
> +				.valid_group = VALID,
> +			};
> +
> +			tbl8_entry = &lpm->tbl8[index *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +			__atomic_store(tbl8_entry, &new_tbl8_entry,
> +					__ATOMIC_RELAXED);
> +
> +			/* Return group index for reclaimed tbl8 group. */
> +			return index;
> +		}
> +	}
> +
> +	return -ENOSPC;
> +}
> +
>   /*
>    * Find, clean and allocate a tbl8.
>    */
> @@ -679,14 +857,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>   }
>   
>   static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
>   {
>   	uint32_t group_idx; /* tbl8 group index. */
>   	struct rte_lpm_tbl_entry *tbl8_entry;
>   
>   	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>   		/* If a free tbl8 group is found clean it and set as VALID. */
>   		if (!tbl8_entry->valid_group) {
>   			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -708,8 +887,8 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>   		}
>   	}
>   
> -	/* If there are no tbl8 groups free then return error. */
> -	return -ENOSPC;
> +	/* If there are no tbl8 groups free then check reclaim queue. */
> +	return tbl8_alloc_reclaimed(lpm);
>   }
>   
>   static void
> @@ -728,13 +907,31 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>   }
>   
>   static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>   {
> -	/* Set tbl8 group invalid*/
> +	struct __rte_lpm_qs_item qs_item;
>   	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
>   
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->qsv != NULL) {
> +		/* Push into QSBR FIFO. */
> +		qs_item.token = rte_rcu_qsbr_start(lpm->qsv);
> +		qs_item.index =
> +			tbl8_group_start / RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
> +		if (__rte_lpm_rcu_qsbr_fifo_push(lpm->qs_fifo, &qs_item) != 0)
> +			/* This should never happen as FIFO size is big enough
> +			 * to hold all tbl8 groups.
> +			 */
> +			RTE_LOG(ERR, LPM, "Failed to push QSBR FIFO\n");
> +
> +		/* Speculatively reclaim tbl8 groups.
> +		 * Help spread the reclaim work load across multiple calls.
> +		 */
> +		__rte_lpm_rcu_qsbr_try_reclaim(lpm);
> +	} else {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>   }
>   
>   static __rte_noinline int32_t
> @@ -1037,7 +1234,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   
>   	if (!lpm->tbl24[tbl24_index].valid) {
>   		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		/* Check tbl8 allocation was successful. */
>   		if (tbl8_group_index < 0) {
> @@ -1083,7 +1280,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   	} /* If valid entry but not extended calculate the index into Table8. */
>   	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>   		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		if (tbl8_group_index < 0) {
>   			return tbl8_group_index;
> @@ -1818,7 +2015,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		 */
>   		lpm->tbl24[tbl24_index].valid = 0;
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	} else if (tbl8_recycle_index > -1) {
>   		/* Update tbl24 entry. */
>   		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +2031,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>   				__ATOMIC_RELAXED);
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	}
>   #undef group_idx
>   	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..5079fb262 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>   #include <rte_common.h>
>   #include <rte_vect.h>
>   #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -186,6 +188,8 @@ struct rte_lpm {
>   			__rte_cache_aligned; /**< LPM tbl24 table. */
>   	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>   	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr *qsv;	/**< RCU QSBR variable for tbl8 group.*/
> +	struct rte_ring *qs_fifo;	/**< RCU QSBR reclaiming queue. */
>   };
>   
>   /**
> @@ -248,6 +252,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>   void
>   rte_lpm_free_v1604(struct rte_lpm *lpm);
>   
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param v
> + *   RCU QSBR variable
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>   /**
>    * Add a rule to the LPM table.
>    *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>   	rte_lpm6_lookup_bulk_func;
>   
>   } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..3a96f005d 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,6 +11,7 @@
>   libraries = [
>   	'kvargs', # eal depends on kvargs
>   	'eal', # everything depends on eal
> +	'rcu', # hash and lpm depends on this
>   	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>   	'cmdline',
>   	'metrics', # bitrate/latency stats depends on this
> @@ -22,7 +23,7 @@ libraries = [
>   	'gro', 'gso', 'ip_frag', 'jobstats',
>   	'kni', 'latencystats', 'lpm', 'member',
>   	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>   	# ipsec lib depends on net, crypto and security
>   	'ipsec',
>   	# add pkt framework libs which use other libs from above

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time Ruifeng Wang
@ 2019-09-18 16:17     ` Medvedkin, Vladimir
  2019-09-19  6:22       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-09-18 16:17 UTC (permalink / raw)
  To: Ruifeng Wang, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, gavin.hu, honnappa.nagarahalli,
	dharmik.thakkar, nd, stable

Hi Ruifeng,

Thanks for this bug fix.

I think it should be sent separately from this RCU related patch series.

On 06/09/2019 10:45, Ruifeng Wang wrote:
> From: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>
> total_time needs to be reset to measure the cycles for delete API.
>
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   app/test/test_lpm_perf.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
> index 77eea66ad..a2578fe90 100644
> --- a/app/test/test_lpm_perf.c
> +++ b/app/test/test_lpm_perf.c
> @@ -460,7 +460,7 @@ test_lpm_perf(void)
>   			(double)total_time / ((double)ITERATIONS * BATCH_SIZE),
>   			(count * 100.0) / (double)(ITERATIONS * BATCH_SIZE));
>   
> -	/* Delete */
> +	/* Measure Delete */
>   	status = 0;
>   	begin = rte_rdtsc();
>   
> @@ -470,7 +470,7 @@ test_lpm_perf(void)
>   				large_route_table[i].depth);
>   	}
>   
> -	total_time += rte_rdtsc() - begin;
> +	total_time = rte_rdtsc() - begin;
>   
>   	printf("Average LPM Delete: %g cycles\n",
>   			(double)total_time / NUM_ROUTE_ENTRIES);

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
  2019-09-18 16:15     ` Medvedkin, Vladimir
@ 2019-09-19  6:17       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-09-19  6:17 UTC (permalink / raw)
  To: Medvedkin, Vladimir, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, Dharmik Thakkar, nd, nd

Hi Vladimir,

Thanks for your review and comments.
All the comments will be addressed in the next version.

/Ruifeng

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Thursday, September 19, 2019 00:16
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
> bruce.richardson@intel.com; olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v2 3/6] lib/lpm: integrate RCU QSBR
> 
> Hi Ruifeng,
> 
> Thanks for this patch series, see comments below.
> 
> [full quote of the review and patch trimmed; see Vladimir's message above]


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/6] test/lpm: reset total time
  2019-09-18 16:17     ` Medvedkin, Vladimir
@ 2019-09-19  6:22       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 0 replies; 137+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-09-19  6:22 UTC (permalink / raw)
  To: Medvedkin, Vladimir, bruce.richardson, olivier.matz
  Cc: dev, stephen, konstantin.ananyev, Gavin Hu (Arm Technology China),
	Honnappa Nagarahalli, Dharmik Thakkar, nd, stable, nd

Hi Vladimir,

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Thursday, September 19, 2019 00:18
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>;
> bruce.richardson@intel.com; olivier.matz@6wind.com
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> konstantin.ananyev@intel.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; stable@dpdk.org
> Subject: Re: [PATCH v2 5/6] test/lpm: reset total time
> 
> Hi Ruifeng,
> 
> Thanks for this bug fix.
> 
> I think it should be sent separately from this RCU related patch series.
Agree. It will be sent out separately.
> 
> [quoted patch trimmed; see the message above]


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (5 preceding siblings ...)
  2019-09-06  9:45   ` [dpdk-dev] [PATCH v2 6/6] test/lpm: add RCU integration performance tests Ruifeng Wang
@ 2019-10-01  6:29   ` Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
                       ` (5 more replies)
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
                     ` (7 subsequent siblings)
  14 siblings, 6 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

This is not a new patch. This patch set has been separated from the
LPM changes because the size of the changes in the RCU library has
grown due to community feedback. These APIs will help reduce the
changes in the LPM and hash libraries that are being integrated with
the RCU library.

This adds four new APIs to the RCU library: create a defer queue,
enqueue deleted resources, reclaim resources, and delete the defer
queue.

The rationale for the APIs is documented in 3/3.

The patches integrating RCU into the LPM and hash libraries will
depend on this patch set.

v3
1) Separated from the original series (https://patches.dpdk.org/cover/58811/)
2) Added reclamation APIs and test cases (Stephen, Yipeng)

Honnappa Nagarahalli (1):
  lib/rcu: add resource reclamation APIs

Ruifeng Wang (2):
  lib/ring: add peek API
  doc/rcu: add RCU integration design details

 app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/rcu_lib.rst  |  59 ++++++
 lib/librte_rcu/meson.build         |   2 +
 lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
 lib/librte_rcu/rte_rcu_version.map |   4 +
 lib/librte_ring/rte_ring.h         |  30 +++
 lib/meson.build                    |   6 +-
 9 files changed, 789 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
@ 2019-10-01  6:29     ` Honnappa Nagarahalli
  2019-10-02 18:42       ` Ananyev, Konstantin
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

From: Ruifeng Wang <ruifeng.wang@arm.com>

The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where dequeuing of
objects depends on their value.

Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
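
A rough usage sketch (not part of the patch): the helper and variable
names below are hypothetical, but the shape matches how the RCU
reclaim path in this series is expected to use the API - look at the
head of the ring, and consume an entry only if the condition it
encodes (here, a QSBR token whose grace period has expired) is met.

#include <stdint.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Drain QSBR tokens whose grace period has expired. 'r' holds tokens
 * enqueued by a single writer; 'v' is the QSBR variable.
 */
static inline void
drain_expired(struct rte_ring *r, struct rte_rcu_qsbr *v)
{
	void *obj;

	/* Inspect the next object without dequeuing it. */
	while (rte_ring_peek(r, &obj) == 0) {
		/* Consume the token only if all readers have gone
		 * through a quiescent state since it was generated.
		 */
		if (rte_rcu_qsbr_check(v, (uint64_t)(uintptr_t)obj,
					false) != 1)
			break;
		(void)rte_ring_sc_dequeue(r, &obj);
		/* ... release the resource tracked by this token ... */
	}
}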
 lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 				r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_head = r->cons.head;
+	uint32_t count = (prod_tail - cons_head) & r->mask;
+	unsigned int n = 1;
+	if (count) {
+		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+		return 0;
+	}
+	return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
@ 2019-10-01  6:29     ` Honnappa Nagarahalli
  2019-10-02 17:39       ` Ananyev, Konstantin
                         ` (3 more replies)
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 3/3] doc/rcu: add RCU integration design details Honnappa Nagarahalli
                       ` (3 subsequent siblings)
  5 siblings, 4 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

Add resource reclamation APIs to make it simple for applications
and libraries to integrate the rte_rcu library.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
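
A rough usage sketch (not part of the patch): the rte_rcu_qsbr_dq_*
calls and the parameter fields below mirror what the unit tests in
this patch exercise; the surrounding function, the element contents
and the queue sizes are hypothetical.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Free function invoked once a grace period has elapsed; 'e' points
 * to the deferred element and 'p' is assumed to be a user context.
 */
static void
free_resource(void *p, void *e)
{
	(void)p;
	printf("reclaiming element at %p\n", e);
}

static int
defer_queue_example(struct rte_rcu_qsbr *v)
{
	char dq_name[RTE_RING_NAMESIZE];
	struct rte_rcu_qsbr_dq_parameters params;
	struct rte_rcu_qsbr_dq *dq;
	uint64_t e[2] = {200, 0};	/* 16-byte element to defer-free */

	memset(&params, 0, sizeof(params));
	snprintf(dq_name, sizeof(dq_name), "EXAMPLE_RCU_DQ");
	params.name = dq_name;
	params.f = free_resource;
	params.v = v;
	params.size = 32;		/* defer queue depth */
	params.esize = sizeof(e);	/* element size in bytes */
	dq = rte_rcu_qsbr_dq_create(&params);
	if (dq == NULL)
		return -1;

	/* Writer path: after removing the element from the data
	 * structure, enqueue it; free_resource() is called for it
	 * only after readers have quiesced.
	 */
	(void)rte_rcu_qsbr_dq_enqueue(dq, e);

	/* Optionally force reclamation, e.g. before tear-down. */
	(void)rte_rcu_qsbr_dq_reclaim(dq);

	return rte_rcu_qsbr_dq_delete(dq);
}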
 app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
 lib/librte_rcu/meson.build         |   2 +
 lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
 lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
 lib/librte_rcu/rte_rcu_version.map |   4 +
 lib/meson.build                    |   6 +-
 7 files changed, 700 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h

diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
index d1b9e46a2..3a6815243 100644
--- a/app/test/test_rcu_qsbr.c
+++ b/app/test/test_rcu_qsbr.c
@@ -1,8 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright (c) 2018 Arm Limited
+ * Copyright (c) 2019 Arm Limited
  */
 
 #include <stdio.h>
+#include <string.h>
 #include <rte_pause.h>
 #include <rte_rcu_qsbr.h>
 #include <rte_hash.h>
@@ -33,6 +34,7 @@ static uint32_t *keys;
 #define COUNTER_VALUE 4096
 static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
 static uint8_t writer_done;
+static uint8_t cb_failed;
 
 static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
 struct rte_hash *h[RTE_MAX_LCORE];
@@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
 	return 0;
 }
 
+static void
+rte_rcu_qsbr_test_free_resource(void *p, void *e)
+{
+	if (p != NULL && e != NULL) {
+		printf("%s: Test failed\n", __func__);
+		cb_failed = 1;
+	}
+}
+
+/*
+ * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
+ * elements that can be freed later. This queue is referred to as 'defer queue'.
+ */
+static int
+test_rcu_qsbr_dq_create(void)
+{
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+
+	printf("\nTest rte_rcu_qsbr_dq_create()\n");
+
+	/* Pass invalid parameters */
+	dq = rte_rcu_qsbr_dq_create(NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	params.f = rte_rcu_qsbr_test_free_resource;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+	params.v = t[0];
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	params.size = 1;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	params.esize = 3;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
+
+	/* Pass all valid parameters */
+	params.esize = 16;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+	rte_rcu_qsbr_dq_delete(dq);
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
+ * to be freed later after at least one grace period is over.
+ */
+static int
+test_rcu_qsbr_dq_enqueue(void)
+{
+	int ret;
+	uint64_t r;
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+
+	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
+
+	/* Create a queue with simple parameters */
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	params.f = rte_rcu_qsbr_test_free_resource;
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+	params.v = t[0];
+	params.size = 1;
+	params.esize = 16;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+
+	/* Pass invalid parameters */
+	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
+
+	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
+
+	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
+
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid params");
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
+ */
+static int
+test_rcu_qsbr_dq_reclaim(void)
+{
+	int ret;
+
+	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
+
+	/* Pass invalid parameters */
+	ret = rte_rcu_qsbr_dq_reclaim(NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid params");
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_delete: Delete a defer queue.
+ */
+static int
+test_rcu_qsbr_dq_delete(void)
+{
+	int ret;
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+
+	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
+
+	/* Pass invalid parameters */
+	ret = rte_rcu_qsbr_dq_delete(NULL);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid params");
+
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	params.f = rte_rcu_qsbr_test_free_resource;
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+	params.v = t[0];
+	params.size = 1;
+	params.esize = 16;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
+
+	return 0;
+}
+
+/*
+ * rte_rcu_qsbr_dq_xxx: functional test exercising the defer queue
+ * enqueue, reclaim and delete APIs together.
+ */
+static int
+test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
+{
+	int i, j, ret;
+	char rcu_dq_name[RTE_RING_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+	struct rte_rcu_qsbr_dq *dq;
+	uint64_t *e;
+	uint64_t sc = 200;
+	int max_entries;
+
+	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests\n");
+	printf("Size = %d, esize = %d\n", size, esize);
+
+	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
+	if (e == NULL)
+		return 0;
+	cb_failed = 0;
+
+	/* Initialize the RCU variable. No threads are registered */
+	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
+
+	/* Create a queue with simple parameters */
+	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
+	params.name = rcu_dq_name;
+	params.f = rte_rcu_qsbr_test_free_resource;
+	params.v = t[0];
+	params.size = size;
+	params.esize = esize;
+	dq = rte_rcu_qsbr_dq_create(&params);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
+
+	/* Given the size and esize, calculate the maximum number of entries
+	 * that can be stored on the defer queue (look at the logic used
+	 * in capacity calculation of rte_ring).
+	 */
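+	/* For example, with size = 1 and esize = 8: each element needs
+	 * 8/8 + 1 = 2 ring slots (one data word plus the token). The ring
+	 * is created with rte_align32pow2(2 * 1 + 1) = 4 slots and its
+	 * usable capacity is 4 - 1 = 3 slots, so max_entries = 3 / 2 = 1.
+	 */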
+	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
+	max_entries = (max_entries - 1)/(esize/8 + 1);
+
+	/* Enqueue a few counters starting with the value 'sc' */
+	/* The queue size will be rounded up to 2. The enqueue API also
+	 * reclaims if the queue size is above a certain limit. Since there
+	 * are no threads registered, reclamation succeeds. Hence, it should
+	 * be possible to enqueue more than the provided queue size.
+	 */
+	for (i = 0; i < 10; i++) {
+		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
+			"dq enqueue functional");
+		for (j = 0; j < esize/8; j++)
+			e[j] = sc++;
+	}
+
+	/* Register a thread on the RCU QSBR variable. Reclamation will not
+	 * succeed. It should not be possible to enqueue more than the size
+	 * number of resources.
+	 */
+	rte_rcu_qsbr_thread_register(t[0], 1);
+	rte_rcu_qsbr_thread_online(t[0], 1);
+
+	for (i = 0; i < max_entries; i++) {
+		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
+			"dq enqueue functional");
+		for (j = 0; j < esize/8; j++)
+			e[j] = sc++;
+	}
+
+	/* Enqueue fails as queue is full */
+	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
+
+	/* Delete should fail as there are elements in the defer queue
+	 * which cannot be reclaimed.
+	 */
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid params");
+
+	/* Report quiescent state, enqueue should succeed */
+	rte_rcu_qsbr_quiescent(t[0], 1);
+	for (i = 0; i < max_entries; i++) {
+		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
+			"dq enqueue functional");
+		for (j = 0; j < esize/8; j++)
+			e[j] = sc++;
+	}
+
+	/* Queue is full */
+	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
+
+	/* Report quiescent state, delete should succeed */
+	rte_rcu_qsbr_quiescent(t[0], 1);
+	ret = rte_rcu_qsbr_dq_delete(dq);
+	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
+
+	/* Validate that the callback function did not report any error */
+	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
+
+	rte_free(e);
+	return 0;
+}
+
 /*
  * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
  */
@@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
 	if (test_rcu_qsbr_thread_offline() < 0)
 		goto test_fail;
 
+	if (test_rcu_qsbr_dq_create() < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_reclaim() < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_delete() < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_enqueue() < 0)
+		goto test_fail;
+
 	printf("\nFunctional tests\n");
 
 	if (test_rcu_qsbr_sw_sv_3qs() < 0)
@@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
 	if (test_rcu_qsbr_mw_mv_mqs() < 0)
 		goto test_fail;
 
+	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
+		goto test_fail;
+
+	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
+		goto test_fail;
+
 	free_rcu();
 
 	printf("\n");
diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
index 62920ba02..e280b29c1 100644
--- a/lib/librte_rcu/meson.build
+++ b/lib/librte_rcu/meson.build
@@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
 if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
 	ext_deps += cc.find_library('atomic')
 endif
+
+deps += ['ring']
diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
index ce7f93dd3..76814f50b 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.c
+++ b/lib/librte_rcu/rte_rcu_qsbr.c
@@ -21,6 +21,7 @@
 #include <rte_errno.h>
 
 #include "rte_rcu_qsbr.h"
+#include "rte_rcu_qsbr_pvt.h"
 
 /* Get the memory size of QSBR variable */
 size_t
@@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
 	return 0;
 }
 
+/* Create a queue used to store the data structure elements that can
+ * be freed later. This queue is referred to as 'defer queue'.
+ */
+struct rte_rcu_qsbr_dq *
+rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
+{
+	struct rte_rcu_qsbr_dq *dq;
+	uint32_t qs_fifo_size;
+
+	if (params == NULL || params->f == NULL ||
+		params->v == NULL || params->name == NULL ||
+		params->size == 0 || params->esize == 0 ||
+		(params->esize % 8 != 0)) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return NULL;
+	}
+
+	dq = rte_zmalloc(NULL,
+		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
+		RTE_CACHE_LINE_SIZE);
+	if (dq == NULL) {
+		rte_errno = ENOMEM;
+
+		return NULL;
+	}
+
+	/* Round up the FIFO size to the next power of two that can hold
+	 * 'size' elements: each element needs (esize/8 + 1) 64b slots
+	 * (data words plus the token), and one extra slot is added as the
+	 * usable capacity of a ring is one less than its size.
+	 */
+	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
+					* params->size) + 1);
+	dq->r = rte_ring_create(params->name, qs_fifo_size,
+					SOCKET_ID_ANY, 0);
+	if (dq->r == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): defer queue create failed\n", __func__);
+		rte_free(dq);
+		return NULL;
+	}
+
+	dq->v = params->v;
+	dq->size = params->size;
+	dq->esize = params->esize;
+	dq->f = params->f;
+	dq->p = params->p;
+
+	return dq;
+}
+
+/* Enqueue one resource to the defer queue to free after the grace
+ * period is over.
+ */
+int
+rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
+{
+	uint64_t token;
+	uint64_t *tmp;
+	uint32_t i;
+	uint32_t cur_size, free_size;
+
+	if (dq == NULL || e == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return 1;
+	}
+
+	/* Start the grace period */
+	token = rte_rcu_qsbr_start(dq->v);
+
+	/* Reclaim resources if the queue is more than 1/8th full. This
+	 * keeps the queue from growing too large and allows time for the
+	 * reader threads to report their quiescent state.
+	 */
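+	/* For example, with size = 1024, reclamation is attempted once
+	 * more than 1024 >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT = 128
+	 * entries are pending on the queue.
+	 */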
+	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
+	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
+		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
+			"%s(): Triggering reclamation\n", __func__);
+		rte_rcu_qsbr_dq_reclaim(dq);
+	}
+
+	/* Check if there is space for at least 1 resource */
+	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
+	if (!free_size) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Defer queue is full\n", __func__);
+		rte_errno = ENOSPC;
+		return 1;
+	}
+
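+	/* Each resource occupies a sequence of 64b ring slots:
+	 * | token | e[0] | ... | e[esize/8 - 1] |
+	 * The reclaim path dequeues the slots in the same order.
+	 */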
+	/* Enqueue the resource */
+	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
+
+	/* The size of the resource to enqueue needs to be a multiple
+	 * of 64b due to the limitation of the rte_ring implementation.
+	 */
+	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
+		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
+
+	return 0;
+}
+
+/* Reclaim resources from the defer queue. */
+int
+rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
+{
+	uint32_t max_cnt;
+	uint32_t cnt;
+	void *token;
+	uint64_t *tmp;
+	uint32_t i;
+
+	if (dq == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return 1;
+	}
+
+	/* Anything to reclaim? */
+	if (rte_ring_count(dq->r) == 0)
+		return 0;
+
+	/* Reclaim at most 1/16th of the total number of entries. */
+	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
+	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
+	cnt = 0;
+
+	/* Check reader threads quiescent state and reclaim resources */
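+	/* Tokens are enqueued in increasing order, so if the grace period
+	 * of the token at the head of the FIFO is not over yet, it is not
+	 * over for any of the newer tokens either and the loop can stop.
+	 */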
+	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
+		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
+			== 1)) {
+		(void)rte_ring_sc_dequeue(dq->r, &token);
+		/* The size of the resource to dequeue needs to be a
+		 * multiple of 64b due to the limitation of the rte_ring
+		 * implementation.
+		 */
+		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
+			i++, tmp++)
+			(void)rte_ring_sc_dequeue(dq->r,
+					(void *)(uintptr_t)tmp);
+		dq->f(dq->p, dq->e);
+
+		cnt++;
+	}
+
+	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
+		"%s(): Reclaimed %u resources\n", __func__, cnt);
+
+	if (cnt == 0) {
+		/* No resources were reclaimed */
+		rte_errno = EAGAIN;
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Delete a defer queue. */
+int
+rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
+{
+	if (dq == NULL) {
+		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
+			"%s(): Invalid input parameter\n", __func__);
+		rte_errno = EINVAL;
+
+		return 1;
+	}
+
+	/* Reclaim all the resources */
+	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
+		/* Error number is already set by the reclaim API */
+		return 1;
+
+	rte_ring_free(dq->r);
+	rte_free(dq);
+
+	return 0;
+}
+
 int rte_rcu_log_type;
 
 RTE_INIT(rte_rcu_register)
diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
index c80f15c00..185d4b50a 100644
--- a/lib/librte_rcu/rte_rcu_qsbr.h
+++ b/lib/librte_rcu/rte_rcu_qsbr.h
@@ -34,6 +34,7 @@ extern "C" {
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_atomic.h>
+#include <rte_ring.h>
 
 extern int rte_rcu_log_type;
 
@@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
 	 */
 } __rte_cache_aligned;
 
+/**
+ * Callback function invoked to free the resources.
+ *
+ * @param p
+ *   Pointer provided while creating the defer queue
+ * @param e
+ *   Pointer to the resource data stored on the defer queue
+ *
+ * @return
+ *   None
+ */
+typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
+
+#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
+
+/**
+ *  Trigger automatic reclamation once the defer queue is more than 1/8th full.
+ */
+#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
+
+/**
+ *  Reclaim at most 1/16th of the total number of resources.
+ */
+#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
+
+/**
+ * Parameters used when creating the defer queue.
+ */
+struct rte_rcu_qsbr_dq_parameters {
+	const char *name;
+	/**< Name of the queue. */
+	uint32_t size;
+	/**< Number of entries in the queue. Typically, this will be
+	 *   the same as the maximum number of entries supported in the
+	 *   lock-free data structure.
+	 *   Data structures with an unbounded number of entries are not
+	 *   currently supported.
+	 */
+	uint32_t esize;
+	/**< Size (in bytes) of each element in the defer queue.
+	 *   This has to be a multiple of 8B as the rte_ring APIs
+	 *   support 8B element sizes only.
+	 */
+	rte_rcu_qsbr_free_resource f;
+	/**< Function to call to free the resource. */
+	void *p;
+	/**< Pointer passed to the free function. Typically, this is the
+	 *   pointer to the data structure to which the resource to free
+	 *   belongs. This can be NULL.
+	 */
+	struct rte_rcu_qsbr *v;
+	/**< RCU QSBR variable to use for this defer queue */
+};
+
+/* RTE defer queue structure.
+ * This structure holds the defer queue. The defer queue is used to
+ * hold the deleted entries from the data structure that are not
+ * yet freed.
+ */
+struct rte_rcu_qsbr_dq;
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
@@ -648,6 +710,113 @@ __rte_experimental
 int
 rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Create a queue used to store the data structure elements that can
+ * be freed later. This queue is referred to as 'defer queue'.
+ *
+ * @param params
+ *   Parameters to create a defer queue.
+ * @return
+ *   On success - Valid pointer to defer queue
+ *   On error - NULL
+ *   Possible rte_errno codes are:
+ *   - EINVAL - NULL parameters are passed
+ *   - ENOMEM - Not enough memory
+ */
+__rte_experimental
+struct rte_rcu_qsbr_dq *
+rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue one resource to the defer queue and start the grace period.
+ * The resource will be freed later after at least one grace period
+ * is over.
+ *
+ * If the defer queue is full, it will attempt to reclaim resources.
+ * It will also reclaim resources at regular intervals to keep
+ * the defer queue from growing too large.
+ *
+ * This API is not multi-thread safe. It is expected that the caller
+ * provides multi-thread safety by locking a mutex or some other means.
+ *
+ * A lock-free multi-thread writer algorithm could achieve multi-thread
+ * safety by creating and using one defer queue per thread.
+ *
+ * @param dq
+ *   Defer queue to allocate an entry from.
+ * @param e
+ *   Pointer to resource data to copy to the defer queue. The size of
+ *   the data to copy is equal to the element size provided when the
+ *   defer queue was created.
+ * @return
+ *   On success - 0
+ *   On error - 1 with rte_errno set to
+ *   - EINVAL - NULL parameters are passed
+ *   - ENOSPC - Defer queue is full. This condition cannot happen
+ *		if the defer queue size is equal to (or larger than)
+ *		the number of elements in the data structure.
+ */
+__rte_experimental
+int
+rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Reclaim resources from the defer queue.
+ *
+ * This API is not multi-thread safe. It is expected that the caller
+ * provides multi-thread safety by locking a mutex or some other means.
+ *
+ * A lock-free multi-thread writer algorithm could achieve multi-thread
+ * safety by creating and using one defer queue per thread.
+ *
+ * @param dq
+ *   Defer queue to reclaim an entry from.
+ * @return
+ *   On successful reclamation of at least 1 resource - 0
+ *   On error - 1 with rte_errno set to
+ *   - EINVAL - NULL parameters are passed
+ *   - EAGAIN - None of the resources have completed at least 1 grace period,
+ *		try again.
+ */
+__rte_experimental
+int
+rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Delete a defer queue.
+ *
+ * It tries to reclaim all the resources on the defer queue.
+ * If any of the resources have not completed the grace period,
+ * the reclamation stops and the API returns immediately. The rest
+ * of the resources are not reclaimed and the defer queue is not
+ * freed.
+ *
+ * @param dq
+ *   Defer queue to delete.
+ * @return
+ *   On success - 0
+ *   On error - 1
+ *   Possible rte_errno codes are:
+ *   - EINVAL - NULL parameters are passed
+ *   - EAGAIN - Some of the resources have not completed at least 1 grace
+ *		period, try again.
+ */
+__rte_experimental
+int
+rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
new file mode 100644
index 000000000..2122bc36a
--- /dev/null
+++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2019 Arm Limited
+ */
+
+#ifndef _RTE_RCU_QSBR_PVT_H_
+#define _RTE_RCU_QSBR_PVT_H_
+
+/**
+ * This file is private to the RCU library. It should not be included
+ * by the user of this library.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_rcu_qsbr.h"
+
+/* RTE defer queue structure.
+ * This structure holds the defer queue. The defer queue is used to
+ * hold the deleted entries from the data structure that are not
+ * yet freed.
+ */
+struct rte_rcu_qsbr_dq {
+	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
+	struct rte_ring *r;     /**< RCU QSBR defer queue. */
+	uint32_t size;
+	/**< Number of elements in the defer queue */
+	uint32_t esize;
+	/**< Size (in bytes) of data stored on the defer queue */
+	rte_rcu_qsbr_free_resource f;
+	/**< Function to call to free the resource. */
+	void *p;
+	/**< Pointer passed to the free function. Typically, this is the
+	 *   pointer to the data structure to which the resource to free
+	 *   belongs.
+	 */
+	char e[0];
+	/**< Temporary storage to copy the defer queue element. */
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RCU_QSBR_PVT_H_ */
diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
index f8b9ef2ab..dfac88a37 100644
--- a/lib/librte_rcu/rte_rcu_version.map
+++ b/lib/librte_rcu/rte_rcu_version.map
@@ -8,6 +8,10 @@ EXPERIMENTAL {
 	rte_rcu_qsbr_synchronize;
 	rte_rcu_qsbr_thread_register;
 	rte_rcu_qsbr_thread_unregister;
+	rte_rcu_qsbr_dq_create;
+	rte_rcu_qsbr_dq_enqueue;
+	rte_rcu_qsbr_dq_reclaim;
+	rte_rcu_qsbr_dq_delete;
 
 	local: *;
 };
diff --git a/lib/meson.build b/lib/meson.build
index e5ff83893..0e1be8407 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -11,7 +11,9 @@
 libraries = [
 	'kvargs', # eal depends on kvargs
 	'eal', # everything depends on eal
-	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
+	'ring',
+	'rcu', # rcu depends on ring
+	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
 	'cmdline',
 	'metrics', # bitrate/latency stats depends on this
 	'hash',    # efd depends on this
@@ -22,7 +24,7 @@ libraries = [
 	'gro', 'gso', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member',
 	'power', 'pdump', 'rawdev',
-	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
+	'reorder', 'sched', 'security', 'stack', 'vhost',
 	# ipsec lib depends on net, crypto and security
 	'ipsec',
 	# add pkt framework libs which use other libs from above
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] doc/rcu: add RCU integration design details
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
@ 2019-10-01  6:29     ` Honnappa Nagarahalli
  2020-03-29 20:57     ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Thomas Monjalon
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01  6:29 UTC (permalink / raw)
  To: honnappa.nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, dev, nd

From: Ruifeng Wang <ruifeng.wang@arm.com>

Add a section to describe a design to integrate QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/rcu_lib.rst | 59 +++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..423ab283e 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,62 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Resource reclamation framework for DPDK
+---------------------------------------
+
+Lock-free algorithms place an additional burden of resource reclamation on
+the application. When a writer deletes an entry from a data structure, the writer:
+
+#. Has to start the grace period
+#. Has to store a reference to the deleted resources in a FIFO
+#. Should check if the readers have completed a grace period and free the resources. This can also be done when the writer runs out of free resources.
+
+There are several APIs provided to help with this process. The writer
+can create a FIFO to store the references to deleted resources using ``rte_rcu_qsbr_dq_create()``.
+The resources can be enqueued to this FIFO using ``rte_rcu_qsbr_dq_enqueue()``.
+If the FIFO is full, ``rte_rcu_qsbr_dq_enqueue`` will reclaim the resources
+before enqueuing. It will also reclaim resources on a regular basis to keep
+the FIFO from growing too large. If the writer runs out of resources, the
+writer can call the ``rte_rcu_qsbr_dq_reclaim`` API to reclaim resources.
+``rte_rcu_qsbr_dq_delete`` is provided to reclaim any remaining resources
+and free the FIFO while shutting down.
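+
+A minimal sketch of this writer-side flow (error handling omitted;
+``MAX_ENTRIES``, ``free_cb``, ``v`` and ``resource`` are placeholders
+the application would provide) could look like:
+
+.. code-block:: c
+
+   struct rte_rcu_qsbr_dq_parameters params = {0};
+   struct rte_rcu_qsbr_dq *dq;
+
+   params.name = "defer_queue";
+   params.size = MAX_ENTRIES;       /* matches the data structure size */
+   params.esize = sizeof(uint64_t); /* must be a multiple of 8B */
+   params.f = free_cb;              /* callback that frees one resource */
+   params.v = v;                    /* RCU QSBR variable */
+   dq = rte_rcu_qsbr_dq_create(&params);
+
+   /* On every delete, push the resource and start a grace period */
+   rte_rcu_qsbr_dq_enqueue(dq, &resource);
+
+   /* When out of resources, reclaim explicitly */
+   rte_rcu_qsbr_dq_reclaim(dq);
+
+   /* On shutdown, reclaim everything and free the FIFO */
+   rte_rcu_qsbr_dq_delete(dq);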
+
+However, if this resource reclamation process were integrated into lock-free
+data structure libraries, it would hide this complexity from the application
+and make it easier for the application to adopt lock-free algorithms.
+The following paragraphs discuss how the reclamation process can be
+integrated in DPDK libraries.
+
+In any DPDK application, the resource reclamation process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here assigns different parts of this process to client
+libraries and applications. The term 'client library' refers to lock-free
+data structure libraries such as rte_hash, rte_lpm, etc. in DPDK or similar
+libraries outside of DPDK. The term 'application' refers to the packet
+processing application that makes use of DPDK, such as the L3 Forwarding
+example application, OVS, VPP, etc.
+
+The application has to handle 'Initialization' and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the client library.
+* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section and how frequently it wants to report the quiescent state. A sketch of this application-side setup is shown below.
+
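+A minimal sketch of this application-side setup, assuming a single reader
+thread with id 0; the ``xxx_rcu_qsbr_add()`` registration API, the ``xxx_obj``
+handle and the ``done`` flag are hypothetical placeholders:
+
+.. code-block:: c
+
+   struct rte_rcu_qsbr *v;
+   size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
+
+   v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
+   rte_rcu_qsbr_init(v, RTE_MAX_LCORE);
+
+   /* Register the same RCU variable with the client library */
+   xxx_rcu_qsbr_add(xxx_obj, v);
+
+   /* Each reader thread registers itself and reports its state */
+   rte_rcu_qsbr_thread_register(v, 0);
+   rte_rcu_qsbr_thread_online(v, 0);
+   while (!done) {
+           /* ... lookups on the shared data structure ... */
+           rte_rcu_qsbr_quiescent(v, 0);
+   }
+   rte_rcu_qsbr_thread_offline(v, 0);
+   rte_rcu_qsbr_thread_unregister(v, 0);
+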
+The client library will handle the 'Reclaiming Resources' part of the process.
+The client libraries will make use of the writer thread context to execute the
+memory reclamation algorithm. So,
+
+* client library should provide an API to register a RCU variable that it will use. It should call ``rte_rcu_qsbr_dq_create()`` to create the FIFO to store the references to deleted entries.
+* client library should use ``rte_rcu_qsbr_dq_enqueue`` to enqueue the deleted resources on the FIFO and start the grace period.
+* if the library runs out of resources while adding entries, it should call ``rte_rcu_qsbr_dq_reclaim`` to reclaim the resources and try the resource allocation again.
+
+The 'Shutdown' process needs to be shared between the application and the
+client library.
+
+* the application should make sure that the reader threads are not using the shared data structure and unregister the reader threads from the QSBR variable before calling the client library's shutdown function.
+
+* client library should call ``rte_rcu_qsbr_dq_delete`` to reclaim any remaining resources and free the FIFO.
+
+Integrating the resource reclamation with client libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. The application does not need a dedicated thread to reclaim resources.
+   Memory reclamation happens as part of the writer thread with little
+   impact on performance.
+#. The client library has better control over the resources. For example,
+   the client library can attempt to reclaim resources when it has run out
+   of them.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library
  2019-09-06  9:45 ` [dpdk-dev] [PATCH v2 0/6] " Ruifeng Wang
                     ` (6 preceding siblings ...)
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
@ 2019-10-01 18:28   ` Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
                       ` (2 more replies)
  2020-06-08  5:16   ` [dpdk-dev] [PATCH v4 0/3] RCU integration with LPM library Ruifeng Wang
                     ` (6 subsequent siblings)
  14 siblings, 3 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Honnappa Nagarahalli

This patch set is dependent on https://patches.dpdk.org/cover/60270/

This patchset integrates RCU QSBR support with LPM library.

Please refer to RCU documentation in the above mentioned patch series.
This patch set follows the suggested design of integrating RCU
library with other libraries in DPDK.

RCU is used to safely free tbl8 groups that can be recycled.
tbl8 groups will not be reclaimed or reused until readers have
stopped referencing them.

This is implemented as an optional feature to ensure the existing
applications are not affected. A new API, rte_lpm_rcu_qsbr_add, is
introduced for the application to register a RCU variable that the
LPM library will use. This gives the user the means to enable
this feature.
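
As a rough usage sketch (error handling simplified; 'lpm' is assumed
to be an already created LPM table):

    struct rte_rcu_qsbr *v;
    size_t sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);

    v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
    rte_rcu_qsbr_init(v, RTE_MAX_LCORE);

    /* Enable RCU based tbl8 reclamation for this LPM table */
    if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
        printf("RCU integration could not be enabled\n");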

v3:
1) Integration with new RCU defer queue APIs (much smaller and simpler
   code in LPM library itself)
2) Separated the 'test/lpm: reset total time' patch from this series
3) Added multi-writer performance test. The performance difference
   between with and without RCU varies and is not small for
   multi-writer. However, this is due to the tbl8 group allocation
   algorithm in LPM, which is a linear search algorithm (given that
   the test case uses a large number of tbl8 groups). We should look
   to change this algorithm to O(1) in the future.
4) Incorporated applicable feedback from Vladimir

Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  app/test: add test case for LPM RCU integration

 app/test/test_lpm.c                | 152 ++++++++-
 app/test/test_lpm_perf.c           | 487 ++++++++++++++++++++++++++++-
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 102 +++++-
 lib/librte_lpm/rte_lpm.h           |  21 ++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 7 files changed, 757 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
@ 2019-10-01 18:28     ` Honnappa Nagarahalli
  2019-10-04 16:05       ` Medvedkin, Vladimir
  2019-10-07  9:21       ` Ananyev, Konstantin
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 2/3] app/test: add test case for LPM RCU integration Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests Honnappa Nagarahalli
  2 siblings, 2 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Ruifeng Wang

From: Ruifeng Wang <ruifeng.wang@arm.com>

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  21 ++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 5 files changed, 122 insertions(+), 12 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..ca58d4b35 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <string.h>
@@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	if (lpm->dq)
+		rte_rcu_qsbr_dq_delete(lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
@@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
 		rte_lpm_free_v1604);
 
+struct __rte_lpm_rcu_dq_entry {
+	uint32_t tbl8_group_index;
+	uint32_t pad;
+};
+
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	struct __rte_lpm_rcu_dq_entry *e =
+			(struct __rte_lpm_rcu_dq_entry *)data;
+	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
+
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
+{
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params;
+
+	if ((lpm == NULL) || (v == NULL)) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	if (lpm->dq) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	/* Init QSBR defer queue. */
+	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
+	params.name = rcu_dq_name;
+	params.size = lpm->number_tbl8s;
+	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
+	params.f = __lpm_rcu_qsbr_free_resource;
+	params.p = lpm->tbl8;
+	params.v = v;
+	lpm->dq = rte_rcu_qsbr_dq_create(&params);
+	if (lpm->dq == NULL) {
+		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
+		return 1;
+	}
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
 }
 
 static int32_t
-tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+__tbl8_alloc_v1604(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc_v1604(struct rte_lpm *lpm)
+{
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = __tbl8_alloc_v1604(lpm);
+	if ((group_idx < 0) && (lpm->dq != NULL)) {
+		/* If there are no tbl8 groups try to reclaim some. */
+		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
+			group_idx = __tbl8_alloc_v1604(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
 tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 {
@@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
 }
 
 static void
-tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	struct __rte_lpm_rcu_dq_entry e;
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (lpm->dq != NULL) {
+		e.tbl8_group_index = tbl8_group_start;
+		e.pad = 0;
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
+	} else {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	}
 }
 
 static __rte_noinline int32_t
@@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc_v1604(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
+		tbl8_free_v1604(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 906ec4483..49c12a68d 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -21,6 +22,7 @@
 #include <rte_common.h>
 #include <rte_vect.h>
 #include <rte_compat.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -186,6 +188,7 @@ struct rte_lpm {
 			__rte_cache_aligned; /**< LPM tbl24 table. */
 	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
+	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
 };
 
 /**
@@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
 void
 rte_lpm_free_v1604(struct rte_lpm *lpm);
 
+/**
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   The LPM object to add the RCU QSBR variable to
+ * @param v
+ *   RCU QSBR variable
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 90beac853..b353aabd2 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -44,3 +44,9 @@ DPDK_17.05 {
 	rte_lpm6_lookup_bulk_func;
 
 } DPDK_16.04;
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 2/3] app/test: add test case for LPM RCU integration
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
@ 2019-10-01 18:28     ` Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests Honnappa Nagarahalli
  2 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Ruifeng Wang

From: Ruifeng Wang <ruifeng.wang@arm.com>

Add positive and negative tests for API rte_lpm_rcu_qsbr_add.
Also test LPM library behavior when RCU QSBR is enabled.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_lpm.c | 152 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 151 insertions(+), 1 deletion(-)

diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c
index e969fe051..6882cae6a 100644
--- a/app/test/test_lpm.c
+++ b/app/test/test_lpm.c
@@ -8,6 +8,7 @@
 
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_malloc.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
@@ -40,6 +41,8 @@ static int32_t test15(void);
 static int32_t test16(void);
 static int32_t test17(void);
 static int32_t test18(void);
+static int32_t test19(void);
+static int32_t test20(void);
 
 rte_lpm_test tests[] = {
 /* Test Cases */
@@ -61,7 +64,9 @@ rte_lpm_test tests[] = {
 	test15,
 	test16,
 	test17,
-	test18
+	test18,
+	test19,
+	test20
 };
 
 #define NUM_LPM_TESTS (sizeof(tests)/sizeof(tests[0]))
@@ -1266,6 +1271,151 @@ test18(void)
 	return PASS;
 }
 
+/*
+ * rte_lpm_rcu_qsbr_add positive and negative tests.
+ *  - Add RCU QSBR variable to LPM
+ *  - Add another RCU QSBR variable to LPM
+ *  - Check LPM attached RCU QSBR variable and FIFO queue
+ */
+int32_t
+test19(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	struct rte_rcu_qsbr *qsv2;
+	int32_t status;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(RTE_MAX_LCORE);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, RTE_MAX_LCORE);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Create and attach another RCU QSBR to LPM table */
+	qsv2 = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv2 != NULL);
+
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv2);
+	TEST_LPM_ASSERT(status != 0);
+
+	TEST_LPM_ASSERT(lpm->dq != NULL);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+	rte_free(qsv2);
+
+	return PASS;
+}
+
+/*
+ * rte_lpm_rcu_qsbr_add functional test.
+ *  - Create LPM which supports 1 tbl8 group at max
+ *  - Add RCU QSBR variable to LPM
+ *  - Add a rule with depth=28 (> 24)
+ *  - Register a reader thread (not a real thread)
+ *  - Reader lookup existing rule
+ *  - Writer delete the rule
+ *  - Reader lookup the rule
+ *  - Writer re-add the rule (no available tbl8 group)
+ *  - Reader report quiescent state and unregister
+ *  - Writer re-add the rule
+ *  - Reader lookup the rule
+ */
+int32_t
+test20(void)
+{
+	struct rte_lpm *lpm = NULL;
+	struct rte_lpm_config config;
+	size_t sz;
+	struct rte_rcu_qsbr *qsv;
+	int32_t status;
+	uint32_t ip, next_hop, next_hop_return;
+	uint8_t depth;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = 1;
+	config.flags = 0;
+
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Create RCU QSBR variable */
+	sz = rte_rcu_qsbr_get_memsize(1);
+	qsv = (struct rte_rcu_qsbr *)rte_zmalloc_socket(NULL, sz,
+					RTE_CACHE_LINE_SIZE, SOCKET_ID_ANY);
+	TEST_LPM_ASSERT(qsv != NULL);
+
+	status = rte_rcu_qsbr_init(qsv, 1);
+	TEST_LPM_ASSERT(status == 0);
+
+	/* Attach RCU QSBR to LPM table */
+	status = rte_lpm_rcu_qsbr_add(lpm, qsv);
+	TEST_LPM_ASSERT(status == 0);
+
+	ip = RTE_IPV4(192, 18, 100, 100);
+	depth = 28;
+	next_hop = 1;
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(lpm->tbl24[ip>>8].valid_group);
+
+	/* Register pseudo reader */
+	status = rte_rcu_qsbr_thread_register(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+	rte_rcu_qsbr_thread_online(qsv, 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	/* Writer update */
+	status = rte_lpm_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(!lpm->tbl24[ip>>8].valid);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status != 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status != 0);
+
+	/* Reader quiescent */
+	rte_rcu_qsbr_quiescent(qsv, 0);
+
+	status = rte_lpm_add(lpm, ip, depth, next_hop);
+	TEST_LPM_ASSERT(status == 0);
+
+	rte_rcu_qsbr_thread_offline(qsv, 0);
+	status = rte_rcu_qsbr_thread_unregister(qsv, 0);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT(status == 0);
+	TEST_LPM_ASSERT(next_hop_return == next_hop);
+
+	rte_lpm_free(lpm);
+	rte_free(qsv);
+
+	return PASS;
+}
+
 /*
  * Do all unit tests.
  */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests
  2019-10-01 18:28   ` [dpdk-dev] [PATCH v3 0/3] RCU integration with LPM library Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 2/3] app/test: add test case for LPM RCU integration Honnappa Nagarahalli
@ 2019-10-01 18:28     ` Honnappa Nagarahalli
  2019-10-02 13:02       ` Aaron Conole
  2 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-01 18:28 UTC (permalink / raw)
  To: bruce.richardson, vladimir.medvedkin, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Honnappa.Nagarahalli, Dharmik.Thakkar, Ruifeng.Wang, nd,
	Honnappa Nagarahalli

Add performance tests for RCU integration. The performance
difference with and without RCU integration is very small
(~1% to ~2%) on both Arm and x86 platforms.

Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_lpm_perf.c | 487 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 484 insertions(+), 3 deletions(-)

diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
index 77eea66ad..a9f02d983 100644
--- a/app/test/test_lpm_perf.c
+++ b/app/test/test_lpm_perf.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include <stdio.h>
@@ -10,12 +11,28 @@
 #include <rte_cycles.h>
 #include <rte_random.h>
 #include <rte_branch_prediction.h>
+#include <rte_malloc.h>
 #include <rte_ip.h>
 #include <rte_lpm.h>
+#include <rte_rcu_qsbr.h>
 
 #include "test.h"
 #include "test_xmmt_ops.h"
 
+struct rte_lpm *lpm;
+static struct rte_rcu_qsbr *rv;
+static volatile uint8_t writer_done;
+static volatile uint32_t thr_id;
+static rte_atomic64_t gwrite_cycles;
+static rte_atomic64_t gwrites;
+/* LPM APIs are not thread safe, use mutex to provide thread safety */
+static pthread_mutex_t lpm_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/* Report quiescent state interval every 1024 lookups. Larger critical
+ * sections in reader will result in writer polling multiple times.
+ */
+#define QSBR_REPORTING_INTERVAL 1024
+
 #define TEST_LPM_ASSERT(cond) do {                                            \
 	if (!(cond)) {                                                        \
 		printf("Error at line %d: \n", __LINE__);                     \
@@ -24,6 +41,7 @@
 } while(0)
 
 #define ITERATIONS (1 << 10)
+#define RCU_ITERATIONS 10
 #define BATCH_SIZE (1 << 12)
 #define BULK_SIZE 32
 
@@ -35,9 +53,13 @@ struct route_rule {
 };
 
 struct route_rule large_route_table[MAX_RULE_NUM];
+/* Route table for routes with depth > 24 */
+struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
 
 static uint32_t num_route_entries;
+static uint32_t num_ldepth_route_entries;
 #define NUM_ROUTE_ENTRIES num_route_entries
+#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
 
 enum {
 	IP_CLASS_A,
@@ -191,7 +213,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	uint32_t ip_head_mask;
 	uint32_t rule_num;
 	uint32_t k;
-	struct route_rule *ptr_rule;
+	struct route_rule *ptr_rule, *ptr_ldepth_rule;
 
 	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
 		fixed_bit_num = IP_HEAD_BIT_NUM_A;
@@ -236,10 +258,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
 	 */
 	start = lrand48() & mask;
 	ptr_rule = &large_route_table[num_route_entries];
+	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
 	for (k = 0; k < rule_num; k++) {
 		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
 			| ip_head_mask;
 		ptr_rule->depth = depth;
+		/* If the depth of the route is more than 24, store it
+		 * in another table as well.
+		 */
+		if (depth > 24) {
+			ptr_ldepth_rule->ip = ptr_rule->ip;
+			ptr_ldepth_rule->depth = ptr_rule->depth;
+			ptr_ldepth_rule++;
+			num_ldepth_route_entries++;
+		}
 		ptr_rule++;
 		start = (start + step) & mask;
 	}
@@ -273,6 +305,7 @@ static void generate_large_route_rule_table(void)
 	uint8_t  depth;
 
 	num_route_entries = 0;
+	num_ldepth_route_entries = 0;
 	memset(large_route_table, 0, sizeof(large_route_table));
 
 	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
@@ -316,10 +349,454 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
 	printf("\n");
 }
 
+/* Enabled worker core tracking for the perf tests */
+static uint16_t enabled_core_ids[RTE_MAX_LCORE];
+static unsigned int num_cores;
+
+/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
+static inline uint32_t
+alloc_thread_id(void)
+{
+	uint32_t tmp_thr_id;
+
+	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
+	if (tmp_thr_id >= RTE_MAX_LCORE)
+		printf("Invalid thread id %u\n", tmp_thr_id);
+
+	return tmp_thr_id;
+}
+
+/*
+ * Reader thread using rte_lpm data structure without RCU.
+ */
+static int
+test_lpm_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+	} while (!writer_done);
+
+	return 0;
+}
+
+/*
+ * Reader thread using rte_lpm data structure with RCU.
+ */
+static int
+test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
+{
+	int i;
+	uint32_t thread_id = alloc_thread_id();
+	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
+	uint32_t next_hop_return = 0;
+
+	/* Register this thread to report quiescent state */
+	rte_rcu_qsbr_thread_register(rv, thread_id);
+	rte_rcu_qsbr_thread_online(rv, thread_id);
+
+	do {
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			ip_batch[i] = rte_rand();
+
+		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
+			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
+
+		/* Update quiescent state */
+		rte_rcu_qsbr_quiescent(rv, thread_id);
+	} while (!writer_done);
+
+	rte_rcu_qsbr_thread_offline(rv, thread_id);
+	rte_rcu_qsbr_thread_unregister(rv, thread_id);
+
+	return 0;
+}
+
+/*
+ * Writer thread using rte_lpm data structure with RCU.
+ */
+static int
+test_lpm_rcu_qsbr_writer(__attribute__((unused)) void *arg)
+{
+	unsigned int i, j, si, ei;
+	uint64_t begin, total_cycles;
+	uint8_t core_id = (uint8_t)((uintptr_t)arg);
+	uint32_t next_hop_add = 0xAA;
+
+	/* 2 writer threads are used */
+	if (core_id % 2 == 0) {
+		si = 0;
+		ei = NUM_LDEPTH_ROUTE_ENTRIES / 2;
+	} else {
+		si = NUM_LDEPTH_ROUTE_ENTRIES / 2;
+		ei = NUM_LDEPTH_ROUTE_ENTRIES;
+	}
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = si; j < ei; j++) {
+			pthread_mutex_lock(&lpm_mutex);
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+			}
+			pthread_mutex_unlock(&lpm_mutex);
+		}
+
+		/* Delete all the entries */
+		for (j = si; j < ei; j++) {
+			pthread_mutex_lock(&lpm_mutex);
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+			}
+			pthread_mutex_unlock(&lpm_mutex);
+		}
+	}
+
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	rte_atomic64_add(&gwrite_cycles, total_cycles);
+	/* Each iteration does (ei - si) adds and (ei - si) deletes */
+	rte_atomic64_add(&gwrites,
+			2 * (ei - si) * RCU_ITERATIONS);
+
+	return 0;
+}
+
+/*
+ * Performance test:
+ * 2 writers, rest are readers
+ */
+static int
+test_lpm_rcu_perf_multi_writer(void)
+{
+	struct rte_lpm_config config;
+	size_t sz;
+	unsigned int i;
+	uint16_t core_id;
+
+	if (rte_lcore_count() < 3) {
+		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 3\n");
+		return TEST_SKIPPED;
+	}
+
+	num_cores = 0;
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		enabled_core_ids[num_cores] = core_id;
+		num_cores++;
+	}
+
+	printf("\nPerf test: 2 writers, %d readers, RCU integration enabled\n",
+		num_cores - 2);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Init RCU variable */
+	sz = rte_rcu_qsbr_get_memsize(num_cores);
+	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
+						RTE_CACHE_LINE_SIZE);
+	rte_rcu_qsbr_init(rv, num_cores);
+
+	/* Assign the RCU variable to LPM */
+	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
+		printf("RCU variable assignment failed\n");
+		goto error;
+	}
+
+	writer_done = 0;
+	rte_atomic64_init(&gwrite_cycles);
+	rte_atomic64_init(&gwrites);
+	rte_atomic64_clear(&gwrite_cycles);
+	rte_atomic64_clear(&gwrites);
+
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 2; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Launch writer threads */
+	for (i = 0; i < 2; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
+					(void *)(uintptr_t)i,
+					enabled_core_ids[i]);
+
+	/* Wait for writer threads */
+	for (i = 0; i < 2; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %lu cycles\n",
+		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
+		);
+
+	/* Wait and check return value from reader threads */
+	writer_done = 1;
+	for (i = 2; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+	lpm = NULL;
+	rv = NULL;
+
+	/* Test without RCU integration */
+	printf("\nPerf test: 2 writers, %d readers, RCU integration disabled\n",
+		num_cores - 2);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	writer_done = 0;
+	rte_atomic64_init(&gwrite_cycles);
+	rte_atomic64_init(&gwrites);
+	rte_atomic64_clear(&gwrite_cycles);
+	rte_atomic64_clear(&gwrites);
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 2; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Launch writer threads */
+	for (i = 0; i < 2; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
+					(void *)(uintptr_t)i,
+					enabled_core_ids[i]);
+
+	/* Wait for writer threads */
+	for (i = 0; i < 2; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	printf("Total LPM Adds: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		RCU_ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %lu cycles\n",
+		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
+		);
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 2; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+
+	return 0;
+
+error:
+	writer_done = 1;
+	/* Wait until all readers have exited */
+	rte_eal_mp_wait_lcore();
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+
+	return -1;
+}
+
+/*
+ * Perf test:
+ * Single writer, rest are readers
+ */
+static int
+test_lpm_rcu_perf(void)
+{
+	struct rte_lpm_config config;
+	uint64_t begin, total_cycles;
+	size_t sz;
+	unsigned int i, j;
+	uint16_t core_id;
+	uint32_t next_hop_add = 0xAA;
+
+	if (rte_lcore_count() < 2) {
+		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n");
+		return TEST_SKIPPED;
+	}
+
+	num_cores = 0;
+	RTE_LCORE_FOREACH_SLAVE(core_id) {
+		enabled_core_ids[num_cores] = core_id;
+		num_cores++;
+	}
+
+	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	/* Init RCU variable */
+	sz = rte_rcu_qsbr_get_memsize(num_cores);
+	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
+						RTE_CACHE_LINE_SIZE);
+	rte_rcu_qsbr_init(rv, num_cores);
+
+	/* Assign the RCU variable to LPM */
+	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
+		printf("RCU variable assignment failed\n");
+		goto error;
+	}
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			goto error;
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+	lpm = NULL;
+	rv = NULL;
+
+	/* Test without RCU integration */
+	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
+		num_cores);
+
+	/* Create LPM table */
+	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
+	config.flags = 0;
+	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	writer_done = 0;
+	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+
+	/* Launch reader threads */
+	for (i = 0; i < num_cores; i++)
+		rte_eal_remote_launch(test_lpm_reader, NULL,
+					enabled_core_ids[i]);
+
+	/* Measure add/delete. */
+	begin = rte_rdtsc_precise();
+	for (i = 0; i < RCU_ITERATIONS; i++) {
+		/* Add all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
+					large_ldepth_route_table[j].depth,
+					next_hop_add) != 0) {
+				printf("Failed to add iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+
+		/* Delete all the entries */
+		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
+			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
+				large_ldepth_route_table[j].depth) != 0) {
+				printf("Failed to delete iteration %d, route# %d\n",
+					i, j);
+				goto error;
+			}
+	}
+	total_cycles = rte_rdtsc_precise() - begin;
+
+	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Total LPM Deletes: %d\n",
+		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
+	printf("Average LPM Add/Del: %g cycles\n",
+		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
+
+	writer_done = 1;
+	/* Wait and check return value from reader threads */
+	for (i = 0; i < num_cores; i++)
+		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
+			printf("Warning: lcore %u not finished.\n",
+				enabled_core_ids[i]);
+
+	rte_lpm_free(lpm);
+
+	return 0;
+
+error:
+	writer_done = 1;
+	/* Wait until all readers have exited */
+	rte_eal_mp_wait_lcore();
+
+	rte_lpm_free(lpm);
+	rte_free(rv);
+
+	return -1;
+}
+
 static int
 test_lpm_perf(void)
 {
-	struct rte_lpm *lpm = NULL;
 	struct rte_lpm_config config;
 
 	config.max_rules = 2000000;
@@ -343,7 +820,7 @@ test_lpm_perf(void)
 	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
 	TEST_LPM_ASSERT(lpm != NULL);
 
-	/* Measue add. */
+	/* Measure add. */
 	begin = rte_rdtsc();
 
 	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
@@ -478,6 +955,10 @@ test_lpm_perf(void)
 	rte_lpm_delete_all(lpm);
 	rte_lpm_free(lpm);
 
+	test_lpm_rcu_perf();
+
+	test_lpm_rcu_perf_multi_writer();
+
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests Honnappa Nagarahalli
@ 2019-10-02 13:02       ` Aaron Conole
  2019-10-03  9:09         ` Bruce Richardson
  0 siblings, 1 reply; 137+ messages in thread
From: Aaron Conole @ 2019-10-02 13:02 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: bruce.richardson, vladimir.medvedkin, olivier.matz, dev,
	konstantin.ananyev, stephen, paulmck, Gavin.Hu, Dharmik.Thakkar,
	Ruifeng.Wang, nd

Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> writes:

> Add performance tests for RCU integration. The performance
> difference with and without RCU integration is very small
> (~1% to ~2%) on both Arm and x86 platforms.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---

I see the following:

  lib/meson.build:89:5: ERROR: Problem encountered: Missing dependency rcu
  for library rte_lpm

Maybe there's something wrong with the environment?  This isn't the
first time I've seen a dependency detection problem with meson.
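
For reference, the declaration meson is checking for would be something
along these lines (an illustrative sketch only, not the actual file
contents):

  # lib/librte_lpm/meson.build
  deps += ['rcu']

i.e. rte_lpm has to list 'rcu' in its deps, and 'rcu' has to appear
before 'lpm' in the lib/meson.build build order.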

>  app/test/test_lpm_perf.c | 487 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 484 insertions(+), 3 deletions(-)
>
> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
> index 77eea66ad..a9f02d983 100644
> --- a/app/test/test_lpm_perf.c
> +++ b/app/test/test_lpm_perf.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
>  
>  #include <stdio.h>
> @@ -10,12 +11,28 @@
>  #include <rte_cycles.h>
>  #include <rte_random.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_malloc.h>
>  #include <rte_ip.h>
>  #include <rte_lpm.h>
> +#include <rte_rcu_qsbr.h>
>  
>  #include "test.h"
>  #include "test_xmmt_ops.h"
>  
> +struct rte_lpm *lpm;
> +static struct rte_rcu_qsbr *rv;
> +static volatile uint8_t writer_done;
> +static volatile uint32_t thr_id;
> +static rte_atomic64_t gwrite_cycles;
> +static rte_atomic64_t gwrites;
> +/* LPM APIs are not thread safe, use mutex to provide thread safety */
> +static pthread_mutex_t lpm_mutex = PTHREAD_MUTEX_INITIALIZER;
> +
> +/* Report quiescent state every 1024 lookups. Larger critical
> + * sections in reader will result in writer polling multiple times.
> + */
> +#define QSBR_REPORTING_INTERVAL 1024
> +
>  #define TEST_LPM_ASSERT(cond) do {                                            \
>  	if (!(cond)) {                                                        \
>  		printf("Error at line %d: \n", __LINE__);                     \
> @@ -24,6 +41,7 @@
>  } while(0)
>  
>  #define ITERATIONS (1 << 10)
> +#define RCU_ITERATIONS 10
>  #define BATCH_SIZE (1 << 12)
>  #define BULK_SIZE 32
>  
> @@ -35,9 +53,13 @@ struct route_rule {
>  };
>  
>  struct route_rule large_route_table[MAX_RULE_NUM];
> +/* Route table for routes with depth > 24 */
> +struct route_rule large_ldepth_route_table[MAX_RULE_NUM];
>  
>  static uint32_t num_route_entries;
> +static uint32_t num_ldepth_route_entries;
>  #define NUM_ROUTE_ENTRIES num_route_entries
> +#define NUM_LDEPTH_ROUTE_ENTRIES num_ldepth_route_entries
>  
>  enum {
>  	IP_CLASS_A,
> @@ -191,7 +213,7 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
>  	uint32_t ip_head_mask;
>  	uint32_t rule_num;
>  	uint32_t k;
> -	struct route_rule *ptr_rule;
> +	struct route_rule *ptr_rule, *ptr_ldepth_rule;
>  
>  	if (ip_class == IP_CLASS_A) {        /* IP Address class A */
>  		fixed_bit_num = IP_HEAD_BIT_NUM_A;
> @@ -236,10 +258,20 @@ static void generate_random_rule_prefix(uint32_t ip_class, uint8_t depth)
>  	 */
>  	start = lrand48() & mask;
>  	ptr_rule = &large_route_table[num_route_entries];
> +	ptr_ldepth_rule = &large_ldepth_route_table[num_ldepth_route_entries];
>  	for (k = 0; k < rule_num; k++) {
>  		ptr_rule->ip = (start << (RTE_LPM_MAX_DEPTH - depth))
>  			| ip_head_mask;
>  		ptr_rule->depth = depth;
> +		/* If the depth of the route is more than 24, store it
> +		 * in another table as well.
> +		 */
> +		if (depth > 24) {
> +			ptr_ldepth_rule->ip = ptr_rule->ip;
> +			ptr_ldepth_rule->depth = ptr_rule->depth;
> +			ptr_ldepth_rule++;
> +			num_ldepth_route_entries++;
> +		}
>  		ptr_rule++;
>  		start = (start + step) & mask;
>  	}
> @@ -273,6 +305,7 @@ static void generate_large_route_rule_table(void)
>  	uint8_t  depth;
>  
>  	num_route_entries = 0;
> +	num_ldepth_route_entries = 0;
>  	memset(large_route_table, 0, sizeof(large_route_table));
>  
>  	for (ip_class = IP_CLASS_A; ip_class <= IP_CLASS_C; ip_class++) {
> @@ -316,10 +349,454 @@ print_route_distribution(const struct route_rule *table, uint32_t n)
>  	printf("\n");
>  }
>  
> +/* Check condition and return an error if true. */
> +static uint16_t enabled_core_ids[RTE_MAX_LCORE];
> +static unsigned int num_cores;
> +
> +/* Simple way to allocate thread ids in 0 to RTE_MAX_LCORE space */
> +static inline uint32_t
> +alloc_thread_id(void)
> +{
> +	uint32_t tmp_thr_id;
> +
> +	tmp_thr_id = __atomic_fetch_add(&thr_id, 1, __ATOMIC_RELAXED);
> +	if (tmp_thr_id >= RTE_MAX_LCORE)
> +		printf("Invalid thread id %u\n", tmp_thr_id);
> +
> +	return tmp_thr_id;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure without RCU.
> + */
> +static int
> +test_lpm_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +	} while (!writer_done);
> +
> +	return 0;
> +}
> +
> +/*
> + * Reader thread using rte_lpm data structure with RCU.
> + */
> +static int
> +test_lpm_rcu_qsbr_reader(__attribute__((unused)) void *arg)
> +{
> +	int i;
> +	uint32_t thread_id = alloc_thread_id();
> +	uint32_t ip_batch[QSBR_REPORTING_INTERVAL];
> +	uint32_t next_hop_return = 0;
> +
> +	/* Register this thread to report quiescent state */
> +	rte_rcu_qsbr_thread_register(rv, thread_id);
> +	rte_rcu_qsbr_thread_online(rv, thread_id);
> +
> +	do {
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			ip_batch[i] = rte_rand();
> +
> +		for (i = 0; i < QSBR_REPORTING_INTERVAL; i++)
> +			rte_lpm_lookup(lpm, ip_batch[i], &next_hop_return);
> +
> +		/* Update quiescent state */
> +		rte_rcu_qsbr_quiescent(rv, thread_id);
> +	} while (!writer_done);
> +
> +	rte_rcu_qsbr_thread_offline(rv, thread_id);
> +	rte_rcu_qsbr_thread_unregister(rv, thread_id);
> +
> +	return 0;
> +}
> +
> +/*
> + * Writer thread using rte_lpm data structure with RCU.
> + */
> +static int
> +test_lpm_rcu_qsbr_writer(__attribute__((unused)) void *arg)
> +{
> +	unsigned int i, j, si, ei;
> +	uint64_t begin, total_cycles;
> +	uint8_t core_id = (uint8_t)((uintptr_t)arg);
> +	uint32_t next_hop_add = 0xAA;
> +
> +	/* 2 writer threads are used */
> +	if (core_id % 2 == 0) {
> +		si = 0;
> +		ei = NUM_LDEPTH_ROUTE_ENTRIES / 2;
> +	} else {
> +		si = NUM_LDEPTH_ROUTE_ENTRIES / 2;
> +		ei = NUM_LDEPTH_ROUTE_ENTRIES;
> +	}
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = si; j < ei; j++) {
> +			pthread_mutex_lock(&lpm_mutex);
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d, route# %d\n",
> +					i, j);
> +			}
> +			pthread_mutex_unlock(&lpm_mutex);
> +		}
> +
> +		/* Delete all the entries */
> +		for (j = si; j < ei; j++) {
> +			pthread_mutex_lock(&lpm_mutex);
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d, route# %d\n",
> +					i, j);
> +			}
> +			pthread_mutex_unlock(&lpm_mutex);
> +		}
> +	}
> +
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	rte_atomic64_add(&gwrite_cycles, total_cycles);
> +	rte_atomic64_add(&gwrites,
> +			2 * NUM_LDEPTH_ROUTE_ENTRIES * RCU_ITERATIONS);
> +
> +	return 0;
> +}
> +
> +/*
> + * Perf test:
> + * 2 writers, rest are readers
> + */
> +static int
> +test_lpm_rcu_perf_multi_writer(void)
> +{
> +	struct rte_lpm_config config;
> +	size_t sz;
> +	unsigned int i;
> +	uint16_t core_id;
> +
> +	if (rte_lcore_count() < 3) {
> +		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 3\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_cores = 0;
> +	RTE_LCORE_FOREACH_SLAVE(core_id) {
> +		enabled_core_ids[num_cores] = core_id;
> +		num_cores++;
> +	}
> +
> +	printf("\nPerf test: 2 writers, %d readers, RCU integration enabled\n",
> +		num_cores - 2);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Init RCU variable */
> +	sz = rte_rcu_qsbr_get_memsize(num_cores);
> +	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
> +						RTE_CACHE_LINE_SIZE);
> +	rte_rcu_qsbr_init(rv, num_cores);
> +
> +	/* Assign the RCU variable to LPM */
> +	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
> +		printf("RCU variable assignment failed\n");
> +		goto error;
> +	}
> +
> +	writer_done = 0;
> +	rte_atomic64_init(&gwrite_cycles);
> +	rte_atomic64_init(&gwrites);
> +	rte_atomic64_clear(&gwrite_cycles);
> +	rte_atomic64_clear(&gwrites);
> +
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 2; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Launch writer threads */
> +	for (i = 0; i < 2; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
> +					(void *)(uintptr_t)i,
> +					enabled_core_ids[i]);
> +
> +	/* Wait for writer threads */
> +	for (i = 0; i < 2; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	printf("Total LPM Adds: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %lu cycles\n",
> +		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
> +		);
> +
> +	/* Wait and check return value from reader threads */
> +	writer_done = 1;
> +	for (i = 2; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +	lpm = NULL;
> +	rv = NULL;
> +
> +	/* Test without RCU integration */
> +	printf("\nPerf test: 2 writers, %d readers, RCU integration disabled\n",
> +		num_cores - 2);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	writer_done = 0;
> +	rte_atomic64_init(&gwrite_cycles);
> +	rte_atomic64_init(&gwrites);
> +	rte_atomic64_clear(&gwrite_cycles);
> +	rte_atomic64_clear(&gwrites);
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 2; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Launch writer threads */
> +	for (i = 0; i < 2; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_writer,
> +					(void *)(uintptr_t)i,
> +					enabled_core_ids[i]);
> +
> +	/* Wait for writer threads */
> +	for (i = 0; i < 2; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	printf("Total LPM Adds: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		2 * ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %lu cycles\n",
> +		rte_atomic64_read(&gwrite_cycles) / rte_atomic64_read(&gwrites)
> +		);
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 2; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +
> +	return 0;
> +
> +error:
> +	writer_done = 1;
> +	/* Wait until all readers have exited */
> +	rte_eal_mp_wait_lcore();
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +
> +	return -1;
> +}
> +
> +/*
> + * Perf test:
> + * Single writer, rest are readers
> + */
> +static int
> +test_lpm_rcu_perf(void)
> +{
> +	struct rte_lpm_config config;
> +	uint64_t begin, total_cycles;
> +	size_t sz;
> +	unsigned int i, j;
> +	uint16_t core_id;
> +	uint32_t next_hop_add = 0xAA;
> +
> +	if (rte_lcore_count() < 2) {
> +		printf("Not enough cores for lpm_rcu_perf_autotest, expecting at least 2\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_cores = 0;
> +	RTE_LCORE_FOREACH_SLAVE(core_id) {
> +		enabled_core_ids[num_cores] = core_id;
> +		num_cores++;
> +	}
> +
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration enabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	/* Init RCU variable */
> +	sz = rte_rcu_qsbr_get_memsize(num_cores);
> +	rv = (struct rte_rcu_qsbr *)rte_zmalloc("rcu0", sz,
> +						RTE_CACHE_LINE_SIZE);
> +	rte_rcu_qsbr_init(rv, num_cores);
> +
> +	/* Assign the RCU variable to LPM */
> +	if (rte_lpm_rcu_qsbr_add(lpm, rv) != 0) {
> +		printf("RCU variable assignment failed\n");
> +		goto error;
> +	}
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_rcu_qsbr_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			goto error;
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +	lpm = NULL;
> +	rv = NULL;
> +
> +	/* Test without RCU integration */
> +	printf("\nPerf test: 1 writer, %d readers, RCU integration disabled\n",
> +		num_cores);
> +
> +	/* Create LPM table */
> +	config.max_rules = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.number_tbl8s = NUM_LDEPTH_ROUTE_ENTRIES;
> +	config.flags = 0;
> +	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
> +	TEST_LPM_ASSERT(lpm != NULL);
> +
> +	writer_done = 0;
> +	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
> +
> +	/* Launch reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		rte_eal_remote_launch(test_lpm_reader, NULL,
> +					enabled_core_ids[i]);
> +
> +	/* Measure add/delete. */
> +	begin = rte_rdtsc_precise();
> +	for (i = 0; i < RCU_ITERATIONS; i++) {
> +		/* Add all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_add(lpm, large_ldepth_route_table[j].ip,
> +					large_ldepth_route_table[j].depth,
> +					next_hop_add) != 0) {
> +				printf("Failed to add iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +
> +		/* Delete all the entries */
> +		for (j = 0; j < NUM_LDEPTH_ROUTE_ENTRIES; j++)
> +			if (rte_lpm_delete(lpm, large_ldepth_route_table[j].ip,
> +				large_ldepth_route_table[j].depth) != 0) {
> +				printf("Failed to delete iteration %d, route# %d\n",
> +					i, j);
> +				goto error;
> +			}
> +	}
> +	total_cycles = rte_rdtsc_precise() - begin;
> +
> +	printf("Total LPM Adds: %d\n", ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Total LPM Deletes: %d\n",
> +		ITERATIONS * NUM_LDEPTH_ROUTE_ENTRIES);
> +	printf("Average LPM Add/Del: %g cycles\n",
> +		(double)total_cycles / (NUM_LDEPTH_ROUTE_ENTRIES * ITERATIONS));
> +
> +	writer_done = 1;
> +	/* Wait and check return value from reader threads */
> +	for (i = 0; i < num_cores; i++)
> +		if (rte_eal_wait_lcore(enabled_core_ids[i]) < 0)
> +			printf("Warning: lcore %u not finished.\n",
> +				enabled_core_ids[i]);
> +
> +	rte_lpm_free(lpm);
> +
> +	return 0;
> +
> +error:
> +	writer_done = 1;
> +	/* Wait until all readers have exited */
> +	rte_eal_mp_wait_lcore();
> +
> +	rte_lpm_free(lpm);
> +	rte_free(rv);
> +
> +	return -1;
> +}
> +
>  static int
>  test_lpm_perf(void)
>  {
> -	struct rte_lpm *lpm = NULL;
>  	struct rte_lpm_config config;
>  
>  	config.max_rules = 2000000;
> @@ -343,7 +820,7 @@ test_lpm_perf(void)
>  	lpm = rte_lpm_create(__func__, SOCKET_ID_ANY, &config);
>  	TEST_LPM_ASSERT(lpm != NULL);
>  
> -	/* Measue add. */
> +	/* Measure add. */
>  	begin = rte_rdtsc();
>  
>  	for (i = 0; i < NUM_ROUTE_ENTRIES; i++) {
> @@ -478,6 +955,10 @@ test_lpm_perf(void)
>  	rte_lpm_delete_all(lpm);
>  	rte_lpm_free(lpm);
>  
> +	test_lpm_rcu_perf();
> +
> +	test_lpm_rcu_perf_multi_writer();
> +
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
@ 2019-10-02 17:39       ` Ananyev, Konstantin
  2019-10-03  6:29         ` Honnappa Nagarahalli
  2019-10-02 18:50       ` Ananyev, Konstantin
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02 17:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir, ruifeng.wang,
	dharmik.thakkar, dev, nd

Hi Honnappa,

 
> Add resource reclamation APIs to make it simple for applications
> and libraries to integrate rte_rcu library.
> 
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>  lib/librte_rcu/meson.build         |   2 +
>  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>  lib/librte_rcu/rte_rcu_version.map |   4 +
>  lib/meson.build                    |   6 +-
>  7 files changed, 700 insertions(+), 3 deletions(-)
>  create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> 
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> index ce7f93dd3..76814f50b 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> @@ -21,6 +21,7 @@
>  #include <rte_errno.h>
> 
>  #include "rte_rcu_qsbr.h"
> +#include "rte_rcu_qsbr_pvt.h"
> 
>  /* Get the memory size of QSBR variable */
>  size_t
> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
>  	return 0;
>  }
> 
> +/* Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + */
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> +{
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint32_t qs_fifo_size;
> +
> +	if (params == NULL || params->f == NULL ||
> +		params->v == NULL || params->name == NULL ||
> +		params->size == 0 || params->esize == 0 ||
> +		(params->esize % 8 != 0)) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return NULL;
> +	}
> +
> +	dq = rte_zmalloc(NULL,
> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> +		RTE_CACHE_LINE_SIZE);
> +	if (dq == NULL) {
> +		rte_errno = ENOMEM;
> +
> +		return NULL;
> +	}
> +
> +	/* round up qs_fifo_size to the next power of two that is not less
> +	 * than the required size.
> +	 */
> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> +					* params->size) + 1);
> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);

If it is not going to be MT safe, then why not create the ring with
(RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
Though I think it could be changed to allow MT-safe multiple
enqueue/single dequeue, see below.
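
I.e. (sketch):

	dq->r = rte_ring_create(params->name, qs_fifo_size,
				SOCKET_ID_ANY,
				RING_F_SP_ENQ | RING_F_SC_DEQ);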

> +	if (dq->r == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): defer queue create failed\n", __func__);
> +		rte_free(dq);
> +		return NULL;
> +	}
> +
> +	dq->v = params->v;
> +	dq->size = params->size;
> +	dq->esize = params->esize;
> +	dq->f = params->f;
> +	dq->p = params->p;
> +
> +	return dq;
> +}
> +
> +/* Enqueue one resource to the defer queue to free after the grace
> + * period is over.
> + */
> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> +{
> +	uint64_t token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +	uint32_t cur_size, free_size;
> +
> +	if (dq == NULL || e == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;

Why not just return -EINVAL straight away?
I think there is not much point in setting rte_errno in that function at all;
the return value should do.

> +	}
> +
> +	/* Start the grace period */
> +	token = rte_rcu_qsbr_start(dq->v);
> +
> +	/* Reclaim resources if the queue is 1/8th full. This keeps
> +	 * the queue from growing too large and allows time for reader
> +	 * threads to report their quiescent state.
> +	 */
> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);

It would probably be a bit easier if you just store (elt size + token size) / 8 in dq->esize.
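
I.e. once at create time (sketch):

	/* one 8B token + the element itself, in 8-byte ring slots */
	dq->esize = params->esize / 8 + 1;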

> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {

Why make this threshold value hard-coded?
Why not either put it into a create parameter, or just return a special value
to indicate that the threshold is reached?
Or even return the number of filled/free entries on success, so the caller can
decide whether to reclaim based on that information on his own?
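
E.g. (sketch; the field name here is made up):

struct rte_rcu_qsbr_dq_parameters {
	/* ... existing fields unchanged ... */
	uint32_t trigger_reclaim_limit;
	/**< Hypothetical: queue fill level (in elements) at which
	 *   enqueue starts triggering reclamation.
	 */
};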

> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +			"%s(): Triggering reclamation\n", __func__);
> +		rte_rcu_qsbr_dq_reclaim(dq);
> +	}
> +
> +	/* Check if there is space for at least 1 resource */
> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> +	if (!free_size) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Defer queue is full\n", __func__);
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	/* Enqueue the resource */
> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> +
> +	/* The resource to enqueue needs to be a multiple of 64b
> +	 * due to the limitation of the rte_ring implementation.
> +	 */
> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);


That whole construction above looks a bit clumsy and error-prone...
I suppose just:

const uint32_t nb_elt = dq->esize/8 + 1;
uint32_t free, n;
...
n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
if (n == 0)
  return -ENOSPC;
return free;

That way I think you can have MT-safe version of that function.

> +
> +	return 0;
> +}
> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;

Same story as above - I think rte_errno is excessive in this function.
The return value alone should be enough.


> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;

Not sure you need that, see below.

> +
> +	/* Reclaim at most 1/16th of the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;

Again, why not make max_cnt configurable as a create() parameter?
Or even a parameter of this function?

> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> +			== 1)) {


> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);

Again, no need for such constructs with multiple dequeues, I believe.
Just:

const uint32_t nb_elt = dq->esize/8 + 1;
uint32_t n;
uintptr_t elt[nb_elt];
...
n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
if (n != 0) {dq->f(dq->p, elt);}

Seems enough.
Again in that case you can have enqueue/reclaim running in
different threads simultaneously, plus you don't need dq->e at all. 

> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;

I'd suggest returning cnt on success.

> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;

How do you know that you have reclaimed everything?
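
E.g. something like this at the end of dq_delete() would make it
explicit (sketch):

	/* Drain, then refuse to delete if anything is still pending. */
	(void)rte_rcu_qsbr_dq_reclaim(dq);
	if (rte_ring_count(dq->r) != 0) {
		rte_errno = EAGAIN;
		return 1;
	}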

> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>  int rte_rcu_log_type;
> 
>  RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> +#include <rte_ring.h>
> 
>  extern int rte_rcu_log_type;
> 
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>  	 */
>  } __rte_cache_aligned;
> 
> +/**
> + * Call back function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);

Style thing - usually in DPDK we have typedef newtype_t ...
Though I am not sure you need a new typedef at all - just
a function pointer inside the struct seems enough.

> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> +
> +/**
> + *  Trigger automatic reclamation once the defer queue is 1/8th full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at the max 1/16th the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4


As I said above, I don't think these thresholds need to be hardcoded.
In any case, there seems to be little point in putting them in the public header file.

> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with unbounded number of entries is not
> +	 *   supported currently.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;

Style nit again - I like short names myself, but that seems a bit extreme... :)
Might be at least:
void (*reclaim)(void *, void *);
void * reclaim_data;
?

> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;

Does it need to be inside that struct?
Might be better:
rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct rte_rcu_qsbr_dq_parameters *params);

Another alternative: make both reclaim() and enqueue() take v as a parameter.
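
I.e. something like (a sketch of the proposed signatures):

int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr *v,
		struct rte_rcu_qsbr_dq *dq, void *e);
int rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr *v,
		struct rte_rcu_qsbr_dq *dq);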

> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>  int
>  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to keep
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than) the
> + *		number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h

Again, a style suggestion: as it is not a public header, don't use the rte_ prefix for naming.
From my perspective it makes it easier for the reader to realize what is a public header and what is not.

> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */

Do you really need 'e' at all?
Can't it be just a temporary stack variable?

> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>  	rte_rcu_qsbr_synchronize;
>  	rte_rcu_qsbr_thread_register;
>  	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
> 
>  	local: *;
>  };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>  libraries = [
>  	'kvargs', # eal depends on kvargs
>  	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>  	'cmdline',
>  	'metrics', # bitrate/latency stats depends on this
>  	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>  	'gro', 'gso', 'ip_frag', 'jobstats',
>  	'kni', 'latencystats', 'lpm', 'member',
>  	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>  	# ipsec lib depends on net, crypto and security
>  	'ipsec',
>  	# add pkt framework libs which use other libs from above
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API Honnappa Nagarahalli
@ 2019-10-02 18:42       ` Ananyev, Konstantin
  2019-10-03 19:49         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02 18:42 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir, ruifeng.wang,
	dharmik.thakkar, dev, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:honnappa.nagarahalli@arm.com]
> Sent: Tuesday, October 1, 2019 7:29 AM
> To: honnappa.nagarahalli@arm.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org;
> paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; ruifeng.wang@arm.com;
> dharmik.thakkar@arm.com; dev@dpdk.org; nd@arm.com
> Subject: [PATCH v3 1/3] lib/ring: add peek API
> 
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> The peek API allows fetching the next available object in the ring
> without dequeuing it. This helps in scenarios where dequeuing of
> objects depend on their value.
> 
> Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> ---
>  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 2a9f768a1..d3d0d5e18 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  				r->cons.single, available);
>  }
> 
> +/**
> + * Peek one object from a ring.
> + *
> + * The peek API allows fetching the next available object in the ring
> + * without dequeuing it. This API is not multi-thread safe with respect
> + * to other consumer threads.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_p
> + *   A pointer to a void * pointer (object) that will be filled.
> + * @return
> + *   - 0: Success, object available
> + *   - -ENOENT: Not enough entries in the ring.
> + */
> +__rte_experimental
> +static __rte_always_inline int
> +rte_ring_peek(struct rte_ring *r, void **obj_p)

As it is not MT safe, I think we need _sc_ in the name,
to follow the naming convention of the other rte_ring functions
(rte_ring_sc_peek() or so).

As a better alternative, what do you think about introducing
serialized versions of the DPDK rte_ring dequeue functions?
Something like this:

/* same as original ring dequeue, but:
  * 1) move cons.head only if cons.head == cons.tail
  * 2) don't update cons.tail
  */
unsigned int
rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
                unsigned int *available);

/* sets both cons.head and cons.tail to cons.head + num */
void rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);

/* resets cons.head to the cons.tail value */
void rte_ring_serial_dequeue_abort(struct rte_ring *r);
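
For illustration, a rough sketch of what those semantics could look
like (this is not existing rte_ring code; it reuses DEQUEUE_PTRS and
the head/tail layout from the current implementation, and a real
MT-safe version would need a CAS on cons.head plus the usual memory
barriers):

unsigned int
rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
		unsigned int n, unsigned int *available)
{
	uint32_t head = r->cons.head;
	uint32_t avail = (r->prod.tail - head) & r->mask;

	/* Refuse if a previous serialized dequeue is still in
	 * progress (head ran ahead of tail) or not enough entries.
	 */
	if (head != r->cons.tail || avail < n) {
		if (available != NULL)
			*available = avail;
		return 0;
	}

	DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
	r->cons.head = head + n;	/* cons.tail stays put */
	if (available != NULL)
		*available = avail - n;
	return n;
}

void
rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num)
{
	/* commit: let cons.tail catch up with cons.head */
	r->cons.tail = r->cons.tail + num;
}

void
rte_ring_serial_dequeue_abort(struct rte_ring *r)
{
	/* roll back the speculative dequeue */
	r->cons.head = r->cons.tail;
}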

Then your dq_reclaim cycle function will look like that:

const uint32_t nb_elt = dq->esize/8 + 1;
uint32_t avl, n;
uintptr_t elt[nb_elt];
...

do {

  /* read next elem from the queue */
  n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
  if (n == 0)
      break;

  /* wrong period, keep elem in the queue */
  if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
      rte_ring_serial_dequeue_abort(dq->r);
      break;
  }

  /* can reclaim, remove elem from the queue */
  rte_ring_serial_dequeue_finish(dq->r, nb_elt);

  /* call reclaim function */
  dq->f(dq->p, elt);

} while (avl >= nb_elt);

That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
As long as actual reclamation callback itself is MT safe of course.

> +{
> +	uint32_t prod_tail = r->prod.tail;
> +	uint32_t cons_head = r->cons.head;
> +	uint32_t count = (prod_tail - cons_head) & r->mask;
> +	unsigned int n = 1;
> +	if (count) {
> +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> +		return 0;
> +	}
> +	return -ENOENT;
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
  2019-10-02 17:39       ` Ananyev, Konstantin
@ 2019-10-02 18:50       ` Ananyev, Konstantin
  2019-10-03  6:42         ` Honnappa Nagarahalli
  2019-10-04 19:01       ` Medvedkin, Vladimir
  2019-10-07 13:11       ` Medvedkin, Vladimir
  3 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-02 18:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir, ruifeng.wang,
	dharmik.thakkar, dev, nd


> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;
> +
> +	/* Reclaim at most 1/16th of the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)

One more thing I forgot to ask - how is this construct supposed to work on 32-bit machines?
peek() will return a 32-bit value, while qsbr_check() operates with 64-bit tokens...
As I understand it, in that case you need to peek() 2 elems.
Might work, but I still think it is better to introduce a serialized version of ring_dequeue().
See my other mail about rte_ring_peek().
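
To illustrate (sketch, 32-bit build only; this also assumes the
enqueue side stored the token as two 32-bit halves, which the current
code does not do - the cast simply truncates it):

/* Each ring slot holds a void *, i.e. 4 bytes on a 32-bit build,
 * so one 64-bit token occupies two slots and a single peek() can
 * only ever see half of it.
 */
void *slot[2];
uint64_t token;

if (rte_ring_dequeue_bulk(dq->r, slot, 2, NULL) != 0)
	token = ((uint64_t)(uintptr_t)slot[1] << 32) |
		(uint32_t)(uintptr_t)slot[0];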


> +			== 1)) {
> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);
> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;
> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>  int rte_rcu_log_type;
> 
>  RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>  #include <rte_lcore.h>
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> +#include <rte_ring.h>
> 
>  extern int rte_rcu_log_type;
> 
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>  	 */
>  } __rte_cache_aligned;
> 
> +/**
> + * Call back function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> +
> +/**
> + *  Trigger automatic reclamation once the defer queue is 1/8th full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at most 1/16th of the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with unbounded number of entries is not
> +	 *   supported currently.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;
> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>  /**
>   * @warning
>   * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>  int
>  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to keep
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than) the
> + *		number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>  	rte_rcu_qsbr_synchronize;
>  	rte_rcu_qsbr_thread_register;
>  	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
> 
>  	local: *;
>  };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>  libraries = [
>  	'kvargs', # eal depends on kvargs
>  	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>  	'cmdline',
>  	'metrics', # bitrate/latency stats depends on this
>  	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>  	'gro', 'gso', 'ip_frag', 'jobstats',
>  	'kni', 'latencystats', 'lpm', 'member',
>  	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>  	# ipsec lib depends on net, crypto and security
>  	'ipsec',
>  	# add pkt framework libs which use other libs from above
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-02 17:39       ` Ananyev, Konstantin
@ 2019-10-03  6:29         ` Honnappa Nagarahalli
  2019-10-03 12:26           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03  6:29 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, Honnappa Nagarahalli, dev, nd, nd

> 
> Hi Honnappa,
Thanks Konstantin for the feedback.

> 
> 
> > Add resource reclamation APIs to make it simple for applications and
> > libraries to integrate rte_rcu library.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> >  lib/librte_rcu/meson.build         |   2 +
> >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> >  lib/librte_rcu/rte_rcu_version.map |   4 +
> >  lib/meson.build                    |   6 +-
> >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > @@ -21,6 +21,7 @@
> >  #include <rte_errno.h>
> >
> >  #include "rte_rcu_qsbr.h"
> > +#include "rte_rcu_qsbr_pvt.h"
> >
> >  /* Get the memory size of QSBR variable */  size_t @@ -267,6 +268,190
> > @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> >  	return 0;
> >  }
> >
> > +/* Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + */
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params) {
> > +	struct rte_rcu_qsbr_dq *dq;
> > +	uint32_t qs_fifo_size;
> > +
> > +	if (params == NULL || params->f == NULL ||
> > +		params->v == NULL || params->name == NULL ||
> > +		params->size == 0 || params->esize == 0 ||
> > +		(params->esize % 8 != 0)) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	dq = rte_zmalloc(NULL,
> > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > +		RTE_CACHE_LINE_SIZE);
> > +	if (dq == NULL) {
> > +		rte_errno = ENOMEM;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	/* round up qs_fifo_size to next power of two that is not less than
> > +	 * max_size.
> > +	 */
> > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > +					* params->size) + 1);
> > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > +					SOCKET_ID_ANY, 0);
> 
> If it is not going to be MT-safe, then why not create the ring with the
> (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
Agree.

> Though I think it could be changed to allow MT-safe multiple enqueue/single
> dequeue, see below.
The MT-safety issue is due to the reclaim code, which has the following sequence:

rte_ring_peek
rte_rcu_qsbr_check
rte_ring_dequeue

This entire sequence needs to be atomic, as the entry cannot be dequeued without knowing that the grace period for that entry is over. Note that, due to optimizations in the rte_rcu_qsbr_check API, this sequence should not take long in most cases. I do not have ideas on how to make this sequence lock-free.

If the writer is on the control plane, most use cases will use mutex locks for synchronization if they are multi-threaded. That lock should be enough to provide thread safety for these APIs.

If the writer is multi-threaded and lock-free, then one should use a per-thread defer queue.
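As a minimal sketch of the mutex-locked control-plane model described above (the wrapper function, lock, and entry names are hypothetical application objects, not part of this patch):

#include <pthread.h>
#include <rte_rcu_qsbr.h>

static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Any writer thread deleting an entry goes through this wrapper,
 * which serializes the non-MT-safe defer queue APIs.
 */
static int
app_delete_entry(struct rte_rcu_qsbr_dq *dq, void *entry)
{
	int ret;

	pthread_mutex_lock(&writer_lock);
	/* Enqueue starts the grace period; the registered callback
	 * frees 'entry' later, once readers report quiescence.
	 */
	ret = rte_rcu_qsbr_dq_enqueue(dq, entry);
	pthread_mutex_unlock(&writer_lock);

	return ret;
}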

> 
> > +	if (dq->r == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): defer queue create failed\n", __func__);
> > +		rte_free(dq);
> > +		return NULL;
> > +	}
> > +
> > +	dq->v = params->v;
> > +	dq->size = params->size;
> > +	dq->esize = params->esize;
> > +	dq->f = params->f;
> > +	dq->p = params->p;
> > +
> > +	return dq;
> > +}
> > +
> > +/* Enqueue one resource to the defer queue to free after the grace
> > + * period is over.
> > + */
> > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > +	uint64_t token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +	uint32_t cur_size, free_size;
> > +
> > +	if (dq == NULL || e == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> 
> Why not just return -EINVAL straight away?
> I think there is not much point in setting rte_errno in that function at all;
> just the return value should do.
I am trying to keep these consistent with the existing APIs: they return 0 or 1 and set rte_errno.
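A caller-side sketch of that 0/1 + rte_errno convention (the wrapper name and retry policy are illustrative assumptions, not part of the patch):

#include <errno.h>
#include <rte_errno.h>
#include <rte_rcu_qsbr.h>

static int
app_retire_resource(struct rte_rcu_qsbr_dq *dq, void *e)
{
	if (rte_rcu_qsbr_dq_enqueue(dq, e) == 0)
		return 0;

	if (rte_errno == ENOSPC) {
		/* Queue full: no grace period has completed yet;
		 * the caller may retry after readers make progress.
		 */
		return -EAGAIN;
	}

	/* EINVAL: NULL parameters were passed. */
	return -rte_errno;
}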

> 
> > +	}
> > +
> > +	/* Start the grace period */
> > +	token = rte_rcu_qsbr_start(dq->v);
> > +
> > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > +	 * the queue from growing too large and allows time for reader
> > +	 * threads to report their quiescent state.
> > +	 */
> > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> 
> It would probably be a bit easier if you just stored
> (elt size + token size) / 8 in dq->esize.
Agree

> 
> > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> 
> Why make this threshold value hard-coded?
> Why not either put it into a create parameter, or just return a special
> return value to indicate that the threshold is reached?
My thinking was to keep the programming interface easy to use. The more the parameters, the more painful it is for the user. IMO, the constants chosen should be good enough for most cases. More advanced users could modify the constants. However, we could make these part of the parameters, but make them optional for the user. For ex: if they set them to 0, default values can be used.

> Or even return the number of filled/free entries on success, so the caller
> can decide whether to reclaim or not based on that information?
This means more code on the user side. I think adding these to parameters seems like a better option.
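A sketch of how such optional parameters could look, with 0 selecting the library default; the two commented-out field names are hypothetical and not part of the patch under review ('v', 'lpm' and 'free_fn' are assumed application objects):

#include <rte_rcu_qsbr.h>

static struct rte_rcu_qsbr_dq *
app_create_dq(struct rte_rcu_qsbr *v, void *lpm,
		rte_rcu_qsbr_free_resource free_fn)
{
	struct rte_rcu_qsbr_dq_parameters params = {
		.name = "lpm_dq",
		.size = 1024,	/* max entries in the data structure */
		.esize = 8,	/* multiple of 8B, per the API contract */
		.f = free_fn,
		.p = lpm,
		.v = v,
		/* Hypothetical optional knobs, 0 = library default:
		 * .trigger_reclaim_limit = 0,
		 * .max_reclaim_size = 0,
		 */
	};

	return rte_rcu_qsbr_dq_create(&params);
}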

> 
> > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +			"%s(): Triggering reclamation\n", __func__);
> > +		rte_rcu_qsbr_dq_reclaim(dq);
> > +	}
> > +
> > +	/* Check if there is space for at least 1 resource */
> > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > +	if (!free_size) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Defer queue is full\n", __func__);
> > +		rte_errno = ENOSPC;
> > +		return 1;
> > +	}
> > +
> > +	/* Enqueue the resource */
> > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > +
> > +	/* The resource to enqueue needs to be a multiple of 64b
> > +	 * due to the limitation of the rte_ring implementation.
> > +	 */
> > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> 
> 
> That whole construction above looks a bit clumsy and error prone...
> I suppose just:
> 
> const uint32_t nb_elt = dq->elt_size/8 + 1;
> uint32_t free, n;
> ...
> n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
> if (n == 0)
Yes, bulk enqueue can be used. But note that once the flexible element size ring patch is done, this code will use that.

>   return -ENOSPC;
> return free;
> 
> That way I think you can have MT-safe version of that function.
Please see the description of the MT-safety issue above.
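Writing the bulk-enqueue suggestion out as a library-internal sketch (a 64-bit build is assumed so the 64-bit token fits one ring slot; the struct fields are as in the patch):

#include <errno.h>
#include <string.h>
#include <rte_ring.h>
#include "rte_rcu_qsbr.h"
#include "rte_rcu_qsbr_pvt.h"	/* private struct layout */

static int
dq_enqueue_bulk(struct rte_rcu_qsbr_dq *dq, void *e)
{
	const uint32_t nb_elt = dq->esize / 8 + 1;	/* token + payload */
	void *buf[nb_elt];

	/* Start the grace period and keep the token next to its payload. */
	buf[0] = (void *)(uintptr_t)rte_rcu_qsbr_start(dq->v);
	memcpy(&buf[1], e, dq->esize);

	/* All-or-nothing enqueue keeps the record contiguous, which is
	 * what makes a multi-writer mode possible.
	 */
	if (rte_ring_enqueue_bulk(dq->r, buf, nb_elt, NULL) == 0)
		return -ENOSPC;

	return 0;
}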

> 
> > +
> > +	return 0;
> > +}
> > +
> > +/* Reclaim resources from the defer queue. */ int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > +	uint32_t max_cnt;
> > +	uint32_t cnt;
> > +	void *token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> 
> Same story as above - I think rte_errno is excessive in this function.
> Just return value should be enough.
> 
> 
> > +	}
> > +
> > +	/* Anything to reclaim? */
> > +	if (rte_ring_count(dq->r) == 0)
> > +		return 0;
> 
> Not sure you need that, see below.
> 
> > +
> > +	/* Reclaim at the max 1/16th the total number of entries. */
> > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> 
> Again, why not make max_cnt configurable as a create() parameter?
I think making this an optional parameter when creating the defer queue is a better option.

> Or even a parameter for that function?
> 
> > +	cnt = 0;
> > +
> > +	/* Check reader threads quiescent state and reclaim resources */
> > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > +			== 1)) {
> 
> 
> > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > +		/* The resource to dequeue needs to be a multiple of 64b
> > +		 * due to the limitation of the rte_ring implementation.
> > +		 */
> > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > +			i++, tmp++)
> > +			(void)rte_ring_sc_dequeue(dq->r,
> > +					(void *)(uintptr_t)tmp);
> 
> Again, no need for such constructs with multiple dequeuer I believe.
> Just:
> 
> const uint32_t nb_elt = dq->elt_size/8 + 1;
> uint32_t n;
> uintptr_t elt[nb_elt];
> ...
> n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
> if (n != 0)
> 	dq->f(dq->p, elt);
Agree on bulk API use.

> 
> Seems enough.
> Again in that case you can have enqueue/reclaim running in different threads
> simultaneously, plus you don't need dq->e at all.
Will check on dq->e.
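The matching bulk-dequeue side as a library-internal sketch (64-bit build assumed; the caller still serializes consumers; rte_ring_peek() is the API proposed in this series):

#include <rte_ring.h>
#include "rte_rcu_qsbr.h"
#include "rte_rcu_qsbr_pvt.h"

static uint32_t
dq_reclaim_one(struct rte_rcu_qsbr_dq *dq)
{
	const uint32_t nb_elt = dq->esize / 8 + 1;
	void *elt[nb_elt];
	void *token;

	/* Dequeue only after the head token's grace period is over. */
	if (rte_ring_peek(dq->r, &token) != 0 ||
	    rte_rcu_qsbr_check(dq->v, (uint64_t)(uintptr_t)token, false) != 1)
		return 0;

	if (rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL) == 0)
		return 0;

	/* The payload follows the token, so no per-queue scratch
	 * buffer (dq->e) is needed.
	 */
	dq->f(dq->p, &elt[1]);
	return 1;
}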

> 
> > +		dq->f(dq->p, dq->e);
> > +
> > +		cnt++;
> > +	}
> > +
> > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > +
> > +	if (cnt == 0) {
> > +		/* No resources were reclaimed */
> > +		rte_errno = EAGAIN;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> 
> I'd suggest returning cnt on success.
I am trying to keep the APIs simple. I do not see much use for 'cnt' as a return value to the user; it exposes details which I think are internal to the library.

> 
> > +}
> > +
> > +/* Delete a defer queue. */
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Reclaim all the resources */
> > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > +		/* Error number is already set by the reclaim API */
> > +		return 1;
> 
> How do you know that you have reclaimed everything?
Good point, will come back with a different solution; one possible shape is sketched below.
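One possible shape for such a fix, as a sketch under the same assumptions as above: drain the ring completely before freeing, and fail without freeing if reclamation stalls on an unfinished grace period.

#include <rte_malloc.h>
#include <rte_ring.h>
#include "rte_rcu_qsbr.h"
#include "rte_rcu_qsbr_pvt.h"

static int
dq_delete_drain(struct rte_rcu_qsbr_dq *dq)
{
	while (rte_ring_count(dq->r) != 0) {
		/* Reclaim returns 1 with rte_errno = EAGAIN when no
		 * grace period has completed; entries then remain on
		 * the queue and it must not be freed.
		 */
		if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
			return 1;
	}

	rte_ring_free(dq->r);
	rte_free(dq);
	return 0;
}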

> 
> > +
> > +	rte_ring_free(dq->r);
> > +	rte_free(dq);
> > +
> > +	return 0;
> > +}
> > +
> >  int rte_rcu_log_type;
> >
> >  RTE_INIT(rte_rcu_register)
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > @@ -34,6 +34,7 @@ extern "C" {
> >  #include <rte_lcore.h>
> >  #include <rte_debug.h>
> >  #include <rte_atomic.h>
> > +#include <rte_ring.h>
> >
> >  extern int rte_rcu_log_type;
> >
> > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >  	 */
> >  } __rte_cache_aligned;
> >
> > +/**
> > + * Call back function called to free the resources.
> > + *
> > + * @param p
> > + *   Pointer provided while creating the defer queue
> > + * @param e
> > + *   Pointer to the resource data stored on the defer queue
> > + *
> > + * @return
> > + *   None
> > + */
> > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> 
> Style thing - usually in DPDK we have typedef newtype_t ...
> Though I am not sure you need a new typedef at all - just a function pointer
> inside the struct seems enough.
Other libraries (for ex: rte_hash) use this approach. I think it is better to keep it out of the structure to allow for better commenting.

> 
> > +
> > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > +
> > +/**
> > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > + */
> > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > +
> > +/**
> > + *  Reclaim at the max 1/16th the total number of resources.
> > + */
> > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> 
> 
> As I said above, I don't think these thresholds need to be hardcoded.
> In any case, there seems not much point to put them in the public header file.
> 
> > +
> > +/**
> > + * Parameters used when creating the defer queue.
> > + */
> > +struct rte_rcu_qsbr_dq_parameters {
> > +	const char *name;
> > +	/**< Name of the queue. */
> > +	uint32_t size;
> > +	/**< Number of entries in queue. Typically, this will be
> > +	 *   the same as the maximum number of entries supported in the
> > +	 *   lock free data structure.
> > +	 *   Data structures with an unbounded number of entries are not
> > +	 *   supported currently.
> > +	 */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of each element in the defer queue.
> > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > +	 *   support 8B element sizes only.
> > +	 */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> 
> Style nit again - I like short names myself, but that seems a bit extreme... :)
> Might be at least:
> void (*reclaim)(void *, void *);
Maybe 'free_fn'?

> void * reclaim_data;
> ?
This is the pointer to the data structure to free the resource into. For ex: in the LPM data structure, it will be a pointer to the LPM. 'reclaim_data' does not convey the meaning correctly.

> 
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs. This can be NULL.
> > +	 */
> > +	struct rte_rcu_qsbr *v;
> 
> Does it need to be inside that struct?
> Might be better:
> rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> rte_rcu_qsbr_dq_parameters *params);
The API takes a parameter structure as input anyway, so why add another argument to the function? The QSBR variable is just another parameter.

> 
> Another alternative: make both reclaim() and enqueue() to take v as a
> parameter.
But both of them need access to some of the parameters provided in the rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to the functions.

> 
> > +	/**< RCU QSBR variable to use for this defer queue */ };
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq;
> > +
> >  /**
> >   * @warning
> >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f,
> > struct rte_rcu_qsbr *v);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + *
> > + * @param params
> > + *   Parameters to create a defer queue.
> > + * @return
> > + *   On success - Valid pointer to defer queue
> > + *   On error - NULL
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOMEM - Not enough memory
> > + */
> > +__rte_experimental
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enqueue one resource to the defer queue and start the grace period.
> > + * The resource will be freed later after at least one grace period
> > + * is over.
> > + *
> > + * If the defer queue is full, it will attempt to reclaim resources.
> > + * It will also reclaim resources at regular intervals to prevent
> > + * the defer queue from growing too big.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to allocate an entry from.
> > + * @param e
> > + *   Pointer to resource data to copy to the defer queue. The size of
> > + *   the data to copy is equal to the element size provided when the
> > + *   defer queue was created.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOSPC - Defer queue is full. This condition cannot happen
> > + *		if the defer queue size is equal to (or larger than) the
> > + *		number of elements in the data structure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Reclaim resources from the defer queue.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to reclaim an entry from.
> > + * @return
> > + *   On successful reclamation of at least 1 resource - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - None of the resources have completed at least 1 grace
> period,
> > + *		try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Delete a defer queue.
> > + *
> > + * It tries to reclaim all the resources on the defer queue.
> > + * If any of the resources have not completed the grace period
> > + * the reclamation stops and returns immediately. The rest of
> > + * the resources are not reclaimed and the defer queue is not
> > + * freed.
> > + *
> > + * @param dq
> > + *   Defer queue to delete.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > + *		period, try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > new file mode 100644
> > index 000000000..2122bc36a
> > --- /dev/null
> > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> 
> Again a style suggestion: as it is not a public header - don't use the rte_
> prefix for naming.
> From my perspective it is easier for the reader to realize what is a public
> header and what is not.
Looks like the guidelines are not defined very well. I see one private file with the rte_ prefix, and I see Stephen not using the rte_ prefix. I do not have any preference, but a consistent approach is required.

> 
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > +#define _RTE_RCU_QSBR_PVT_H_
> > +
> > +/**
> > + * This file is private to the RCU library. It should not be included
> > + * by the user of this library.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_rcu_qsbr.h"
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq {
> > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > +	uint32_t size;
> > +	/**< Number of elements in the defer queue */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of data stored on the defer queue */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs.
> > +	 */
> > +	char e[0];
> > +	/**< Temporary storage to copy the defer queue element. */
> 
> Do you really need 'e' at all?
> Can't it be just temporary stack variable?
Ok, will check.

> 
> > +};
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > b/lib/librte_rcu/rte_rcu_version.map
> > index f8b9ef2ab..dfac88a37 100644
> > --- a/lib/librte_rcu/rte_rcu_version.map
> > +++ b/lib/librte_rcu/rte_rcu_version.map
> > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >  	rte_rcu_qsbr_synchronize;
> >  	rte_rcu_qsbr_thread_register;
> >  	rte_rcu_qsbr_thread_unregister;
> > +	rte_rcu_qsbr_dq_create;
> > +	rte_rcu_qsbr_dq_enqueue;
> > +	rte_rcu_qsbr_dq_reclaim;
> > +	rte_rcu_qsbr_dq_delete;
> >
> >  	local: *;
> >  };
> > diff --git a/lib/meson.build b/lib/meson.build index
> > e5ff83893..0e1be8407 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -11,7 +11,9 @@
> >  libraries = [
> >  	'kvargs', # eal depends on kvargs
> >  	'eal', # everything depends on eal
> > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > +	'ring',
> > +	'rcu', # rcu depends on ring
> > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >  	'cmdline',
> >  	'metrics', # bitrate/latency stats depends on this
> >  	'hash',    # efd depends on this
> > @@ -22,7 +24,7 @@ libraries = [
> >  	'gro', 'gso', 'ip_frag', 'jobstats',
> >  	'kni', 'latencystats', 'lpm', 'member',
> >  	'power', 'pdump', 'rawdev',
> > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >  	# ipsec lib depends on net, crypto and security
> >  	'ipsec',
> >  	# add pkt framework libs which use other libs from above
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-02 18:50       ` Ananyev, Konstantin
@ 2019-10-03  6:42         ` Honnappa Nagarahalli
  2019-10-03 11:52           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03  6:42 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, Honnappa Nagarahalli, dev, nd, nd

> 
> > +
> > +/* Reclaim resources from the defer queue. */ int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > +	uint32_t max_cnt;
> > +	uint32_t cnt;
> > +	void *token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Anything to reclaim? */
> > +	if (rte_ring_count(dq->r) == 0)
> > +		return 0;
> > +
> > +	/* Reclaim at the max 1/16th the total number of entries. */
> > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > +	cnt = 0;
> > +
> > +	/* Check reader threads quiescent state and reclaim resources */
> > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> 
> One more thing I forgot to ask - how is this construct supposed to work on
> 32-bit machines?
> peek() will return a 32-bit value, while qsbr_check() operates with 64-bit
> tokens...
> As I understand, in that case you need to peek() 2 elems.
Yes, that is the intention. Ring APIs with the desired element size will help address 32-bit machines.
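To illustrate the 32-bit case: with 4-byte ring slots, the 64-bit token spans two consecutive elements, so both halves must be obtained together (a two-element peek or a serialized dequeue). A hypothetical helper, assuming the low word is enqueued first:

#include <stdint.h>

static inline uint64_t
dq_token_from_slots(uintptr_t lo, uintptr_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}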

> Might work, but I still think it is better to introduce a serialized version
> of ring_dequeue(). See my other mail about rte_ring_peek().
> 
> 
> > +			== 1)) {
> > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > +		/* The resource to dequeue needs to be a multiple of 64b
> > +		 * due to the limitation of the rte_ring implementation.
> > +		 */
> > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > +			i++, tmp++)
> > +			(void)rte_ring_sc_dequeue(dq->r,
> > +					(void *)(uintptr_t)tmp);
> > +		dq->f(dq->p, dq->e);
> > +
> > +		cnt++;
> > +	}
> > +
> > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > +
> > +	if (cnt == 0) {
> > +		/* No resources were reclaimed */
> > +		rte_errno = EAGAIN;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/* Delete a defer queue. */
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Reclaim all the resources */
> > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > +		/* Error number is already set by the reclaim API */
> > +		return 1;
> > +
> > +	rte_ring_free(dq->r);
> > +	rte_free(dq);
> > +
> > +	return 0;
> > +}
> > +
> >  int rte_rcu_log_type;
> >
> >  RTE_INIT(rte_rcu_register)
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > @@ -34,6 +34,7 @@ extern "C" {
> >  #include <rte_lcore.h>
> >  #include <rte_debug.h>
> >  #include <rte_atomic.h>
> > +#include <rte_ring.h>
> >
> >  extern int rte_rcu_log_type;
> >
> > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >  	 */
> >  } __rte_cache_aligned;
> >
> > +/**
> > + * Call back function called to free the resources.
> > + *
> > + * @param p
> > + *   Pointer provided while creating the defer queue
> > + * @param e
> > + *   Pointer to the resource data stored on the defer queue
> > + *
> > + * @return
> > + *   None
> > + */
> > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > +
> > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > +
> > +/**
> > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > + */
> > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > +
> > +/**
> > + *  Reclaim at the max 1/16th the total number of resources.
> > + */
> > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > +
> > +/**
> > + * Parameters used when creating the defer queue.
> > + */
> > +struct rte_rcu_qsbr_dq_parameters {
> > +	const char *name;
> > +	/**< Name of the queue. */
> > +	uint32_t size;
> > +	/**< Number of entries in queue. Typically, this will be
> > +	 *   the same as the maximum number of entries supported in the
> > +	 *   lock free data structure.
> > +	 *   Data structures with an unbounded number of entries are not
> > +	 *   supported currently.
> > +	 */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of each element in the defer queue.
> > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > +	 *   support 8B element sizes only.
> > +	 */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs. This can be NULL.
> > +	 */
> > +	struct rte_rcu_qsbr *v;
> > +	/**< RCU QSBR variable to use for this defer queue */ };
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq;
> > +
> >  /**
> >   * @warning
> >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f,
> > struct rte_rcu_qsbr *v);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + *
> > + * @param params
> > + *   Parameters to create a defer queue.
> > + * @return
> > + *   On success - Valid pointer to defer queue
> > + *   On error - NULL
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOMEM - Not enough memory
> > + */
> > +__rte_experimental
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enqueue one resource to the defer queue and start the grace period.
> > + * The resource will be freed later after at least one grace period
> > + * is over.
> > + *
> > + * If the defer queue is full, it will attempt to reclaim resources.
> > + * It will also reclaim resources at regular intervals to prevent
> > + * the defer queue from growing too big.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to allocate an entry from.
> > + * @param e
> > + *   Pointer to resource data to copy to the defer queue. The size of
> > + *   the data to copy is equal to the element size provided when the
> > + *   defer queue was created.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOSPC - Defer queue is full. This condition cannot happen
> > + *		if the defer queue size is equal to (or larger than) the
> > + *		number of elements in the data structure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Reclaim resources from the defer queue.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to reclaim an entry from.
> > + * @return
> > + *   On successful reclamation of at least 1 resource - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - None of the resources have completed at least 1 grace
> period,
> > + *		try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Delete a defer queue.
> > + *
> > + * It tries to reclaim all the resources on the defer queue.
> > + * If any of the resources have not completed the grace period
> > + * the reclamation stops and returns immediately. The rest of
> > + * the resources are not reclaimed and the defer queue is not
> > + * freed.
> > + *
> > + * @param dq
> > + *   Defer queue to delete.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > + *		period, try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > new file mode 100644
> > index 000000000..2122bc36a
> > --- /dev/null
> > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > +#define _RTE_RCU_QSBR_PVT_H_
> > +
> > +/**
> > + * This file is private to the RCU library. It should not be included
> > + * by the user of this library.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_rcu_qsbr.h"
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq {
> > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > +	uint32_t size;
> > +	/**< Number of elements in the defer queue */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of data stored on the defer queue */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs.
> > +	 */
> > +	char e[0];
> > +	/**< Temporary storage to copy the defer queue element. */ };
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > b/lib/librte_rcu/rte_rcu_version.map
> > index f8b9ef2ab..dfac88a37 100644
> > --- a/lib/librte_rcu/rte_rcu_version.map
> > +++ b/lib/librte_rcu/rte_rcu_version.map
> > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >  	rte_rcu_qsbr_synchronize;
> >  	rte_rcu_qsbr_thread_register;
> >  	rte_rcu_qsbr_thread_unregister;
> > +	rte_rcu_qsbr_dq_create;
> > +	rte_rcu_qsbr_dq_enqueue;
> > +	rte_rcu_qsbr_dq_reclaim;
> > +	rte_rcu_qsbr_dq_delete;
> >
> >  	local: *;
> >  };
> > diff --git a/lib/meson.build b/lib/meson.build index
> > e5ff83893..0e1be8407 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -11,7 +11,9 @@
> >  libraries = [
> >  	'kvargs', # eal depends on kvargs
> >  	'eal', # everything depends on eal
> > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > +	'ring',
> > +	'rcu', # rcu depends on ring
> > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >  	'cmdline',
> >  	'metrics', # bitrate/latency stats depends on this
> >  	'hash',    # efd depends on this
> > @@ -22,7 +24,7 @@ libraries = [
> >  	'gro', 'gso', 'ip_frag', 'jobstats',
> >  	'kni', 'latencystats', 'lpm', 'member',
> >  	'power', 'pdump', 'rawdev',
> > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >  	# ipsec lib depends on net, crypto and security
> >  	'ipsec',
> >  	# add pkt framework libs which use other libs from above
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/3] test/lpm: add RCU integration performance tests
  2019-10-02 13:02       ` Aaron Conole
@ 2019-10-03  9:09         ` Bruce Richardson
  0 siblings, 0 replies; 137+ messages in thread
From: Bruce Richardson @ 2019-10-03  9:09 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Honnappa Nagarahalli, vladimir.medvedkin, olivier.matz, dev,
	konstantin.ananyev, stephen, paulmck, Gavin.Hu, Dharmik.Thakkar,
	Ruifeng.Wang, nd

On Wed, Oct 02, 2019 at 09:02:03AM -0400, Aaron Conole wrote:
> Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> writes:
> 
> > Add performance tests for RCU integration. The performance
> > difference with and without RCU integration is very small
> > (~1% to ~2%) on both Arm and x86 platforms.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> 
> I see the following:
> 
>   lib/meson.build:89:5: ERROR: Problem encountered: Missing dependency rcu
>   for library rte_lpm
> 
> Maybe there's something wrong with the environment?  This isn't the
> first time I've seen a dependency detection problem with meson.
> 
It is probably not a detection problem; more likely the rcu library is not
being built for some reason. If you apply patch [1], the meson run will
print out each library and the dependency object generated for it as each
is processed. That should help debug issues like this.

/Bruce

[1] http://patches.dpdk.org/patch/59470/

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-03  6:42         ` Honnappa Nagarahalli
@ 2019-10-03 11:52           ` Ananyev, Konstantin
  0 siblings, 0 replies; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-03 11:52 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Thursday, October 3, 2019 7:42 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; Ruifeng Wang (Arm Technology
> China) <Ruifeng.Wang@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; dev@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
> 
> >
> > > +
> > > +/* Reclaim resources from the defer queue. */ int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > > +	uint32_t max_cnt;
> > > +	uint32_t cnt;
> > > +	void *token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Anything to reclaim? */
> > > +	if (rte_ring_count(dq->r) == 0)
> > > +		return 0;
> > > +
> > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > > +	cnt = 0;
> > > +
> > > +	/* Check reader threads quiescent state and reclaim resources */
> > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> >
> > One more thing I forgot to ask - how is this construct supposed to work on
> > 32-bit machines?
> > peek() will return a 32-bit value, while qsbr_check() operates with 64-bit
> > tokens...
> > As I understand, in that case you need to peek() 2 elems.
> Yes, that is the intention. Ring APIs with the desired element size will help address 32-bit machines.

Or serialized dequeue :)

> 
> > Might work, but I still think it is better to introduce a serialized
> > version of ring_dequeue(). See my other mail about rte_ring_peek().
> >
> >
> > > +			== 1)) {
> > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > +		 * due to the limitation of the rte_ring implementation.
> > > +		 */
> > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > +			i++, tmp++)
> > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > +					(void *)(uintptr_t)tmp);
> > > +		dq->f(dq->p, dq->e);
> > > +
> > > +		cnt++;
> > > +	}
> > > +
> > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > +
> > > +	if (cnt == 0) {
> > > +		/* No resources were reclaimed */
> > > +		rte_errno = EAGAIN;
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Delete a defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Reclaim all the resources */
> > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > +		/* Error number is already set by the reclaim API */
> > > +		return 1;
> > > +
> > > +	rte_ring_free(dq->r);
> > > +	rte_free(dq);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  int rte_rcu_log_type;
> > >
> > >  RTE_INIT(rte_rcu_register)
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > @@ -34,6 +34,7 @@ extern "C" {
> > >  #include <rte_lcore.h>
> > >  #include <rte_debug.h>
> > >  #include <rte_atomic.h>
> > > +#include <rte_ring.h>
> > >
> > >  extern int rte_rcu_log_type;
> > >
> > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > >  	 */
> > >  } __rte_cache_aligned;
> > >
> > > +/**
> > > + * Call back function called to free the resources.
> > > + *
> > > + * @param p
> > > + *   Pointer provided while creating the defer queue
> > > + * @param e
> > > + *   Pointer to the resource data stored on the defer queue
> > > + *
> > > + * @return
> > > + *   None
> > > + */
> > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > > +
> > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > +
> > > +/**
> > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > + */
> > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > +
> > > +/**
> > > + *  Reclaim at the max 1/16th the total number of resources.
> > > + */
> > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > > +
> > > +/**
> > > + * Parameters used when creating the defer queue.
> > > + */
> > > +struct rte_rcu_qsbr_dq_parameters {
> > > +	const char *name;
> > > +	/**< Name of the queue. */
> > > +	uint32_t size;
> > > +	/**< Number of entries in queue. Typically, this will be
> > > +	 *   the same as the maximum number of entries supported in the
> > > +	 *   lock free data structure.
> > > +	 *   Data structures with an unbounded number of entries are not
> > > +	 *   supported currently.
> > > +	 */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of each element in the defer queue.
> > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > +	 *   support 8B element sizes only.
> > > +	 */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs. This can be NULL.
> > > +	 */
> > > +	struct rte_rcu_qsbr *v;
> > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq;
> > > +
> > >  /**
> > >   * @warning
> > >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f,
> > > struct rte_rcu_qsbr *v);
> > >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + *
> > > + * @param params
> > > + *   Parameters to create a defer queue.
> > > + * @return
> > > + *   On success - Valid pointer to defer queue
> > > + *   On error - NULL
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOMEM - Not enough memory
> > > + */
> > > +__rte_experimental
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > +*params);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Enqueue one resource to the defer queue and start the grace period.
> > > + * The resource will be freed later after at least one grace period
> > > + * is over.
> > > + *
> > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > + * It will also reclaim resources at regular intervals to prevent
> > > + * the defer queue from growing too big.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to allocate an entry from.
> > > + * @param e
> > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > + *   the data to copy is equal to the element size provided when the
> > > + *   defer queue was created.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOSPC - Defer queue is full. This condition cannot happen
> > > + *		if the defer queue size is equal to (or larger than) the
> > > + *		number of elements in the data structure.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Reclaim resources from the defer queue.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to reclaim an entry from.
> > > + * @return
> > > + *   On successful reclamation of at least 1 resource - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > period,
> > > + *		try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Delete a defer queue.
> > > + *
> > > + * It tries to reclaim all the resources on the defer queue.
> > > + * If any of the resources have not completed the grace period
> > > + * the reclamation stops and returns immediately. The rest of
> > > + * the resources are not reclaimed and the defer queue is not
> > > + * freed.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to delete.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > > + *		period, try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > new file mode 100644
> > > index 000000000..2122bc36a
> > > --- /dev/null
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > +#define _RTE_RCU_QSBR_PVT_H_
> > > +
> > > +/**
> > > + * This file is private to the RCU library. It should not be included
> > > + * by the user of this library.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include "rte_rcu_qsbr.h"
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq {
> > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > +	uint32_t size;
> > > +	/**< Number of elements in the defer queue */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs.
> > > +	 */
> > > +	char e[0];
> > > +	/**< Temporary storage to copy the defer queue element. */ };
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > b/lib/librte_rcu/rte_rcu_version.map
> > > index f8b9ef2ab..dfac88a37 100644
> > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > >  	rte_rcu_qsbr_synchronize;
> > >  	rte_rcu_qsbr_thread_register;
> > >  	rte_rcu_qsbr_thread_unregister;
> > > +	rte_rcu_qsbr_dq_create;
> > > +	rte_rcu_qsbr_dq_enqueue;
> > > +	rte_rcu_qsbr_dq_reclaim;
> > > +	rte_rcu_qsbr_dq_delete;
> > >
> > >  	local: *;
> > >  };
> > > diff --git a/lib/meson.build b/lib/meson.build index
> > > e5ff83893..0e1be8407 100644
> > > --- a/lib/meson.build
> > > +++ b/lib/meson.build
> > > @@ -11,7 +11,9 @@
> > >  libraries = [
> > >  	'kvargs', # eal depends on kvargs
> > >  	'eal', # everything depends on eal
> > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > +	'ring',
> > > +	'rcu', # rcu depends on ring
> > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > >  	'cmdline',
> > >  	'metrics', # bitrate/latency stats depends on this
> > >  	'hash',    # efd depends on this
> > > @@ -22,7 +24,7 @@ libraries = [
> > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > >  	'kni', 'latencystats', 'lpm', 'member',
> > >  	'power', 'pdump', 'rawdev',
> > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > >  	# ipsec lib depends on net, crypto and security
> > >  	'ipsec',
> > >  	# add pkt framework libs which use other libs from above
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-03  6:29         ` Honnappa Nagarahalli
@ 2019-10-03 12:26           ` Ananyev, Konstantin
  2019-10-04  6:07             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-03 12:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd

Hi Honnappa,

> > > Add resource reclamation APIs to make it simple for applications and
> > > libraries to integrate rte_rcu library.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> > >  lib/librte_rcu/meson.build         |   2 +
> > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > >  lib/meson.build                    |   6 +-
> > >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > >
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > @@ -21,6 +21,7 @@
> > >  #include <rte_errno.h>
> > >
> > >  #include "rte_rcu_qsbr.h"
> > > +#include "rte_rcu_qsbr_pvt.h"
> > >
> > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6 +268,190
> > > @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > >  	return 0;
> > >  }
> > >
> > > +/* Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + */
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> > > +{
> > > +	struct rte_rcu_qsbr_dq *dq;
> > > +	uint32_t qs_fifo_size;
> > > +
> > > +	if (params == NULL || params->f == NULL ||
> > > +		params->v == NULL || params->name == NULL ||
> > > +		params->size == 0 || params->esize == 0 ||
> > > +		(params->esize % 8 != 0)) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return NULL;
> > > +	}
> > > +
> > > +	dq = rte_zmalloc(NULL,
> > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > +		RTE_CACHE_LINE_SIZE);
> > > +	if (dq == NULL) {
> > > +		rte_errno = ENOMEM;
> > > +
> > > +		return NULL;
> > > +	}
> > > +
> > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > +	 * max_size.
> > > +	 */
> > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > +					* params->size) + 1);
> > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > +					SOCKET_ID_ANY, 0);
> >
> > If it is not going to be MT safe, then why not create the ring with the
> > (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> Agree.
> 
> > Though I think it could be changed to allow MT safe multiple enqueue/single
> > dequeue, see below.
> The MT safe issue is due to reclaim code. The reclaim code has the following sequence:
> 
> rte_ring_peek
> rte_rcu_qsbr_check
> rte_ring_dequeue
> 
> This entire sequence needs to be atomic as the entry cannot be dequeued without knowing that the grace period for that entry is over.
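
As an illustration of serializing this sequence with a writer-side lock, a minimal sketch could look as follows (a hypothetical helper, not part of the patch; it assumes the rte_ring_peek() proposed in this series):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <rte_ring.h>
#include <rte_rcu_qsbr.h>

/* Reclaim at most one entry; returns 0 if an entry was dequeued. */
static int
reclaim_one(pthread_mutex_t *writer_lock, struct rte_ring *r,
	struct rte_rcu_qsbr *v)
{
	void *token;
	int rc = -1;

	pthread_mutex_lock(writer_lock);
	/* peek, check and dequeue must not interleave with another writer */
	if (rte_ring_peek(r, &token) == 0 &&
	    rte_rcu_qsbr_check(v, (uint64_t)(uintptr_t)token, false) == 1)
		rc = rte_ring_sc_dequeue(r, &token);
	pthread_mutex_unlock(writer_lock);
	return rc;
}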

I understand that, though I believe it should at least be possible to support a mode with multiple enqueuers and a single dequeuer/reclaimer.
With serialized dequeue(), even multiple dequeuers should be possible.

> Note that due to optimizations in rte_rcu_qsbr_check API, this sequence should not be large in most cases. I do not have ideas on how to
> make this sequence lock-free.
> 
> If the writer is on the control plane, most use cases will use mutex locks for synchronization if they are multi-threaded. That lock should be
> enough to provide the thread safety for these APIs.

In that case, why do we need a ring at all?
People can quite easily create their own queue with a mutex and a TAILQ, as sketched below.
If performance is not an issue, they can even add a pthread_cond to it, and have the ability
for the consumer to sleep/wake up on an empty/full queue.
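
A minimal sketch of that do-it-yourself alternative (application-side code with hypothetical names, shown here for illustration only):

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct dq_elem {
	TAILQ_ENTRY(dq_elem) next;
	uint64_t token;    /* grace-period token, e.g. from rte_rcu_qsbr_start() */
	void *resource;    /* element to free once the grace period is over */
};

struct simple_dq {
	pthread_mutex_t lock;
	TAILQ_HEAD(, dq_elem) head;
};

static int
simple_dq_enqueue(struct simple_dq *dq, uint64_t token, void *resource)
{
	struct dq_elem *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->token = token;
	e->resource = resource;
	pthread_mutex_lock(&dq->lock);
	TAILQ_INSERT_TAIL(&dq->head, e, next);
	pthread_mutex_unlock(&dq->lock);
	return 0;
}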

> 
> If the writer is multi-threaded and lock-free, then one should use per thread defer queue.

If that's the only working model, then the question is why we need that API at all.
Just a simple array with a counter, or a linked list, should do for the majority of cases.

> 
> >
> > > +	if (dq->r == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): defer queue create failed\n", __func__);
> > > +		rte_free(dq);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	dq->v = params->v;
> > > +	dq->size = params->size;
> > > +	dq->esize = params->esize;
> > > +	dq->f = params->f;
> > > +	dq->p = params->p;
> > > +
> > > +	return dq;
> > > +}
> > > +
> > > +/* Enqueue one resource to the defer queue to free after the grace
> > > + * period is over.
> > > + */
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> > > +{
> > > +	uint64_t token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +	uint32_t cur_size, free_size;
> > > +
> > > +	if (dq == NULL || e == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> >
> > Why not just return -EINVAL straight away?
> > I think there is not much point in setting rte_errno in that function at all;
> > just a return value should do.
> I am trying to keep these consistent with the existing APIs. They return 0 or 1 and set the rte_errno.

A lot of public DPDK API functions do use the return value to return a status code
(0 or some positive number on success, negative errno values on failure);
I am not inventing anything new here.

> 
> >
> > > +	}
> > > +
> > > +	/* Start the grace period */
> > > +	token = rte_rcu_qsbr_start(dq->v);
> > > +
> > > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > > +	 * keep the queue from growing too large and allows time for reader
> > > +	 * threads to report their quiescent state.
> > > +	 */
> > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> >
> > Probably would be a bit easier if you just store in dq->esize (elt size + token
> > size) / 8.
> Agree
> 
> >
> > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> >
> > Why make this threshold value hard-coded?
> > Why not either put it into a create() parameter, or just return a special return
> > value to indicate that the threshold is reached?
> My thinking was to keep the programming interface easy to use. The more the parameters, the more painful it is for the user. IMO, the
> constants chosen should be good enough for most cases. More advanced users could modify the constants. However, we could make these
> as part of the parameters, but make them optional for the user. For ex: if they set them to 0, default values can be used.
> 
> > Or even return the number of filled/free entries on success, so the caller can decide
> > to reclaim or not based on that information on his own?
> This means more code on the user side. 

I personally think it really wouldn't be that big a problem for the user to pass an extra parameter to the function.
Again, what if the user doesn't want to reclaim() in the enqueue() thread at all?

> I think adding these to parameters seems like a better option.
> 
> >
> > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +			"%s(): Triggering reclamation\n", __func__);
> > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > +	}
> > > +
> > > +	/* Check if there is space for at least 1 resource */
> > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > +	if (!free_size) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Defer queue is full\n", __func__);
> > > +		rte_errno = ENOSPC;
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Enqueue the resource */
> > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > +
> > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > +	 * due to the limitation of the rte_ring implementation.
> > > +	 */
> > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> >
> >
> > That whole construction above looks a bit clumsy and error prone...
> > I suppose just:
> >
> > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > uint32_t free, n;
> > ...
> > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
> > if (n == 0)
> Yes, bulk enqueue can be used. But note that once the flexible element size ring patch is done, this code will use that.

Well, when it is in the mainline, this code can certainly be updated to use the new API
(if it provides some improvements).
But as I understand, right now it is not there, while bulk enqueue/dequeue are.
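
Pulling those fragments together, a consolidated sketch of the bulk-enqueue pattern (not the final patch; it assumes esize is a multiple of 8, 64-bit pointers, and the token stored in the first 8-byte slot):

#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <rte_ring.h>

static int
dq_enqueue_sketch(struct rte_ring *r, uint64_t token,
	const void *e, uint32_t esize)
{
	const uint32_t nb_elt = esize / 8 + 1;	/* data words plus token */
	uint64_t buf[nb_elt];
	unsigned int n, free_space;

	buf[0] = token;
	memcpy(&buf[1], e, esize);
	/* bulk enqueue is all-or-nothing: the token and the element can
	 * never be split by a full-queue condition
	 */
	n = rte_ring_enqueue_bulk(r, (void **)buf, nb_elt, &free_space);
	return (n == 0) ? -ENOSPC : 0;
}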

> 
> >   return -ENOSPC;
> > return free;
> >
> > That way I think you can have MT-safe version of that function.
> Please see the description of MT safe issue above.
> 
> >
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Reclaim resources from the defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> > > +{
> > > +	uint32_t max_cnt;
> > > +	uint32_t cnt;
> > > +	void *token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> >
> > Same story as above - I think rte_errno is excessive in this function.
> > Just return value should be enough.
> >
> >
> > > +	}
> > > +
> > > +	/* Anything to reclaim? */
> > > +	if (rte_ring_count(dq->r) == 0)
> > > +		return 0;
> >
> > Not sure you need that, see below.
> >
> > > +
> > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> >
> > Again why not to make max_cnt a configurable at create() parameter?
> I think making this an optional parameter for creating the defer queue is a better option.
> 
> > Or even a parameter for that function?
> >
> > > +	cnt = 0;
> > > +
> > > +	/* Check reader threads quiescent state and reclaim resources */
> > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > +			== 1)) {
> >
> >
> > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > +		 * due to the limitation of the rte_ring implementation.
> > > +		 */
> > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > +			i++, tmp++)
> > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > +					(void *)(uintptr_t)tmp);
> >
> > Again, no need for such constructs with multiple dequeuer I believe.
> > Just:
> >
> > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > uint32_t n;
> > uintptr_t elt[nb_elt];
> > ...
> > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
> > if (n != 0)
> > 	dq->f(dq->p, elt);
> Agree on bulk API use.
> 
> >
> > Seems enough.
> > Again in that case you can have enqueue/reclaim running in different threads
> > simultaneously, plus you don't need dq->e at all.
> Will check on dq->e
> 
> >
> > > +		dq->f(dq->p, dq->e);
> > > +
> > > +		cnt++;
> > > +	}
> > > +
> > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > +
> > > +	if (cnt == 0) {
> > > +		/* No resources were reclaimed */
> > > +		rte_errno = EAGAIN;
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> >
> > I'd suggest to return cnt on success.
> I am trying to keep the APIs simple. I do not see much use for 'cnt' as return value to the user. It exposes more details which I think are
> internal to the library.

Not sure what the hassle is in returning the number of completed reclamations.
If the user doesn't need that information, he simply wouldn't use it.
But it might be useful - he can decide whether to try another attempt
at reclaim() immediately or whether it is OK to do something else.
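
To illustrate the point, a caller pattern that a count-returning reclaim() would enable (dq_reclaim_cnt() is a hypothetical variant, not the proposed API):

int n = dq_reclaim_cnt(dq);	/* hypothetical: returns entries reclaimed */
if (n == 0) {
	/* nothing has finished its grace period yet: do other work
	 * instead of retrying reclaim() immediately
	 */
} else {
	/* at least n resources were freed: retry the failed allocation */
}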

> 
> >
> > > +}
> > > +
> > > +/* Delete a defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> > > +{
> > > +	if (dq == NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno = EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Reclaim all the resources */
> > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > +		/* Error number is already set by the reclaim API */
> > > +		return 1;
> >
> > How do you know that you have reclaimed everything?
> Good point, will come back with a different solution.
> 
> >
> > > +
> > > +	rte_ring_free(dq->r);
> > > +	rte_free(dq);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  int rte_rcu_log_type;
> > >
> > >  RTE_INIT(rte_rcu_register)
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> > > index c80f15c00..185d4b50a 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > @@ -34,6 +34,7 @@ extern "C" {
> > >  #include <rte_lcore.h>
> > >  #include <rte_debug.h>
> > >  #include <rte_atomic.h>
> > > +#include <rte_ring.h>
> > >
> > >  extern int rte_rcu_log_type;
> > >
> > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > >  	 */
> > >  } __rte_cache_aligned;
> > >
> > > +/**
> > > + * Call back function called to free the resources.
> > > + *
> > > + * @param p
> > > + *   Pointer provided while creating the defer queue
> > > + * @param e
> > > + *   Pointer to the resource data stored on the defer queue
> > > + *
> > > + * @return
> > > + *   None
> > > + */
> > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> >
> > Stylish thing - usually in DPDK we have typedf newtype_t ...
> > Though I am not sure you need a new typedef at all - just a function pointer
> > inside the struct seems enough.
> Other libraries (for ex: rte_hash) use this approach. I think it is better to keep it out of the structure to allow for better commenting.

I am saying the majority of DPDK code uses the _t suffix for typedefs:
typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);

> 
> >
> > > +
> > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > +
> > > +/**
> > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > + */
> > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > +
> > > +/**
> > > + *  Reclaim at the max 1/16th the total number of resources.
> > > + */
> > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> >
> >
> > As I said above, I don't think these thresholds need to be hardcoded.
> > In any case, there seems not much point to put them in the public header file.
> >
> > > +
> > > +/**
> > > + * Parameters used when creating the defer queue.
> > > + */
> > > +struct rte_rcu_qsbr_dq_parameters {
> > > +	const char *name;
> > > +	/**< Name of the queue. */
> > > +	uint32_t size;
> > > +	/**< Number of entries in queue. Typically, this will be
> > > +	 *   the same as the maximum number of entries supported in the
> > > +	 *   lock free data structure.
> > > +	 *   Data structures with unbounded number of entries is not
> > > +	 *   supported currently.
> > > +	 */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of each element in the defer queue.
> > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > +	 *   support 8B element sizes only.
> > > +	 */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> >
> > Style nit again - I like short names myself, but that seems a bit extreme... :)
> > Might be at least:
> > void (*reclaim)(void *, void *);
> Maybe 'free_fn'?
> 
> > void * reclaim_data;
> > ?
> This is the pointer to the data structure to free the resource into. For ex: In LPM data structure, it will be pointer to LPM. 'reclaim_data'
> does not convey the meaning correctly.

Ok, please feel free to come up with your own names.
I just wanted to say that 'f' and 'p' are a bit extreme for a public API.

> 
> >
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs. This can be NULL.
> > > +	 */
> > > +	struct rte_rcu_qsbr *v;
> >
> > Does it need to be inside that struct?
> > Might be better:
> > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > rte_rcu_qsbr_dq_parameters *params);
> The API takes a parameter structure as input anyway, why add another argument to the function? The QSBR variable is also another
> parameter.
> 
> >
> > Another alternative: make both reclaim() and enqueue() to take v as a
> > parameter.
> But both of them need access to some of the parameters provided in rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to
> the functions.

Purely a style thing.
From my perspective it just provides better visibility of what is going on in the code:
for QSBR variable 'v', create a new defer queue.
But no strong opinion here.

> 
> >
> > > +	/**< RCU QSBR variable to use for this defer queue */
> > > +};
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq;
> > > +
> > >  /**
> > >   * @warning
> > >   * @b EXPERIMENTAL: this API may change without prior notice
> > > @@ -648,6 +710,113 @@ __rte_experimental
> > >  int
> > >  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> > >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + *
> > > + * @param params
> > > + *   Parameters to create a defer queue.
> > > + * @return
> > > + *   On success - Valid pointer to defer queue
> > > + *   On error - NULL
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOMEM - Not enough memory
> > > + */
> > > +__rte_experimental
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > +*params);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Enqueue one resource to the defer queue and start the grace period.
> > > + * The resource will be freed later after at least one grace period
> > > + * is over.
> > > + *
> > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > + * It will also reclaim resources at regular intervals to keep
> > > + * the defer queue from growing too big.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to allocate an entry from.
> > > + * @param e
> > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > + *   the data to copy is equal to the element size provided when the
> > > + *   defer queue was created.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > + *		if the defer queue size is equal (or larger) than the
> > > + *		number of elements in the data structure.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Reclaim resources from the defer queue.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other means.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to reclaim an entry from.
> > > + * @return
> > > + *   On successful reclamation of at least 1 resource - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > period,
> > > + *		try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Delete a defer queue.
> > > + *
> > > + * It tries to reclaim all the resources on the defer queue.
> > > + * If any of the resources have not completed the grace period
> > > + * the reclamation stops and returns immediately. The rest of
> > > + * the resources are not reclaimed and the defer queue is not
> > > + * freed.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to delete.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > > + *		period, try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > new file mode 100644
> > > index 000000000..2122bc36a
> > > --- /dev/null
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > Again a style suggestion: as it is not a public header - don't use the rte_ prefix for
> > naming.
> > From my perspective - easier for the reader to realize what is a public header
> > and what is not.
> Looks like the guidelines are not defined very well. I see one private file with rte_ prefix. I see Stephen not using rte_ prefix. I do not have
> any preference. But, a consistent approach is required.

That's just a suggestion.
For me (and I hope for others) it would be a bit easier.
When looking at the code for the first time I had to look at meson.build to check
whether it is a public header or not.
If the file doesn't have the 'rte_' prefix, I assume straight away that it is an internal one.
But, as you said, there are no exact guidelines here, so it is up to you to decide.

> 
> >
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > +#define _RTE_RCU_QSBR_PVT_H_
> > > +
> > > +/**
> > > + * This file is private to the RCU library. It should not be included
> > > + * by the user of this library.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include "rte_rcu_qsbr.h"
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq {
> > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > +	uint32_t size;
> > > +	/**< Number of elements in the defer queue */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs.
> > > +	 */
> > > +	char e[0];
> > > +	/**< Temporary storage to copy the defer queue element. */
> >
> > Do you really need 'e' at all?
> > Can't it be just temporary stack variable?
> Ok, will check.
> 
> >
> > > +};
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > b/lib/librte_rcu/rte_rcu_version.map
> > > index f8b9ef2ab..dfac88a37 100644
> > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > >  	rte_rcu_qsbr_synchronize;
> > >  	rte_rcu_qsbr_thread_register;
> > >  	rte_rcu_qsbr_thread_unregister;
> > > +	rte_rcu_qsbr_dq_create;
> > > +	rte_rcu_qsbr_dq_enqueue;
> > > +	rte_rcu_qsbr_dq_reclaim;
> > > +	rte_rcu_qsbr_dq_delete;
> > >
> > >  	local: *;
> > >  };
> > > diff --git a/lib/meson.build b/lib/meson.build
> > > index e5ff83893..0e1be8407 100644
> > > --- a/lib/meson.build
> > > +++ b/lib/meson.build
> > > @@ -11,7 +11,9 @@
> > >  libraries = [
> > >  	'kvargs', # eal depends on kvargs
> > >  	'eal', # everything depends on eal
> > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > +	'ring',
> > > +	'rcu', # rcu depends on ring
> > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > >  	'cmdline',
> > >  	'metrics', # bitrate/latency stats depends on this
> > >  	'hash',    # efd depends on this
> > > @@ -22,7 +24,7 @@ libraries = [
> > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > >  	'kni', 'latencystats', 'lpm', 'member',
> > >  	'power', 'pdump', 'rawdev',
> > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > >  	# ipsec lib depends on net, crypto and security
> > >  	'ipsec',
> > >  	# add pkt framework libs which use other libs from above
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-02 18:42       ` Ananyev, Konstantin
@ 2019-10-03 19:49         ` Honnappa Nagarahalli
  2019-10-07  9:01           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-03 19:49 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd

> > Subject: [PATCH v3 1/3] lib/ring: add peek API
> >
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > The peek API allows fetching the next available object in the ring
> > without dequeuing it. This helps in scenarios where dequeuing of
> > objects depends on their value.
> >
> > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> >  1 file changed, 30 insertions(+)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 2a9f768a1..d3d0d5e18 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> **obj_table,
> >  				r->cons.single, available);
> >  }
> >
> > +/**
> > + * Peek one object from a ring.
> > + *
> > + * The peek API allows fetching the next available object in the ring
> > + * without dequeuing it. This API is not multi-thread safe with
> > +respect
> > + * to other consumer threads.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_p
> > + *   A pointer to a void * pointer (object) that will be filled.
> > + * @return
> > + *   - 0: Success, object available
> > + *   - -ENOENT: Not enough entries in the ring.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline int
> > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> 
> As it is not MT safe, I think we need _sc_ in the name, to follow the other
> rte_ring functions' naming conventions
> (rte_ring_sc_peek() or so).
Agree

> 
> As a better alternative, what do you think about introducing serialized
> versions of the DPDK rte_ring dequeue functions?
> Something like this:
> 
> /* same as original ring dequeue, but:
>   * 1) move cons.head only if cons.head == cons.tail
>   * 2) don't update cons.tail
>   */
> unsigned int
> rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> int n,
>                 unsigned int *available);
> 
> /* sets both cons.head and cons.tail to cons.head + num */
> void
> rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> 
> /* resets cons.head to the cons.tail value */
> void
> rte_ring_serial_dequeue_abort(struct rte_ring *r);
> 
> Then your dq_reclaim cycle function will look like that:
> 
> const uint32_t nb_elt = dq->elt_size/8 + 1;
> uint32_t avl, n;
> uintptr_t elt[nb_elt];
> ...
> 
> do {
> 
>   /* read next elem from the queue */
>   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
>   if (n == 0)
>       break;
> 
>   /* wrong period, keep elem in the queue */
>   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
>      rte_ring_serial_dequeue_abort(dq->r);
>      break;
>   }
> 
>   /* can reclaim, remove elem from the queue */
>   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> 
>   /* call reclaim function */
>   dq->f(dq->p, elt);
> 
> } while (avl >= nb_elt);
> 
> That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> As long as actual reclamation callback itself is MT safe of course.

I think it is a great idea. The other writers would still be polling, waiting for the current writer to update the tail or the head. This makes it a blocking solution.

We can make the other threads not poll, i.e., they will quit reclaiming if they see that other writers are dequeuing from the queue. The other way is to use per-thread queues, as sketched below.
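
A sketch of the per-thread queue approach, assuming one defer queue per writer lcore (queue creation is omitted; rte_rcu_qsbr_dq_enqueue() is the API proposed in this series):

#include <rte_lcore.h>
#include <rte_per_lcore.h>
#include <rte_rcu_qsbr.h>

/* One defer queue per writer lcore: each thread enqueues to and
 * reclaims from its own queue only, so no writer-side lock is needed.
 */
static RTE_DEFINE_PER_LCORE(struct rte_rcu_qsbr_dq *, lcore_dq);

static inline int
writer_retire_resource(void *e)
{
	return rte_rcu_qsbr_dq_enqueue(RTE_PER_LCORE(lcore_dq), e);
}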

The other requirement I see is to support unbounded-size data structures, wherein the data structures do not have a pre-determined number of entries. Also, currently the defer queue size is equal to the total number of entries in a given data structure. There are plans to support a dynamically resizable defer queue. This means memory allocation, which will affect the lock-free-ness of the solution.

So, IMO:
1) The API should provide the capability to support different algorithms - maybe through some flags?
2) The requirements for the ring are pretty unique to the problem we have here (for ex: move cons.head only if cons.tail is the same, skip polling). So, we should probably implement a ring within the RCU library?

From the timeline perspective, adding all these capabilities would be difficult to get done within the 19.11 timeline. What I have here satisfies my current needs. I suggest that we make provisions in the APIs now to support all these features, but do the implementation in the coming releases. Does this sound OK to you?
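
To make the serialized-dequeue semantics discussed above concrete, here is a toy sketch using C11 atomics (an editorial illustration only, not rte_ring code; all names are hypothetical):

#include <stdatomic.h>
#include <stdint.h>

struct toy_ring {
	_Atomic uint32_t prod_tail;
	_Atomic uint32_t cons_head;
	_Atomic uint32_t cons_tail;
	uint32_t mask;		/* ring size (power of two) minus one */
	void *objs[];		/* storage for mask + 1 object pointers */
};

/* move cons.head only if cons.head == cons.tail; cons.tail is untouched */
static unsigned int
serial_dequeue_bulk(struct toy_ring *r, void **out, unsigned int n)
{
	uint32_t head = atomic_load(&r->cons_head);

	if (head != atomic_load(&r->cons_tail))
		return 0;	/* another reclaimer is in progress */
	if (atomic_load(&r->prod_tail) - head < n)
		return 0;	/* not enough entries */
	if (!atomic_compare_exchange_strong(&r->cons_head, &head, head + n))
		return 0;	/* lost the race; caller may retry */
	for (unsigned int i = 0; i != n; i++)
		out[i] = r->objs[(head + i) & r->mask];
	return n;
}

/* publish: advance cons.tail by the entries taken, freeing the slots */
static void
serial_dequeue_finish(struct toy_ring *r, unsigned int n)
{
	atomic_fetch_add(&r->cons_tail, n);
}

/* abort: reset cons.head back to cons.tail, keeping the entries queued */
static void
serial_dequeue_abort(struct toy_ring *r)
{
	atomic_store(&r->cons_head, atomic_load(&r->cons_tail));
}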

> 
> > +{
> > +	uint32_t prod_tail = r->prod.tail;
> > +	uint32_t cons_head = r->cons.head;
> > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > +	unsigned int n = 1;
> > +	if (count) {
> > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > +		return 0;
> > +	}
> > +	return -ENOENT;
> > +}
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-03 12:26           ` Ananyev, Konstantin
@ 2019-10-04  6:07             ` Honnappa Nagarahalli
  2019-10-07 10:46               ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-04  6:07 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd

> 
> Hi Honnappa,
> 
> > > > Add resource reclamation APIs to make it simple for applications
> > > > and libraries to integrate rte_rcu library.
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> > > >  lib/librte_rcu/meson.build         |   2 +
> > > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > > >  lib/meson.build                    |   6 +-
> > > >  7 files changed, 700 insertions(+), 3 deletions(-)
> > > >  create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > >
> > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > index ce7f93dd3..76814f50b 100644
> > > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > @@ -21,6 +21,7 @@
> > > >  #include <rte_errno.h>
> > > >
> > > >  #include "rte_rcu_qsbr.h"
> > > > +#include "rte_rcu_qsbr_pvt.h"
> > > >
> > > >  /* Get the memory size of QSBR variable */
> > > >  size_t
> > > > @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > > >  	return 0;
> > > >  }
> > > >
> > > > +/* Create a queue used to store the data structure elements that
> > > > +can
> > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq *
> > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> > > > +{
> > > > +	struct rte_rcu_qsbr_dq *dq;
> > > > +	uint32_t qs_fifo_size;
> > > > +
> > > > +	if (params == NULL || params->f == NULL ||
> > > > +		params->v == NULL || params->name == NULL ||
> > > > +		params->size == 0 || params->esize == 0 ||
> > > > +		(params->esize % 8 != 0)) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	dq = rte_zmalloc(NULL,
> > > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > > +		RTE_CACHE_LINE_SIZE);
> > > > +	if (dq == NULL) {
> > > > +		rte_errno = ENOMEM;
> > > > +
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > > +	 * max_size.
> > > > +	 */
> > > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > > +					* params->size) + 1);
> > > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > > +					SOCKET_ID_ANY, 0);
> > >
> > > If it is not going to be MT safe, then why not create the ring
> > > with the (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> > Agree.
> >
> > > Though I think it could be changed to allow MT safe multiple
> > > enqueue/single dequeue, see below.
> > The MT safe issue is due to reclaim code. The reclaim code has the following
> sequence:
> >
> > rte_ring_peek
> > rte_rcu_qsbr_check
> > rte_ring_dequeue
> >
> > This entire sequence needs to be atomic as the entry cannot be dequeued
> without knowing that the grace period for that entry is over.
> 
> I understand that, though I believe it should at least be possible to support
> a mode with multiple enqueuers and a single dequeuer/reclaimer.
> With serialized dequeue(), even multiple dequeuers should be possible.
Agreed. Please see the response on the other thread.

> 
> > Note that due to optimizations in rte_rcu_qsbr_check API, this
> > sequence should not be large in most cases. I do not have ideas on how to
> make this sequence lock-free.
> >
> > If the writer is on the control plane, most use cases will use mutex
> > locks for synchronization if they are multi-threaded. That lock should be
> enough to provide the thread safety for these APIs.
> 
> In that case, why do we need a ring at all?
> People can quite easily create their own queue with a mutex and a TAILQ.
> If performance is not an issue, they can even add a pthread_cond to it, and have
> the ability for the consumer to sleep/wake up on an empty/full queue.
> 
> >
> > If the writer is multi-threaded and lock-free, then one should use per thread
> defer queue.
> 
> If that's the only working model, then the question is why we need that API
> at all.
> Just a simple array with a counter, or a linked list, should do for the majority of cases.
Please see the other thread.

> 
> >
> > >
> > > > +	if (dq->r == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): defer queue create failed\n", __func__);
> > > > +		rte_free(dq);
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	dq->v = params->v;
> > > > +	dq->size = params->size;
> > > > +	dq->esize = params->esize;
> > > > +	dq->f = params->f;
> > > > +	dq->p = params->p;
> > > > +
> > > > +	return dq;
> > > > +}
> > > > +
> > > > +/* Enqueue one resource to the defer queue to free after the
> > > > +grace
> > > > + * period is over.
> > > > + */
> > > > +int
> > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> > > > +{
> > > > +	uint64_t token;
> > > > +	uint64_t *tmp;
> > > > +	uint32_t i;
> > > > +	uint32_t cur_size, free_size;
> > > > +
> > > > +	if (dq == NULL || e == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return 1;
> > >
> > > Why not just return -EINVAL straight away?
> > > I think there is not much point in setting rte_errno in that function at
> > > all; just a return value should do.
> > I am trying to keep these consistent with the existing APIs. They return 0 or 1
> and set the rte_errno.
> 
> A lot of public DPDK API functions do use the return value to return a status code (0
> or some positive number on success, negative errno values on failure); I am
> not inventing anything new here.
Agree, you are not proposing a new thing here. Maybe I was not clear. I really do not have an opinion on how this should be done, but I do have an opinion on consistency. These new APIs follow what has been done in the existing RCU APIs. I think we have 2 options here:
1) either we change the existing RCU APIs to get rid of rte_errno (is it an ABI change?), or
2) the new APIs follow what has been done in the existing RCU APIs.
I want to make sure we are consistent at least within the RCU APIs.

> 
> >
> > >
> > > > +	}
> > > > +
> > > > +	/* Start the grace period */
> > > > +	token = rte_rcu_qsbr_start(dq->v);
> > > > +
> > > > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > > > +	 * keep the queue from growing too large and allows time for reader
> > > > +	 * threads to report their quiescent state.
> > > > +	 */
> > > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > >
> > > Probably would be a bit easier if you just store in dq->esize (elt
> > > size + token
> > > size) / 8.
> > Agree
> >
> > >
> > > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > >
> > > Why make this threshold value hard-coded?
> > > Why not either put it into a create() parameter, or just return a
> > > special return value to indicate that the threshold is reached?
> > My thinking was to keep the programming interface easy to use. The
> > more the parameters, the more painful it is for the user. IMO, the
> > constants chosen should be good enough for most cases. More advanced
> users could modify the constants. However, we could make these as part of the
> parameters, but make them optional for the user. For ex: if they set them to 0,
> default values can be used.
> >
> > > Or even return the number of filled/free entries on success, so the caller
> > > can decide to reclaim or not based on that information on his own?
> > This means more code on the user side.
> 
> I personally think it really wouldn't be that big a problem for the user to pass
> an extra parameter to the function.
I will convert the 2 constants into optional parameters (the user can set them to 0 to make the algorithm use the default values).

> Again, what if the user doesn't want to reclaim() in the enqueue() thread at all?
'enqueue' has to do reclamation if the defer queue is full. I do not think this is trivial.

In the current design, reclamation in enqueue is also done on a regular basis (automatic triggering of reclamation when the queue reaches a certain limit) to keep the queue from growing too large. This is required when we implement a dynamically adjusting defer queue. The current algorithm spreads the cost of reclamation across multiple calls and puts an upper bound on the cycles spent in the delete API by reclaiming a fixed number of entries.

This algorithm is proven to work in the LPM integration performance tests at a very low performance overhead (~1%). So, I do not know why a user would not want to use this. The 2 additional parameters should give the user more flexibility.

However, if the user wants his own algorithm, he can create one with the base APIs provided.
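
For instance, the two thresholds could become optional create-time parameters along these lines (hypothetical field names only; 0 selects the built-in default):

/* sketch of additional fields for struct rte_rcu_qsbr_dq_parameters */
struct dq_params_sketch {
	uint32_t trigger_reclaim_limit;
	/**< Queue depth at which enqueue() triggers automatic
	 *   reclamation; 0 means the current default of size/8.
	 */
	uint32_t max_reclaim_size;
	/**< Maximum number of entries reclaimed per reclaim() call;
	 *   0 means the current default of size/16.
	 */
};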

> 
> > I think adding these to parameters seems like a better option.
> >
> > >
> > > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > +			"%s(): Triggering reclamation\n", __func__);
> > > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > > +	}
> > > > +
> > > > +	/* Check if there is space for at least 1 resource */
> > > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > > +	if (!free_size) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Defer queue is full\n", __func__);
> > > > +		rte_errno = ENOSPC;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Enqueue the resource */
> > > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > > +
> > > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > > +	 * due to the limitation of the rte_ring implementation.
> > > > +	 */
> > > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > >
> > >
> > > That whole construction above looks a bit clumsy and error prone...
> > > I suppose just:
> > >
> > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > uint32_t free, n;
> > > ...
> > > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free);
> > > if (n == 0)
> > Yes, bulk enqueue can be used. But note that once the flexible element size
> ring patch is done, this code will use that.
> 
> Well, when it is in the mainline, this code can certainly be updated to use the
> new API (if it provides some improvements).
> But as I understand, right now it is not there, while bulk enqueue/dequeue are.
Apologies, I was not clear. I agree we can go with bulk APIs for now.

> 
> >
> > >   return -ENOSPC;
> > > return free;
> > >
> > > That way I think you can have MT-safe version of that function.
> > Please see the description of MT safe issue above.
> >
> > >
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/* Reclaim resources from the defer queue. */
> > > > +int
> > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> > > > +{
> > > > +	uint32_t max_cnt;
> > > > +	uint32_t cnt;
> > > > +	void *token;
> > > > +	uint64_t *tmp;
> > > > +	uint32_t i;
> > > > +
> > > > +	if (dq == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return 1;
> > >
> > > Same story as above - I think rte_errno is excessive in this function.
> > > Just return value should be enough.
> > >
> > >
> > > > +	}
> > > > +
> > > > +	/* Anything to reclaim? */
> > > > +	if (rte_ring_count(dq->r) == 0)
> > > > +		return 0;
> > >
> > > Not sure you need that, see below.
> > >
> > > > +
> > > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > >
> > > Again why not to make max_cnt a configurable at create() parameter?
> > I think making this an optional parameter for creating the defer queue is a
> > better option.
> >
> > > Or even a parameter for that function?
> > >
> > > > +	cnt = 0;
> > > > +
> > > > +	/* Check reader threads quiescent state and reclaim resources */
> > > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > > +			== 1)) {
> > >
> > >
> > > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > > +		 * due to the limitation of the rte_ring implementation.
> > > > +		 */
> > > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > > +			i++, tmp++)
> > > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > > +					(void *)(uintptr_t)tmp);
> > >
> > > Again, no need for such constructs with multiple dequeuer I believe.
> > > Just:
> > >
> > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > uint32_t n;
> > > uintptr_t elt[nb_elt];
> > > ...
> > > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL);
> > > if (n != 0)
> > > 	dq->f(dq->p, elt);
> > Agree on bulk API use.
> >
> > >
> > > Seems enough.
> > > Again in that case you can have enqueue/reclaim running in different
> > > threads simultaneously, plus you don't need dq->e at all.
> > Will check on dq->e
> >
> > >
> > > > +		dq->f(dq->p, dq->e);
> > > > +
> > > > +		cnt++;
> > > > +	}
> > > > +
> > > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > > +
> > > > +	if (cnt == 0) {
> > > > +		/* No resources were reclaimed */
> > > > +		rte_errno = EAGAIN;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	return 0;
> > >
> > > I'd suggest to return cnt on success.
> > I am trying to keep the APIs simple. I do not see much use for 'cnt'
> > as return value to the user. It exposes more details which I think are internal
> to the library.
> 
> Not sure what the hassle is in returning the number of completed reclamations.
> If the user doesn't need that information, he simply wouldn't use it.
> But it might be useful - he can decide whether to try another attempt
> at reclaim() immediately or whether it is OK to do something else.
There is no hassle to return that information.

As per the current design, the user calls 'reclaim' when he is out of resources while adding an entry to the data structure. At that point the user wants to know if at least 1 resource was reclaimed, because the user has to allocate 1 resource. He does not have a use for the number of resources reclaimed.

If this API returns 0, then the user can decide to repeat the call or return failure. But that decision depends on the length of the grace period, which is under the user's control.

> 
> >
> > >
> > > > +}
> > > > +
> > > > +/* Delete a defer queue. */
> > > > +int
> > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> > > > +{
> > > > +	if (dq == NULL) {
> > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > +		rte_errno = EINVAL;
> > > > +
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Reclaim all the resources */
> > > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > > +		/* Error number is already set by the reclaim API */
> > > > +		return 1;
> > >
> > > How do you know that you have reclaimed everything?
> > Good point, will come back with a different solution.
> >
> > >
> > > > +
> > > > +	rte_ring_free(dq->r);
> > > > +	rte_free(dq);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  int rte_rcu_log_type;
> > > >
> > > >  RTE_INIT(rte_rcu_register)
> > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > index c80f15c00..185d4b50a 100644
> > > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > @@ -34,6 +34,7 @@ extern "C" {
> > > >  #include <rte_lcore.h>
> > > >  #include <rte_debug.h>
> > > >  #include <rte_atomic.h>
> > > > +#include <rte_ring.h>
> > > >
> > > >  extern int rte_rcu_log_type;
> > > >
> > > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > > >  	 */
> > > >  } __rte_cache_aligned;
> > > >
> > > > +/**
> > > > + * Call back function called to free the resources.
> > > > + *
> > > > + * @param p
> > > > + *   Pointer provided while creating the defer queue
> > > > + * @param e
> > > > + *   Pointer to the resource data stored on the defer queue
> > > > + *
> > > > + * @return
> > > > + *   None
> > > > + */
> > > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > >
> > > Stylish thing - usually in DPDK we have typedf newtype_t ...
> > > Though I am not sure you need a new typedef at all - just a function
> > > pointer inside the struct seems enough.
> > Other libraries (for ex: rte_hash) use this approach. I think it is better to keep
> it out of the structure to allow for better commenting.
> 
> I am saying the majority of DPDK code uses the _t suffix for typedefs:
> typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);
Apologies, got it, will change.

> 
> >
> > >
> > > > +
> > > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > > +
> > > > +/**
> > > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > > + */
> > > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > > +
> > > > +/**
> > > > + *  Reclaim at the max 1/16th the total number of resources.
> > > > + */
> > > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > >
> > >
> > > As I said above, I don't think these thresholds need to be hardcoded.
> > > In any case, there seems not much point to put them in the public header
> file.
> > >
> > > > +
> > > > +/**
> > > > + * Parameters used when creating the defer queue.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq_parameters {
> > > > +	const char *name;
> > > > +	/**< Name of the queue. */
> > > > +	uint32_t size;
> > > > +	/**< Number of entries in queue. Typically, this will be
> > > > +	 *   the same as the maximum number of entries supported in the
> > > > +	 *   lock free data structure.
> > > > +	 *   Data structures with unbounded number of entries is not
> > > > +	 *   supported currently.
> > > > +	 */
> > > > +	uint32_t esize;
> > > > +	/**< Size (in bytes) of each element in the defer queue.
> > > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > > +	 *   support 8B element sizes only.
> > > > +	 */
> > > > +	rte_rcu_qsbr_free_resource f;
> > > > +	/**< Function to call to free the resource. */
> > > > +	void *p;
> > >
> > > Style nit again - I like short names myself, but that seems a bit
> > > extreme... :) Might be at least:
> > > void (*reclaim)(void *, void *);
> > Maybe 'free_fn'?
> >
> > > void * reclaim_data;
> > > ?
> > This is the pointer to the data structure to free the resource into. For ex: In
> LPM data structure, it will be pointer to LPM. 'reclaim_data'
> > does not convey the meaning correctly.
> 
> Ok, please feel free to come up with your own names.
> I just wanted to say that 'f' and 'p' are a bit extreme for a public API.
ok, this is the hardest thing to do 😊

> 
> >
> > >
> > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > +	 *   pointer to the data structure to which the resource to free
> > > > +	 *   belongs. This can be NULL.
> > > > +	 */
> > > > +	struct rte_rcu_qsbr *v;
> > >
> > > Does it need to be inside that struct?
> > > Might be better:
> > > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > > rte_rcu_qsbr_dq_parameters *params);
> > The API takes a parameter structure as input anyway, why add
> > another argument to the function? The QSBR variable is also another parameter.
> >
> > >
> > > Another alternative: make both reclaim() and enqueue() to take v as
> > > a parameter.
> > But both of them need access to some of the parameters provided in
> > rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to the
> functions.
> 
> Purely a style thing.
> From my perspective it just provides better visibility of what is going on in the code:
> for QSBR variable 'v', create a new defer queue.
> But no strong opinion here.
> 
> >
> > >
> > > > +	/**< RCU QSBR variable to use for this defer queue */
> > > > +};
> > > > +
> > > > +/* RTE defer queue structure.
> > > > + * This structure holds the defer queue. The defer queue is used
> > > > +to
> > > > + * hold the deleted entries from the data structure that are not
> > > > + * yet freed.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq;
> > > > +
> > > >  /**
> > > >   * @warning
> > > >   * @b EXPERIMENTAL: this API may change without prior notice
> > > > @@ -648,6 +710,113 @@ __rte_experimental
> > > >  int
> > > >  rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> > > >
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Create a queue used to store the data structure elements that
> > > > +can
> > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > + *
> > > > + * @param params
> > > > + *   Parameters to create a defer queue.
> > > > + * @return
> > > > + *   On success - Valid pointer to defer queue
> > > > + *   On error - NULL
> > > > + *   Possible rte_errno codes are:
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - ENOMEM - Not enough memory
> > > > + */
> > > > +__rte_experimental
> > > > +struct rte_rcu_qsbr_dq *
> > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > > +*params);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Enqueue one resource to the defer queue and start the grace period.
> > > > + * The resource will be freed later after at least one grace
> > > > +period
> > > > + * is over.
> > > > + *
> > > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > > + * It will also reclaim resources at regular intervals to keep
> > > > + * the defer queue from growing too big.
> > > > + *
> > > > + * This API is not multi-thread safe. It is expected that the
> > > > +caller
> > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > + *
> > > > + * A lock free multi-thread writer algorithm could achieve
> > > > +multi-thread
> > > > + * safety by creating and using one defer queue per thread.
> > > > + *
> > > > + * @param dq
> > > > + *   Defer queue to allocate an entry from.
> > > > + * @param e
> > > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > > + *   the data to copy is equal to the element size provided when the
> > > > + *   defer queue was created.
> > > > + * @return
> > > > + *   On success - 0
> > > > + *   On error - 1 with rte_errno set to
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > > + *		if the defer queue size is equal (or larger) than the
> > > > + *		number of elements in the data structure.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Reclaim resources from the defer queue.
> > > > + *
> > > > + * This API is not multi-thread safe. It is expected that the
> > > > +caller
> > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > + *
> > > > + * A lock free multi-thread writer algorithm could achieve
> > > > +multi-thread
> > > > + * safety by creating and using one defer queue per thread.
> > > > + *
> > > > + * @param dq
> > > > + *   Defer queue to reclaim an entry from.
> > > > + * @return
> > > > + *   On successful reclamation of at least 1 resource - 0
> > > > + *   On error - 1 with rte_errno set to
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > > period,
> > > > + *		try again.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > + *
> > > > + * Delete a defer queue.
> > > > + *
> > > > + * It tries to reclaim all the resources on the defer queue.
> > > > + * If any of the resources have not completed the grace period
> > > > + * the reclamation stops and returns immediately. The rest of
> > > > + * the resources are not reclaimed and the defer queue is not
> > > > + * freed.
> > > > + *
> > > > + * @param dq
> > > > + *   Defer queue to delete.
> > > > + * @return
> > > > + *   On success - 0
> > > > + *   On error - 1
> > > > + *   Possible rte_errno codes are:
> > > > + *   - EINVAL - NULL parameters are passed
> > > > + *   - EAGAIN - Some of the resources have not completed at least 1
> grace
> > > > + *		period, try again.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > new file mode 100644
> > > > index 000000000..2122bc36a
> > > > --- /dev/null
> > > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > >
> > > Again a style suggestion: as it is not a public header - don't use the rte_
> > > prefix for naming.
> > > From my perspective - easier for the reader to realize what is a public
> > > header and what is not.
> > Looks like the guidelines are not defined very well. I see one private
> > file with rte_ prefix. I see Stephen not using rte_ prefix. I do not have any
> preference. But, a consistent approach is required.
> 
> That's just a suggestion.
> For me (and I hope for others) it would be a bit easier.
> When looking at the code for the first time I had to look at meson.build to check
> whether it is a public header or not.
> If the file doesn't have the 'rte_' prefix, I assume straight away that it is an
> internal one.
> But, as you said, there are no exact guidelines here, so it is up to you to decide.
I think it makes sense to remove the 'rte_' prefix. I will also change the file name to have a '_private' suffix.
There are some inconsistencies in the existing code; I will send a patch to correct them to follow this approach.

> 
> >
> > >
> > > > @@ -0,0 +1,46 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright (c) 2019 Arm Limited
> > > > + */
> > > > +
> > > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > > +#define _RTE_RCU_QSBR_PVT_H_
> > > > +
> > > > +/**
> > > > + * This file is private to the RCU library. It should not be included
> > > > + * by the user of this library.
> > > > + */
> > > > +
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +#include "rte_rcu_qsbr.h"
> > > > +
> > > > +/* RTE defer queue structure.
> > > > + * This structure holds the defer queue. The defer queue is used to
> > > > + * hold the deleted entries from the data structure that are not
> > > > + * yet freed.
> > > > + */
> > > > +struct rte_rcu_qsbr_dq {
> > > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > > +	uint32_t size;
> > > > +	/**< Number of elements in the defer queue */
> > > > +	uint32_t esize;
> > > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > > +	rte_rcu_qsbr_free_resource f;
> > > > +	/**< Function to call to free the resource. */
> > > > +	void *p;
> > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > +	 *   pointer to the data structure to which the resource to free
> > > > +	 *   belongs.
> > > > +	 */
> > > > +	char e[0];
> > > > +	/**< Temporary storage to copy the defer queue element. */
> > >
> > > Do you really need 'e' at all?
> > > Can't it be just temporary stack variable?
> > Ok, will check.
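For reference, a temporary stack variable inside rte_rcu_qsbr_dq_reclaim()
could look like this (an untested sketch; a C99 VLA works here because
dq->esize is known at run time):

	uint32_t i;
	uint64_t *tmp;
	/* per-call copy of one defer queue element, 64b aligned */
	uint64_t e[dq->esize / 8];

	for (i = 0, tmp = e; i < dq->esize / 8; i++, tmp++)
		(void)rte_ring_sc_dequeue(dq->r, (void *)(uintptr_t)tmp);
	dq->f(dq->p, e);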
> >
> > >
> > > > +};
> > > > +
> > > > +#ifdef __cplusplus
> > > > +}
> > > > +#endif
> > > > +
> > > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > > b/lib/librte_rcu/rte_rcu_version.map
> > > > index f8b9ef2ab..dfac88a37 100644
> > > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > > >  	rte_rcu_qsbr_synchronize;
> > > >  	rte_rcu_qsbr_thread_register;
> > > >  	rte_rcu_qsbr_thread_unregister;
> > > > +	rte_rcu_qsbr_dq_create;
> > > > +	rte_rcu_qsbr_dq_enqueue;
> > > > +	rte_rcu_qsbr_dq_reclaim;
> > > > +	rte_rcu_qsbr_dq_delete;
> > > >
> > > >  	local: *;
> > > >  };
> > > > diff --git a/lib/meson.build b/lib/meson.build index
> > > > e5ff83893..0e1be8407 100644
> > > > --- a/lib/meson.build
> > > > +++ b/lib/meson.build
> > > > @@ -11,7 +11,9 @@
> > > >  libraries = [
> > > >  	'kvargs', # eal depends on kvargs
> > > >  	'eal', # everything depends on eal
> > > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > +	'ring',
> > > > +	'rcu', # rcu depends on ring
> > > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > >  	'cmdline',
> > > >  	'metrics', # bitrate/latency stats depends on this
> > > >  	'hash',    # efd depends on this
> > > > @@ -22,7 +24,7 @@ libraries = [
> > > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > > >  	'kni', 'latencystats', 'lpm', 'member',
> > > >  	'power', 'pdump', 'rawdev',
> > > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > > >  	# ipsec lib depends on net, crypto and security
> > > >  	'ipsec',
> > > >  	# add pkt framework libs which use other libs from above
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
@ 2019-10-04 16:05       ` Medvedkin, Vladimir
  2019-10-09  3:48         ` Honnappa Nagarahalli
  2019-10-07  9:21       ` Ananyev, Konstantin
  1 sibling, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-04 16:05 UTC (permalink / raw)
  To: Honnappa Nagarahalli, bruce.richardson, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck, Gavin.Hu,
	Dharmik.Thakkar, Ruifeng.Wang, nd

Hi Honnappa,

On 01/10/2019 19:28, Honnappa Nagarahalli wrote:
> From: Ruifeng Wang <ruifeng.wang@arm.com>
>
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
>
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>   lib/librte_lpm/Makefile            |   3 +-
>   lib/librte_lpm/meson.build         |   2 +
>   lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
>   lib/librte_lpm/rte_lpm.h           |  21 ++++++
>   lib/librte_lpm/rte_lpm_version.map |   6 ++
>   5 files changed, 122 insertions(+), 12 deletions(-)
>
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>   # library name
>   LIB = librte_lpm.a
>   
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>   CFLAGS += -O3
>   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
>   
>   EXPORT_MAP := rte_lpm_version.map
>   
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>   # Copyright(c) 2017 Intel Corporation
>   
>   version = 2
> +allow_experimental_apis = true
>   sources = files('rte_lpm.c', 'rte_lpm6.c')
>   headers = files('rte_lpm.h', 'rte_lpm6.h')
>   # since header files have different names, we can install all vector headers
>   # without worrying about which architecture we actually need
>   headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>   deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..ca58d4b35 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #include <string.h>
> @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
>   
>   	rte_mcfg_tailq_write_unlock();
>   
> +	if (lpm->dq)
> +		rte_rcu_qsbr_dq_delete(lpm->dq);
>   	rte_free(lpm->tbl8);
>   	rte_free(lpm->rules_tbl);
>   	rte_free(lpm);
> @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>   MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>   		rte_lpm_free_v1604);
As a general comment, are you going to add rcu support to the legacy _v20?
>   
> +struct __rte_lpm_rcu_dq_entry {
> +	uint32_t tbl8_group_index;
> +	uint32_t pad;
> +};

Is this struct necessary? I mean, in tbl8_free_v1604() you could pass
tbl8_group_index directly as a pointer-sized value, without the "e.pad = 0;".

And what about a 32-bit environment?
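Something like the following (untested sketch) would drop the struct and the
padding, and it keeps working in a 32-bit environment because the index is
copied as one 8-byte element rather than reinterpreted as a pointer:

static void
__lpm_rcu_qsbr_free_resource(void *p, void *data)
{
	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
	/* 'data' points at the 8B element copied off the defer queue */
	uint32_t tbl8_group_index = (uint32_t)*(uint64_t *)data;

	/* Set tbl8 group invalid */
	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
			__ATOMIC_RELAXED);
}

and in tbl8_free_v1604():

	uint64_t e = tbl8_group_start;

	/* Push the group index into the QSBR defer queue */
	rte_rcu_qsbr_dq_enqueue(lpm->dq, &e);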

> +
> +static void
> +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> +{
> +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry *e =
> +			(struct __rte_lpm_rcu_dq_entry *)data;
> +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> +
> +	/* Set tbl8 group invalid */
> +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> +		__ATOMIC_RELAXED);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->dq) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* Init QSBR defer queue. */
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);

Consider moving this logic into rte_rcu_qsbr_dq_create(). I think there
you could prefix the name with just RCU_. That would make it possible to
move the include of <rte_ring.h> from rte_rcu_qsbr.h into rte_rcu_qsbr.c
and get rid of the RTE_RCU_QSBR_DQ_NAMESIZE macro in the rte_rcu_qsbr.h file.
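A sketch of how that could look inside rte_rcu_qsbr_dq_create() (assuming
the ring name length is the only constraint):

	char ring_name[RTE_RING_NAMESIZE];

	/* build the ring name internally from the user supplied queue
	 * name, so callers need neither RTE_RCU_QSBR_DQ_NAMESIZE nor
	 * <rte_ring.h>
	 */
	snprintf(ring_name, sizeof(ring_name), "RCU_%s", params->name);
	dq->r = rte_ring_create(ring_name, qs_fifo_size, SOCKET_ID_ANY, 0);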

> +	params.name = rcu_dq_name;
> +	params.size = lpm->number_tbl8s;
> +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> +	params.f = __lpm_rcu_qsbr_free_resource;
> +	params.p = lpm->tbl8;
> +	params.v = v;
> +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> +	if (lpm->dq == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
>   /*
>    * Adds a rule to the rule table.
>    *
> @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>   }
>   
>   static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +__tbl8_alloc_v1604(struct rte_lpm *lpm)
>   {
>   	uint32_t group_idx; /* tbl8 group index. */
>   	struct rte_lpm_tbl_entry *tbl8_entry;
>   
>   	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>   		/* If a free tbl8 group is found clean it and set as VALID. */
>   		if (!tbl8_entry->valid_group) {
>   			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>   	return -ENOSPC;
>   }
>   
> +static int32_t
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
> +{
> +	int32_t group_idx; /* tbl8 group index. */
> +
> +	group_idx = __tbl8_alloc_v1604(lpm);
> +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> +		/* If there are no tbl8 groups try to reclaim some. */
> +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> +			group_idx = __tbl8_alloc_v1604(lpm);
> +	}
> +
> +	return group_idx;
> +}
> +
>   static void
>   tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>   {
> @@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>   }
>   
>   static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>   {
> -	/* Set tbl8 group invalid*/
>   	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry e;
>   
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->dq != NULL) {
> +		e.tbl8_group_index = tbl8_group_start;
> +		e.pad = 0;
> +		/* Push into QSBR defer queue. */
> +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> +	} else {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>   }
>   
>   static __rte_noinline int32_t
> @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   
>   	if (!lpm->tbl24[tbl24_index].valid) {
>   		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		/* Check tbl8 allocation was successful. */
>   		if (tbl8_group_index < 0) {
> @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   	} /* If valid entry but not extended calculate the index into Table8. */
>   	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>   		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
>   
>   		if (tbl8_group_index < 0) {
>   			return tbl8_group_index;
> @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		 */
>   		lpm->tbl24[tbl24_index].valid = 0;
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	} else if (tbl8_recycle_index > -1) {
>   		/* Update tbl24 entry. */
>   		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>   				__ATOMIC_RELAXED);
>   		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>   	}
>   #undef group_idx
>   	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..49c12a68d 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>    */
>   
>   #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>   #include <rte_common.h>
>   #include <rte_vect.h>
>   #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -186,6 +188,7 @@ struct rte_lpm {
>   			__rte_cache_aligned; /**< LPM tbl24 table. */
>   	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>   	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
>   };
>   
>   /**
> @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>   void
>   rte_lpm_free_v1604(struct rte_lpm *lpm);
>   
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param v
> + *   RCU QSBR variable
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>   /**
>    * Add a rule to the LPM table.
>    *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>   	rte_lpm6_lookup_bulk_func;
>   
>   } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
  2019-10-02 17:39       ` Ananyev, Konstantin
  2019-10-02 18:50       ` Ananyev, Konstantin
@ 2019-10-04 19:01       ` Medvedkin, Vladimir
  2019-10-07 13:11       ` Medvedkin, Vladimir
  3 siblings, 0 replies; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-04 19:01 UTC (permalink / raw)
  To: Honnappa Nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, ruifeng.wang, dharmik.thakkar, dev, nd

Hi Honnappa,

On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> Add resource reclamation APIs to make it simple for applications
> and libraries to integrate rte_rcu library.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>   lib/librte_rcu/meson.build         |   2 +
>   lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>   lib/librte_rcu/rte_rcu_version.map |   4 +
>   lib/meson.build                    |   6 +-
>   7 files changed, 700 insertions(+), 3 deletions(-)
>   create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
There are compilation errors when building DPDK as a shared library.

I think you need something like:

--- a/lib/librte_rcu/Makefile
+++ b/lib/librte_rcu/Makefile
@@ -8,7 +8,7 @@ LIB = librte_rcu.a

  CFLAGS += -DALLOW_EXPERIMENTAL_API
  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
-LDLIBS += -lrte_eal
+LDLIBS += -lrte_eal -lrte_ring
>
> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> index d1b9e46a2..3a6815243 100644
> --- a/app/test/test_rcu_qsbr.c
> +++ b/app/test/test_rcu_qsbr.c
I think it's better to split the unit test patches from the library patches
> @@ -1,8 +1,9 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright (c) 2018 Arm Limited
> + * Copyright (c) 2019 Arm Limited
>    */
>   
>   #include <stdio.h>
> +#include <string.h>
>   #include <rte_pause.h>
>   #include <rte_rcu_qsbr.h>
>   #include <rte_hash.h>
> @@ -33,6 +34,7 @@ static uint32_t *keys;
>   #define COUNTER_VALUE 4096
>   static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
>   static uint8_t writer_done;
> +static uint8_t cb_failed;
>   
>   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>   struct rte_hash *h[RTE_MAX_LCORE];
> @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
>   	return 0;
>   }
>   
> +static void
> +rte_rcu_qsbr_test_free_resource(void *p, void *e)
This function is not a part of the DPDK API, so it's better to name it something like 
test_rcu_qsbr_free_resource().
> +{
> +	if (p != NULL && e != NULL) {
> +		printf("%s: Test failed\n", __func__);
> +		cb_failed = 1;
> +	}
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
> + * elements that can be freed later. This queue is referred to as 'defer queue'.
> + */
> +static int
> +test_rcu_qsbr_dq_create(void)
> +{
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> +
> +	/* Pass invalid parameters */
> +	dq = rte_rcu_qsbr_dq_create(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.size = 1;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.esize = 3;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	/* Pass all valid parameters */
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	rte_rcu_qsbr_dq_delete(dq);
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_enqueue(void)
> +{
> +	int ret;
> +	uint64_t r;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_reclaim(void)
> +{
> +	int ret;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_delete(void)
> +{
> +	int ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_delete(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
> +{
> +	int i, j, ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint64_t *e;
> +	uint64_t sc = 200;
> +	int max_entries;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> +	printf("Size = %d, esize = %d\n", size, esize);
> +
> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> +	if (e == NULL)
> +		return 0;
> +	cb_failed = 0;
> +
> +	/* Initialize the RCU variable. No threads are registered */
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	params.v = t[0];
> +	params.size = size;
> +	params.esize = esize;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Given the size and esize, calculate the maximum number of entries
> +	 * that can be stored on the defer queue (look at the logic used
> +	 * in capacity calculation of rte_ring).
> +	 */
> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> +	max_entries = (max_entries - 1)/(esize/8 + 1);
> +
> +	/* Enqueue few counters starting with the value 'sc' */
> +	/* The queue size will be rounded up to 2. The enqueue API also
> +	 * reclaims if the queue size is above certain limit. Since, there
> +	 * are no threads registered, reclamation succeeds. Hence, it should
> +	 * be possible to enqueue more than the provided queue size.
> +	 */
> +	for (i = 0; i < 10; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> +	 * succeed. It should not be possible to enqueue more than the size
> +	 * number of resources.
> +	 */
> +	rte_rcu_qsbr_thread_register(t[0], 1);
> +	rte_rcu_qsbr_thread_online(t[0], 1);
> +
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Enqueue fails as queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Delete should fail as there are elements in defer queue which
> +	 * cannot be reclaimed.
> +	 */
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid params");
> +
> +	/* Report quiescent state, enqueue should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Report quiescent state, delete should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	/* Validate that call back function did not return any error */
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> +
> +	rte_free(e);
> +	return 0;
> +}
> +
>   /*
>    * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
>    */
> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_thread_offline() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_create() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_reclaim() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_delete() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_enqueue() < 0)
> +		goto test_fail;
> +
>   	printf("\nFunctional tests\n");
>   
>   	if (test_rcu_qsbr_sw_sv_3qs() < 0)
> @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_mw_mv_mqs() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> +		goto test_fail;
> +
>   	free_rcu();
>   
>   	printf("\n");
> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> index 62920ba02..e280b29c1 100644
> --- a/lib/librte_rcu/meson.build
> +++ b/lib/librte_rcu/meson.build
> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
>   if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
>   	ext_deps += cc.find_library('atomic')
>   endif
> +
> +deps += ['ring']
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> index ce7f93dd3..76814f50b 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> @@ -21,6 +21,7 @@
>   #include <rte_errno.h>
>   
>   #include "rte_rcu_qsbr.h"
> +#include "rte_rcu_qsbr_pvt.h"
>   
>   /* Get the memory size of QSBR variable */
>   size_t
> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
>   	return 0;
>   }
>   
> +/* Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + */
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> +{
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint32_t qs_fifo_size;
> +
> +	if (params == NULL || params->f == NULL ||
> +		params->v == NULL || params->name == NULL ||
> +		params->size == 0 || params->esize == 0 ||
> +		(params->esize % 8 != 0)) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return NULL;
> +	}
> +
> +	dq = rte_zmalloc(NULL,
> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> +		RTE_CACHE_LINE_SIZE);
> +	if (dq == NULL) {
> +		rte_errno = ENOMEM;
> +
> +		return NULL;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * max_size.
> +	 */
> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> +					* params->size) + 1);
> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (dq->r == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): defer queue create failed\n", __func__);
> +		rte_free(dq);
> +		return NULL;
> +	}
> +
> +	dq->v = params->v;
> +	dq->size = params->size;
> +	dq->esize = params->esize;
> +	dq->f = params->f;
> +	dq->p = params->p;
> +
> +	return dq;
> +}
> +
> +/* Enqueue one resource to the defer queue to free after the grace
> + * period is over.
> + */
> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> +{
> +	uint64_t token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +	uint32_t cur_size, free_size;
> +
> +	if (dq == NULL || e == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Start the grace period */
> +	token = rte_rcu_qsbr_start(dq->v);
> +
> +	/* Reclaim resources if the queue is 1/8th full. This helps
> +	 * the queue from growing too large and allows time for reader
> +	 * threads to report their quiescent state.
> +	 */
> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +			"%s(): Triggering reclamation\n", __func__);
> +		rte_rcu_qsbr_dq_reclaim(dq);
> +	}
> +
> +	/* Check if there is space for at least 1 resource */
> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> +	if (!free_size) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Defer queue is full\n", __func__);
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	/* Enqueue the resource */
> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> +
> +	/* The resource to enqueue needs to be a multiple of 64b
> +	 * due to the limitation of the rte_ring implementation.
> +	 */
> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> +
> +	return 0;
> +}
> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;
> +
> +	/* Reclaim at the max 1/16th the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> +			== 1)) {
> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);
> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;
There is a potential problem here: rte_rcu_qsbr_dq_reclaim() reclaims
only max_cnt entries, which is at most 1/16 of the possible enqueued
entries, so the rest won't be reclaimed.
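One possible fix (untested sketch): loop inside rte_rcu_qsbr_dq_delete()
until the ring is drained, so the per-call reclaim cap no longer limits
deletion:

	/* keep reclaiming until the defer queue is empty; a failing
	 * reclaim means at least one grace period is still in progress,
	 * and rte_errno (EAGAIN) is already set by the reclaim API
	 */
	while (rte_ring_count(dq->r) != 0) {
		if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
			return 1;
	}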
> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>   int rte_rcu_log_type;
>   
>   RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>   #include <rte_lcore.h>
>   #include <rte_debug.h>
>   #include <rte_atomic.h>
> +#include <rte_ring.h>
I think it's better to move this include into rte_rcu_qsbr.c
>   
>   extern int rte_rcu_log_type;
>   
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>   	 */
>   } __rte_cache_aligned;
>   
> +/**
> + * Call back function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
I don't see the usage of this macro anywhere in the rcu library (I see 
you are using it in LPM).

char rcu_dq_name[RTE_RING_NAMESIZE];
is used instead in the tests.
+ See my comments for [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
> +
> +/**
> + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at the max 1/16th the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
Those two defines could be moved into the .c file.
> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with unbounded number of entries is not
> +	 *   supported currently.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;
> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>   /**
>    * @warning
>    * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>   int
>   rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to avoid
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than) the
> + *		number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
Why is this struct definition separated into a private .h? Maybe just 
define it in rte_rcu_qsbr.c instead?
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>   	rte_rcu_qsbr_synchronize;
>   	rte_rcu_qsbr_thread_register;
>   	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
>   
>   	local: *;
>   };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>   libraries = [
>   	'kvargs', # eal depends on kvargs
>   	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>   	'cmdline',
>   	'metrics', # bitrate/latency stats depends on this
>   	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>   	'gro', 'gso', 'ip_frag', 'jobstats',
>   	'kni', 'latencystats', 'lpm', 'member',
>   	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>   	# ipsec lib depends on net, crypto and security
>   	'ipsec',
>   	# add pkt framework libs which use other libs from above

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-03 19:49         ` Honnappa Nagarahalli
@ 2019-10-07  9:01           ` Ananyev, Konstantin
  2019-10-09  4:25             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07  9:01 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd


> 
> > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > >
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > >
> > > The peek API allows fetching the next available object in the ring
> > > without dequeuing it. This helps in scenarios where dequeuing of
> > > objects depend on their value.
> > >
> > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > >  1 file changed, 30 insertions(+)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 2a9f768a1..d3d0d5e18 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> > **obj_table,
> > >  				r->cons.single, available);
> > >  }
> > >
> > > +/**
> > > + * Peek one object from a ring.
> > > + *
> > > + * The peek API allows fetching the next available object in the ring
> > > + * without dequeuing it. This API is not multi-thread safe with
> > > +respect
> > > + * to other consumer threads.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_p
> > > + *   A pointer to a void * pointer (object) that will be filled.
> > > + * @return
> > > + *   - 0: Success, object available
> > > + *   - -ENOENT: Not enough entries in the ring.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline int
> > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> >
> > As it is not MT safe, I think we need _sc_ in the name, to follow the other
> > rte_ring functions' naming conventions
> > (rte_ring_sc_peek() or so).
> Agree
> 
> >
> > As a better alternative, what do you think about introducing serialized
> > versions of the DPDK rte_ring dequeue functions?
> > Something like that:
> >
> > /* same as original ring dequeue, but:
> >   * 1) move cons.head only if cons.head == const.tail
> >   * 2) don't update cons.tail
> >   */
> > unsigned int
> > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> > int n,
> >                 unsigned int *available);
> >
> > /* sets both cons.head and cons.tail to cons.head + num */ void
> > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> >
> > /* resets cons.head to const.tail value */ void
> > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> >
> > Then your dq_reclaim cycle function will look like that:
> >
> > const uint32_t nb_elt = dq->esize/8 + 1;
> > uint32_t avl, n;
> > uintptr_t elt[nb_elt];
> > ...
> >
> > do {
> >
> >   /* read next elem from the queue */
> >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> >   if (n == 0)
> >       break;
> >
> >   /* wrong period, keep elem in the queue */
> >   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> >      rte_ring_serial_dequeue_abort(dq->r);
> >      break;
> >   }
> >
> >   /* can reclaim, remove elem from the queue */
> >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> >
> >   /* call reclaim function */
> >   dq->f(dq->p, elt);
> >
> > } while (avl >= nb_elt);
> >
> > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > As long as actual reclamation callback itself is MT safe of course.
> 
> I think it is a great idea. The other writers would still be polling for the current writer to update the tail or update the head. This makes it a
> blocking solution.

Yep, it is a blocking one.
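To make the blocking part concrete: with the semantics above (cons.head may
move only when cons.head == cons.tail), a second writer entering the serial
dequeue has to spin until the first one calls _finish()/_abort(), roughly
(sketch only):

	/* wait until no other writer holds an unfinished dequeue */
	while (__atomic_load_n(&r->cons.tail, __ATOMIC_ACQUIRE) !=
			__atomic_load_n(&r->cons.head, __ATOMIC_RELAXED))
		rte_pause();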

> We can make the other threads not poll, i.e. they will quit reclaiming if they see that other writers are dequeuing from the queue.

Actually, I didn't think about that possibility, but yes, it should be possible to have _try_ semantics too.

> The other way is to use per-thread queues.
> 
> The other requirement I see is to support unbounded-size data structures, wherein the data structure does not have a pre-determined
> number of entries. Also, currently the defer queue size is equal to the total number of entries in a given data structure. There are plans to
> support a dynamically resizable defer queue. This means memory allocation, which will affect the lock-free-ness of the solution.
> 
> So, IMO:
> 1) The API should provide the capability to support different algorithms - maybe through some flags?
> 2) The requirements for the ring are pretty unique to the problem we have here (for ex: move the cons-head only if the cons-tail is the
> same, skip polling). So, we should probably implement a ring within the RCU library?

Personally, I think such a serialization ring API would be useful for other cases too.
There are a few cases where the user needs to read the contents of the queue without removing elements from it.
For example, we do use a similar approach inside TLDK to implement the TCP transmit queue.
If such an API existed in DPDK, we could just use it straight away, without maintaining a separate one.

> 
> From the timeline perspective, adding all these capabilities would be difficult to get done within the 19.11 timeline. What I have here satisfies
> my current needs. I suggest that we make provisions in the APIs now to support all these features, but do the implementation in the coming
> releases. Does this sound ok to you?

Not sure I understand your suggestion here...
Could you explain it a bit more - how would the new API look, and what would be left for the future?

> 
> >
> > > +{
> > > +	uint32_t prod_tail = r->prod.tail;
> > > +	uint32_t cons_head = r->cons.head;
> > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > +	unsigned int n = 1;
> > > +	if (count) {
> > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > +		return 0;
> > > +	}
> > > +	return -ENOENT;
> > > +}
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-01 18:28     ` [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR Honnappa Nagarahalli
  2019-10-04 16:05       ` Medvedkin, Vladimir
@ 2019-10-07  9:21       ` Ananyev, Konstantin
  2019-10-13  4:36         ` Honnappa Nagarahalli
  1 sibling, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07  9:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin.Hu, Dharmik.Thakkar, Ruifeng.Wang,
	nd, Ruifeng Wang

Hi guys,

> 
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
> 
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
>  lib/librte_lpm/Makefile            |   3 +-
>  lib/librte_lpm/meson.build         |   2 +
>  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
>  lib/librte_lpm/rte_lpm.h           |  21 ++++++
>  lib/librte_lpm/rte_lpm_version.map |   6 ++
>  5 files changed, 122 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index a7946a1c5..ca9e16312 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_lpm.a
> 
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> 
>  EXPORT_MAP := rte_lpm_version.map
> 
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index a5176d8ae..19a35107f 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -2,9 +2,11 @@
>  # Copyright(c) 2017 Intel Corporation
> 
>  version = 2
> +allow_experimental_apis = true
>  sources = files('rte_lpm.c', 'rte_lpm6.c')
>  headers = files('rte_lpm.h', 'rte_lpm6.h')
>  # since header files have different names, we can install all vector headers
>  # without worrying about which architecture we actually need
>  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>  deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 3a929a1b1..ca58d4b35 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #include <string.h>
> @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> 
>  	rte_mcfg_tailq_write_unlock();
> 
> +	if (lpm->dq)
> +		rte_rcu_qsbr_dq_delete(lpm->dq);
>  	rte_free(lpm->tbl8);
>  	rte_free(lpm->rules_tbl);
>  	rte_free(lpm);
> @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
>  MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>  		rte_lpm_free_v1604);
> 
> +struct __rte_lpm_rcu_dq_entry {
> +	uint32_t tbl8_group_index;
> +	uint32_t pad;
> +};
> +
> +static void
> +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> +{
> +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry *e =
> +			(struct __rte_lpm_rcu_dq_entry *)data;
> +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> +
> +	/* Set tbl8 group invalid */
> +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> +		__ATOMIC_RELAXED);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> +{
> +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +
> +	if ((lpm == NULL) || (v == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->dq) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	/* Init QSBR defer queue. */
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> +	params.name = rcu_dq_name;
> +	params.size = lpm->number_tbl8s;
> +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> +	params.f = __lpm_rcu_qsbr_free_resource;
> +	params.p = lpm->tbl8;
> +	params.v = v;
> +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> +	if (lpm->dq == NULL) {
> +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> +		return 1;
> +	}

Few thoughts about that function:
It is named rcu_qsbr_add(), but in fact it allocates a defer queue for the given RCU variable.
So first thought - is it always necessary?
For some use-cases I suppose the user might be ok to wait for the quiescent state change
inside tbl8_free()?
Another thing: you do allocate the defer queue, but it is internal, so the user can't call
reclaim() manually, which looks strange.
Why not return the defer queue pointer to the user, so he can call reclaim() himself
at an appropriate time?
Third thing - you always allocate the defer queue with size equal to the number of tbl8
groups, though I understand there could be up to 16M tbl8 groups inside the LPM.
Do we really need a defer queue that long?
Especially considering that the current rcu defer queue will start reclamation when 1/8
of the defer queue becomes full and won't reclaim more than 1/16 of it.
Probably better to let the user decide how long a defer queue he needs for that LPM?
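For example, the RCU state could be passed in a small config struct so the
application sizes the defer queue itself (an illustrative shape only, not a
concrete API proposal):

	/* hypothetical configuration for rte_lpm_rcu_qsbr_add();
	 * dq_size lets the application bound the defer queue instead of
	 * the library defaulting to number_tbl8s
	 */
	struct rte_lpm_rcu_config {
		struct rte_rcu_qsbr *v;	/* RCU QSBR variable */
		uint32_t dq_size;	/* defer queue depth chosen by app */
	};

	int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm,
			struct rte_lpm_rcu_config *cfg);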

Konstantin


> +
> +	return 0;
> +}
> +
>  /*
>   * Adds a rule to the rule table.
>   *
> @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  }
> 
>  static int32_t
> -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +__tbl8_alloc_v1604(struct rte_lpm *lpm)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
>  	struct rte_lpm_tbl_entry *tbl8_entry;
> 
>  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>  		/* If a free tbl8 group is found clean it and set as VALID. */
>  		if (!tbl8_entry->valid_group) {
>  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>  	return -ENOSPC;
>  }
> 
> +static int32_t
> +tbl8_alloc_v1604(struct rte_lpm *lpm)
> +{
> +	int32_t group_idx; /* tbl8 group index. */
> +
> +	group_idx = __tbl8_alloc_v1604(lpm);
> +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> +		/* If there are no tbl8 groups try to reclaim some. */
> +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> +			group_idx = __tbl8_alloc_v1604(lpm);
> +	}
> +
> +	return group_idx;
> +}
> +
>  static void
>  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>  {
> @@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
>  }
> 
>  static void
> -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>  {
> -	/* Set tbl8 group invalid*/
>  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	struct __rte_lpm_rcu_dq_entry e;
> 
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (lpm->dq != NULL) {
> +		e.tbl8_group_index = tbl8_group_start;
> +		e.pad = 0;
> +		/* Push into QSBR defer queue. */
> +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> +	} else {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	}
>  }
> 
>  static __rte_noinline int32_t
> @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> 
>  	if (!lpm->tbl24[tbl24_index].valid) {
>  		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		/* Check tbl8 allocation was successful. */
>  		if (tbl8_group_index < 0) {
> @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	} /* If valid entry but not extended calculate the index into Table8. */
>  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>  		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> 
>  		if (tbl8_group_index < 0) {
>  			return tbl8_group_index;
> @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  		 */
>  		lpm->tbl24[tbl24_index].valid = 0;
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	} else if (tbl8_recycle_index > -1) {
>  		/* Update tbl24 entry. */
>  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>  				__ATOMIC_RELAXED);
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> +		tbl8_free_v1604(lpm, tbl8_group_start);
>  	}
>  #undef group_idx
>  	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index 906ec4483..49c12a68d 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #ifndef _RTE_LPM_H_
> @@ -21,6 +22,7 @@
>  #include <rte_common.h>
>  #include <rte_vect.h>
>  #include <rte_compat.h>
> +#include <rte_rcu_qsbr.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -186,6 +188,7 @@ struct rte_lpm {
>  			__rte_cache_aligned; /**< LPM tbl24 table. */
>  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
>  };
> 
>  /**
> @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
>  void
>  rte_lpm_free_v1604(struct rte_lpm *lpm);
> 
> +/**
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param v
> + *   RCU QSBR variable
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> +
>  /**
>   * Add a rule to the LPM table.
>   *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 90beac853..b353aabd2 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -44,3 +44,9 @@ DPDK_17.05 {
>  	rte_lpm6_lookup_bulk_func;
> 
>  } DPDK_16.04;
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-04  6:07             ` Honnappa Nagarahalli
@ 2019-10-07 10:46               ` Ananyev, Konstantin
  2019-10-13  4:35                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07 10:46 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd



> > > > > Add resource reclamation APIs to make it simple for applications
> > > > > and libraries to integrate rte_rcu library.
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > ---
> > > > >  app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> > > > >  lib/librte_rcu/meson.build         |   2 +
> > > > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > > > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > > > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > > > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > > > >  lib/meson.build                    |   6 +-
> > > > >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > > > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > >
> > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > @@ -21,6 +21,7 @@
> > > > >  #include <rte_errno.h>
> > > > >
> > > > >  #include "rte_rcu_qsbr.h"
> > > > > +#include "rte_rcu_qsbr_pvt.h"
> > > > >
> > > > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6
> > > > > +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > +/* Create a queue used to store the data structure elements that
> > > > > +can
> > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq *
> > > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > > > +*params) {
> > > > > +	struct rte_rcu_qsbr_dq *dq;
> > > > > +	uint32_t qs_fifo_size;
> > > > > +
> > > > > +	if (params == NULL || params->f == NULL ||
> > > > > +		params->v == NULL || params->name == NULL ||
> > > > > +		params->size == 0 || params->esize == 0 ||
> > > > > +		(params->esize % 8 != 0)) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	dq = rte_zmalloc(NULL,
> > > > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > > > +		RTE_CACHE_LINE_SIZE);
> > > > > +	if (dq == NULL) {
> > > > > +		rte_errno = ENOMEM;
> > > > > +
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > > > +	 * max_size.
> > > > > +	 */
> > > > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > > > +					* params->size) + 1);
> > > > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > > > +					SOCKET_ID_ANY, 0);
> > > >
> > > > If it is going to be not MT safe, then why not to create the ring
> > > > with (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> > > Agree.
> > >
> > > > Though I think it could be changed to allow MT safe multiple
> > > > enqeue/single dequeue, see below.
> > > The MT safe issue is due to reclaim code. The reclaim code has the following
> > sequence:
> > >
> > > rte_ring_peek
> > > rte_rcu_qsbr_check
> > > rte_ring_dequeue
> > >
> > > This entire sequence needs to be atomic as the entry cannot be dequeued
> > without knowing that the grace period for that entry is over.
> >
> > I understand that, though I believe it should at least be possible to support
> > a multiple-enqueue/single-dequeuer-and-reclaim mode.
> > With serialized dequeue(), even multiple dequeuers should be possible.
> Agreed. Please see the response on the other thread.
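
To make the atomicity constraint concrete, a sketch of the serialized sequence
(assuming the rte_ring_peek() added in this series; dq_lock is an application-side
pthread mutex, not part of the library):

	static pthread_mutex_t dq_lock = PTHREAD_MUTEX_INITIALIZER;
	void *token;

	pthread_mutex_lock(&dq_lock);
	if (rte_ring_peek(dq->r, &token) == 0 &&
	    rte_rcu_qsbr_check(dq->v, (uint64_t)(uintptr_t)token, false) == 1)
		/* Grace period over - only now is it safe to dequeue. */
		(void)rte_ring_sc_dequeue(dq->r, &token);
	pthread_mutex_unlock(&dq_lock);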
> 
> >
> > > Note that due to optimizations in rte_rcu_qsbr_check API, this
> > > sequence should not be large in most cases. I do not have ideas on how to
> > make this sequence lock-free.
> > >
> > > If the writer is on the control plane, most use cases will use mutex
> > > locks for synchronization if they are multi-threaded. That lock should be
> > enough to provide the thread safety for these APIs.
> >
> > In that case, why do we need a ring at all?
> > For sure people can create their own queue quite easily with a mutex and a TAILQ.
> > If performance is not an issue, they can even add a pthread_cond to it, and have
> > the ability for the consumer to sleep/wake up on an empty/full queue.
> >
> > >
> > > If the writer is multi-threaded and lock-free, then one should use per-thread
> > > defer queues (see the sketch below).
> >
> > If that's the only working model, then the question is why do we need that API
> > at all?
> > A simple array with a counter, or a linked list, should do for the majority of cases.
> Please see the other thread.
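
A minimal sketch of that per-thread model (names are illustrative; the point is
that each writer only ever touches its own queue, so the enqueue/reclaim path
needs no lock):

	static struct rte_rcu_qsbr_dq *dq_per_lcore[RTE_MAX_LCORE];

	/* Writer thread: enqueue only to this lcore's own defer queue. */
	rte_rcu_qsbr_dq_enqueue(dq_per_lcore[rte_lcore_id()], &e);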
> 
> >
> > >
> > > >
> > > > > +	if (dq->r == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): defer queue create failed\n", __func__);
> > > > > +		rte_free(dq);
> > > > > +		return NULL;
> > > > > +	}
> > > > > +
> > > > > +	dq->v = params->v;
> > > > > +	dq->size = params->size;
> > > > > +	dq->esize = params->esize;
> > > > > +	dq->f = params->f;
> > > > > +	dq->p = params->p;
> > > > > +
> > > > > +	return dq;
> > > > > +}
> > > > > +
> > > > > +/* Enqueue one resource to the defer queue to free after the
> > > > > +grace
> > > > > + * period is over.
> > > > > + */
> > > > > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > > > > +	uint64_t token;
> > > > > +	uint64_t *tmp;
> > > > > +	uint32_t i;
> > > > > +	uint32_t cur_size, free_size;
> > > > > +
> > > > > +	if (dq == NULL || e == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return 1;
> > > >
> > > > Why not just return -EINVAL straight away?
> > > > I think there is not much point in setting rte_errno in that function at
> > > > all; the return value should do.
> > > I am trying to keep these consistent with the existing APIs. They return 0 or 1
> > and set the rte_errno.
> >
> > A lot of public DPDK API functions do use the return value to return a status code (0
> > or some positive number on success, negative errno values on failure); I am
> > not inventing anything new here.
> Agree, you are not proposing a new thing here. Maybe I was not clear. I really do not have an opinion on how this should be done. But, I do
> have an opinion on consistency. These new APIs follow what has been done in the existing RCU APIs. I think we have 2 options here.
> 1) Either we change existing RCU APIs to get rid of rte_errno (is it an ABI change?) or
> 2) The new APIs follow what has been done in the existing RCU APIs.
> I want to make sure we are consistent at least within RCU APIs.

But as far as I can see, right now the rcu API sets rte_errno only for control-path functions
(get_memsize, init, register, unregister, dump).
All fast-path (inline) functions don't set/use it.
So from that perspective it is consistent behavior, no?

> 
> >
> > >
> > > >
> > > > > +	}
> > > > > +
> > > > > +	/* Start the grace period */
> > > > > +	token = rte_rcu_qsbr_start(dq->v);
> > > > > +
> > > > > +	/* Reclaim resources if the queue is 1/8th full. This keeps
> > > > > +	 * the queue from growing too large and allows time for reader
> > > > > +	 * threads to report their quiescent state.
> > > > > +	 */
> > > > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > > >
> > > > Probably would be a bit easier if you just store in dq->esize (elt
> > > > size + token
> > > > size) / 8.
> > > Agree
> > >
> > > >
> > > > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > > >
> > > > Why make this threshold value hard-coded?
> > > > Why not either put it into a create() parameter, or just return a
> > > > special value to indicate that the threshold has been reached?
> > > My thinking was to keep the programming interface easy to use. The
> > > more the parameters, the more painful it is for the user. IMO, the
> > > constants chosen should be good enough for most cases. More advanced
> > > users could modify the constants. However, we could make these part of the
> > > parameters, but make them optional for the user. For ex: if they set them to 0,
> > > default values can be used.
> > >
> > > > Or even return the number of filled/free entries on success, so the caller
> > > > can decide to reclaim or not based on that information on his own?
> > > This means more code on the user side.
> >
> > I personally think it really wouldn't be that big a problem for the user to pass
> > an extra parameter to the function.
> I will convert the 2 constants into optional parameters (user can set them to 0 to make the algorithm use default values)
> 
> > Again, what if the user doesn't want to reclaim() in the enqueue() thread at all?
> 'enqueue' has to do reclamation if the defer queue is full; I do not think that can be avoided.
> 
> In the current design, reclamation in enqueue is also done on a regular basis (automatic triggering of reclamation when the queue reaches
> a certain limit) to keep the queue from growing too large. This is required when we implement a dynamically adjusting defer queue. The
> current algorithm keeps the cost of reclamation spread across multiple calls and puts an upper bound on the cycles of the delete API by
> reclaiming a fixed number of entries.
> 
> This algorithm is proven to work in the LPM integration performance tests at a very low performance overhead (~1%). So, I do not know
> why a user would not want to use this.
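
To put numbers on that bound (illustrative arithmetic only, using the constants
from this patch):

	/* Assume dq->size = 4096 entries:
	 *   auto-reclaim trigger = 4096 >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT (3) = 512
	 *   per-pass maximum     = 4096 >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT (4)  = 256
	 * so no single enqueue/delete call reclaims more than 256 entries.
	 */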

Yeah, I looked at the LPM implementation and one thing I found strange -
the defer_queue is hidden inside the LPM struct and all reclamations are done internally.
Yes, for sure it allows deferring and grouping the actual reclaim(), which hopefully will lead to better performance.
But why not allow the user to call reclaim() on it directly too?
That way the user might avoid (or minimize) doing reclaim() in the LPM write path at all,
and instead do it somewhere later in the same thread (when there are no other tasks to do),
or even leave it to some other house-keeping thread to do (a sort of garbage collector).
Or is such a mode not supported/planned?
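
A sketch of that house-keeping mode (assuming the application could reach the
defer queue directly, which the LPM patch currently does not allow; 'quit' is an
application-side stop flag):

	static volatile bool quit;

	static int
	reclaim_lcore(void *arg)
	{
		struct rte_rcu_qsbr_dq *dq = arg;

		while (!quit) {
			if (rte_rcu_qsbr_dq_reclaim(dq) != 0 &&
			    rte_errno == EAGAIN)
				rte_pause();	/* no grace period completed yet */
		}
		return 0;
	}

It could be launched with rte_eal_remote_launch(reclaim_lcore, dq, lcore_id),
keeping all reclamation cost off the writer.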

> The 2 additional parameters should give the user more flexibility.

Ok, let's keep them as config params.
After another thought - I think you are right, it should be good enough.

> 
> However, if the user wants his own algorithm, he can create one with the base APIs provided.
> 
> >
> > > I think adding these to parameters seems like a better option.
> > >
> > > >
> > > > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > +			"%s(): Triggering reclamation\n", __func__);
> > > > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > > > +	}
> > > > > +
> > > > > +	/* Check if there is space for at least 1 resource */
> > > > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > > > +	if (!free_size) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Defer queue is full\n", __func__);
> > > > > +		rte_errno = ENOSPC;
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	/* Enqueue the resource */
> > > > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > > > +
> > > > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > > > +	 * due to the limitation of the rte_ring implementation.
> > > > > +	 */
> > > > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > > >
> > > >
> > > > That whole construction above looks a bit clumsy and error prone...
> > > > I suppose just:
> > > >
> > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t free, n; ...
> > > > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free); if (n == 0)
> > > Yes, bulk enqueue can be used. But note that once the flexible element size
> > ring patch is done, this code will use that.
> >
> > Well, when it is in the mainline, and if it provides a better way, for sure
> > this code can be updated to use the new API (if it provides some improvements).
> > But as I understand, right now it is not there, while bulk enqueue/dequeue are.
> Apologies, I was not clear. I agree we can go with bulk APIs for now.
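
Putting that together, the enqueue-side fragment of the bulk variant might look
like this (a sketch only, inside rte_rcu_qsbr_dq_enqueue(); rte_ring_enqueue_bulk()
is the existing ring API, 'token' and 'e' come from the enclosing function):

	const uint32_t nb_elt = dq->esize / 8 + 1;	/* token + payload words */
	uintptr_t elt[nb_elt];
	uint32_t free;

	elt[0] = (uintptr_t)token;
	memcpy(&elt[1], e, dq->esize);
	/* All-or-nothing: either the whole entry lands in the ring or
	 * nothing does, so partially enqueued entries are impossible.
	 */
	if (rte_ring_enqueue_bulk(dq->r, (void **)elt, nb_elt, &free) == 0)
		return -ENOSPC;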
> 
> >
> > >
> > > >   return -ENOSPC;
> > > > return free;
> > > >
> > > > That way I think you can have an MT-safe version of that function.
> > > Please see the description of MT safe issue above.
> > >
> > > >
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +/* Reclaim resources from the defer queue. */ int
> > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > > > > +	uint32_t max_cnt;
> > > > > +	uint32_t cnt;
> > > > > +	void *token;
> > > > > +	uint64_t *tmp;
> > > > > +	uint32_t i;
> > > > > +
> > > > > +	if (dq == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return 1;
> > > >
> > > > Same story as above - I think rte_errno is excessive in this function.
> > > > Just return value should be enough.
> > > >
> > > >
> > > > > +	}
> > > > > +
> > > > > +	/* Anything to reclaim? */
> > > > > +	if (rte_ring_count(dq->r) == 0)
> > > > > +		return 0;
> > > >
> > > > Not sure you need that, see below.
> > > >
> > > > > +
> > > > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > > >
> > > > Again, why not make max_cnt a configurable create() parameter?
> > > I think making this an optional parameter when creating the defer queue is a
> > > better option.
> > >
> > > > Or even a parameter for that function?
> > > >
> > > > > +	cnt = 0;
> > > > > +
> > > > > +	/* Check reader threads quiescent state and reclaim resources */
> > > > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > > > +			== 1)) {
> > > >
> > > >
> > > > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > > > +		 * due to the limitation of the rte_ring implementation.
> > > > > +		 */
> > > > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > > > +			i++, tmp++)
> > > > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > > > +					(void *)(uintptr_t)tmp);
> > > >
> > > > Again, no need for such constructs with multiple dequeues, I believe.
> > > > Just:
> > > >
> > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t n; uintptr_t
> > > > elt[nb_elt]; ...
> > > > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL); if (n != 0)
> > > > {dq->f(dq->p, elt);}
> > > Agree on bulk API use.
> > >
> > > >
> > > > Seems enough.
> > > > Again in that case you can have enqueue/reclaim running in different
> > > > threads simultaneously, plus you don't need dq->e at all.
> > > Will check on dq->e
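
And the matching reclaim side of the same sketch (dq->e indeed becomes
unnecessary; the stack copy replaces it):

	const uint32_t nb_elt = dq->esize / 8 + 1;
	uintptr_t elt[nb_elt];

	/* Run after rte_ring_peek()/rte_rcu_qsbr_check() have confirmed
	 * the grace period is over; elt[0] is the token, &elt[1] the payload.
	 */
	if (rte_ring_dequeue_bulk(dq->r, (void **)elt, nb_elt, NULL) != 0)
		dq->f(dq->p, &elt[1]);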
> > >
> > > >
> > > > > +		dq->f(dq->p, dq->e);
> > > > > +
> > > > > +		cnt++;
> > > > > +	}
> > > > > +
> > > > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > > > +
> > > > > +	if (cnt == 0) {
> > > > > +		/* No resources were reclaimed */
> > > > > +		rte_errno = EAGAIN;
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	return 0;
> > > >
> > > > I'd suggest returning cnt on success.
> > > I am trying to keep the APIs simple. I do not see much use for 'cnt'
> > > as a return value to the user. It exposes more details which I think are internal
> > > to the library.
> >
> > Not sure what the hassle is in returning the number of completed reclamations?
> > If the user doesn't need that information, he simply won't use it.
> > But maybe it would be useful - he can decide whether to try another attempt
> > of reclaim() immediately or whether it is OK to do something else.
> There is no hassle to return that information.
> 
> As per the current design, the user calls 'reclaim' when out of resources while adding an entry to the data structure. At that point the user
> wants to know if at least 1 resource was reclaimed, because the user has to allocate 1 resource. He does not have a use for the number of
> resources reclaimed.

Ok, but why can't the user decide to do reclaim in advance, let's say when he foresees that he will need a lot of allocations in the near future?
Or when there is some idle time? Or some combination of these things?
He would like to free some extra resources in that case, to minimize the number of reclaims during a future peak interval.

> 
> If this API returns 0, then the user can decide to repeat the call or return failure. But that decision depends on the length of the grace period
> which is under user's control.
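
The usage pattern being described, as a sketch (alloc_resource() and 'ds' are
hypothetical application-side names, not real APIs):

	/* Writer path: allocation failed, so try to reclaim exactly once. */
	if (alloc_resource(ds) < 0) {
		if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
			return -ENOSPC;		/* nothing crossed its grace period */
		ret = alloc_resource(ds);	/* retry: >= 1 entry was reclaimed */
	}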
> 
> >
> > >
> > > >
> > > > > +}
> > > > > +
> > > > > +/* Delete a defer queue. */
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > > > +	if (dq == NULL) {
> > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > +		rte_errno = EINVAL;
> > > > > +
> > > > > +		return 1;
> > > > > +	}
> > > > > +
> > > > > +	/* Reclaim all the resources */
> > > > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > > > +		/* Error number is already set by the reclaim API */
> > > > > +		return 1;
> > > >
> > > > How do you know that you have reclaimed everything?
> > > Good point, will come back with a different solution.
> > >
> > > >
> > > > > +
> > > > > +	rte_ring_free(dq->r);
> > > > > +	rte_free(dq);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  int rte_rcu_log_type;
> > > > >
> > > > >  RTE_INIT(rte_rcu_register)
> > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > @@ -34,6 +34,7 @@ extern "C" {
> > > > >  #include <rte_lcore.h>
> > > > >  #include <rte_debug.h>
> > > > >  #include <rte_atomic.h>
> > > > > +#include <rte_ring.h>
> > > > >
> > > > >  extern int rte_rcu_log_type;
> > > > >
> > > > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > > > >  	 */
> > > > >  } __rte_cache_aligned;
> > > > >
> > > > > +/**
> > > > > + * Callback function called to free the resources.
> > > > > + *
> > > > > + * @param p
> > > > > + *   Pointer provided while creating the defer queue
> > > > > + * @param e
> > > > > + *   Pointer to the resource data stored on the defer queue
> > > > > + *
> > > > > + * @return
> > > > > + *   None
> > > > > + */
> > > > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > > >
> > > > Style thing - usually in DPDK we have typedef newtype_t ...
> > > > Though I am not sure you need a new typedef at all - just a function
> > > > pointer inside the struct seems enough.
> > > Other libraries (for ex: rte_hash) use this approach. I think it is better to keep
> > it out of the structure to allow for better commenting.
> >
> > I am saying the majority of DPDK code uses the _t suffix for typedefs:
> > typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);
> Apologies, got it, will change.
> 
> >
> > >
> > > >
> > > > > +
> > > > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > > > +
> > > > > +/**
> > > > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > > > + */
> > > > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > > > +
> > > > > +/**
> > > > > + *  Reclaim at the max 1/16th the total number of resources.
> > > > > + */
> > > > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > > >
> > > >
> > > > As I said above, I don't think these thresholds need to be hardcoded.
> > > > In any case, there seems to be little point in putting them in the public header
> > > > file.
> > > >
> > > > > +
> > > > > +/**
> > > > > + * Parameters used when creating the defer queue.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq_parameters {
> > > > > +	const char *name;
> > > > > +	/**< Name of the queue. */
> > > > > +	uint32_t size;
> > > > > +	/**< Number of entries in queue. Typically, this will be
> > > > > +	 *   the same as the maximum number of entries supported in the
> > > > > +	 *   lock free data structure.
> > > > > +	 *   Data structures with unbounded number of entries is not
> > > > > +	 *   supported currently.
> > > > > +	 */
> > > > > +	uint32_t esize;
> > > > > +	/**< Size (in bytes) of each element in the defer queue.
> > > > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > > > +	 *   support 8B element sizes only.
> > > > > +	 */
> > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > +	/**< Function to call to free the resource. */
> > > > > +	void *p;
> > > >
> > > > Style nit again - I like short names myself, but that seems a bit
> > > > extreme... :) Might be at least:
> > > > void (*reclaim)(void *, void *);
> > > May be 'free_fn'?
> > >
> > > > void * reclaim_data;
> > > > ?
> > > This is the pointer to the data structure to free the resource into. For example, in
> > > the LPM data structure, it will be the pointer to the LPM. 'reclaim_data'
> > > does not convey the meaning correctly.
> >
> > Ok, please feel free to come up with your own names.
> > I just wanted to say that 'f' and 'p' are a bit extreme for a public API.
> ok, this is the hardest thing to do 😊
> 
> >
> > >
> > > >
> > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > +	 *   pointer to the data structure to which the resource to free
> > > > > +	 *   belongs. This can be NULL.
> > > > > +	 */
> > > > > +	struct rte_rcu_qsbr *v;
> > > >
> > > > Does it need to be inside that struct?
> > > > Might be better:
> > > > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > > > rte_rcu_qsbr_dq_parameters *params);
> > > The API takes a parameter structure as input anyway, why add
> > > another argument to the function? The QSBR variable is just another parameter.
> > >
> > > >
> > > > Another alternative: make both reclaim() and enqueue() take v as
> > > > a parameter.
> > > But both of them need access to some of the parameters provided in
> > > the rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to the
> > > functions.
> >
> > Purely a style thing.
> > From my perspective it just provides better visibility of what is going on in the code:
> > for QSBR var 'v', create a new defer queue.
> > But no strong opinion here.
> >
> > >
> > > >
> > > > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > > > +
> > > > > +/* RTE defer queue structure.
> > > > > + * This structure holds the defer queue. The defer queue is used
> > > > > +to
> > > > > + * hold the deleted entries from the data structure that are not
> > > > > + * yet freed.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq;
> > > > > +
> > > > >  /**
> > > > >   * @warning
> > > > >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > > > > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE
> > > > > *f, struct rte_rcu_qsbr *v);
> > > > >
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Create a queue used to store the data structure elements that
> > > > > +can
> > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > + *
> > > > > + * @param params
> > > > > + *   Parameters to create a defer queue.
> > > > > + * @return
> > > > > + *   On success - Valid pointer to defer queue
> > > > > + *   On error - NULL
> > > > > + *   Possible rte_errno codes are:
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - ENOMEM - Not enough memory
> > > > > + */
> > > > > +__rte_experimental
> > > > > +struct rte_rcu_qsbr_dq *
> > > > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > > > +*params);
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Enqueue one resource to the defer queue and start the grace period.
> > > > > + * The resource will be freed later after at least one grace
> > > > > +period
> > > > > + * is over.
> > > > > + *
> > > > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > > > + * It will also reclaim resources at regular intervals to keep
> > > > > + * the defer queue from growing too big.
> > > > > + *
> > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > +caller
> > > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > > + *
> > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > +multi-thread
> > > > > + * safety by creating and using one defer queue per thread.
> > > > > + *
> > > > > + * @param dq
> > > > > + *   Defer queue to allocate an entry from.
> > > > > + * @param e
> > > > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > > > + *   the data to copy is equal to the element size provided when the
> > > > > + *   defer queue was created.
> > > > > + * @return
> > > > > + *   On success - 0
> > > > > + *   On error - 1 with rte_errno set to
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > > > + *		if the defer queue size is equal (or larger) than the
> > > > > + *		number of elements in the data structure.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Reclaim resources from the defer queue.
> > > > > + *
> > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > +caller
> > > > > + * provides multi-thread safety by locking a mutex or some other means.
> > > > > + *
> > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > +multi-thread
> > > > > + * safety by creating and using one defer queue per thread.
> > > > > + *
> > > > > + * @param dq
> > > > > + *   Defer queue to reclaim an entry from.
> > > > > + * @return
> > > > > + *   On successful reclamation of at least 1 resource - 0
> > > > > + *   On error - 1 with rte_errno set to
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - EAGAIN - None of the resources have completed at least 1 grace
> > > > period,
> > > > > + *		try again.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > + *
> > > > > + * Delete a defer queue.
> > > > > + *
> > > > > + * It tries to reclaim all the resources on the defer queue.
> > > > > + * If any of the resources have not completed the grace period
> > > > > + * the reclamation stops and returns immediately. The rest of
> > > > > + * the resources are not reclaimed and the defer queue is not
> > > > > + * freed.
> > > > > + *
> > > > > + * @param dq
> > > > > + *   Defer queue to delete.
> > > > > + * @return
> > > > > + *   On success - 0
> > > > > + *   On error - 1
> > > > > + *   Possible rte_errno codes are:
> > > > > + *   - EINVAL - NULL parameters are passed
> > > > > + *   - EAGAIN - Some of the resources have not completed at least 1
> > grace
> > > > > + *		period, try again.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > new file mode 100644
> > > > > index 000000000..2122bc36a
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > >
> > > > Again a style suggestion: as it is not a public header - don't use the rte_
> > > > prefix for naming.
> > > > From my perspective it is easier for the reader to realize what is a public
> > > > header and what is not.
> > > Looks like the guidelines are not defined very well. I see one private
> > > file with the rte_ prefix, and I see Stephen not using the rte_ prefix. I do not have any
> > > preference, but a consistent approach is required.
> >
> > That's just a suggestion.
> > For me (and I hope for others) it would be a bit easier.
> > When looking at the code for the first time, I had to look at meson.build to check
> > whether it is a public header or not.
> > If the file doesn't have the 'rte_' prefix, I assume that it is an internal one
> > straight away.
> > But, as you said, there are no exact guidelines here, so it is up to you to decide.
> I think it makes sense to remove the 'rte_' prefix. I will also change the file name to have a '_private' suffix.
> There are some inconsistencies in the existing code; I will send a patch to correct them to follow this approach.
> 
> >
> > >
> > > >
> > > > > @@ -0,0 +1,46 @@
> > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > + * Copyright (c) 2019 Arm Limited  */
> > > > > +
> > > > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > > > +#define _RTE_RCU_QSBR_PVT_H_
> > > > > +
> > > > > +/**
> > > > > + * This file is private to the RCU library. It should not be
> > > > > +included
> > > > > + * by the user of this library.
> > > > > + */
> > > > > +
> > > > > +#ifdef __cplusplus
> > > > > +extern "C" {
> > > > > +#endif
> > > > > +
> > > > > +#include "rte_rcu_qsbr.h"
> > > > > +
> > > > > +/* RTE defer queue structure.
> > > > > + * This structure holds the defer queue. The defer queue is used
> > > > > +to
> > > > > + * hold the deleted entries from the data structure that are not
> > > > > + * yet freed.
> > > > > + */
> > > > > +struct rte_rcu_qsbr_dq {
> > > > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > > > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > > > +	uint32_t size;
> > > > > +	/**< Number of elements in the defer queue */
> > > > > +	uint32_t esize;
> > > > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > +	/**< Function to call to free the resource. */
> > > > > +	void *p;
> > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > +	 *   pointer to the data structure to which the resource to free
> > > > > +	 *   belongs.
> > > > > +	 */
> > > > > +	char e[0];
> > > > > +	/**< Temporary storage to copy the defer queue element. */
> > > >
> > > > Do you really need 'e' at all?
> > > > Can't it be just a temporary stack variable?
> > > Ok, will check.
> > >
> > > >
> > > > > +};
> > > > > +
> > > > > +#ifdef __cplusplus
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > > > b/lib/librte_rcu/rte_rcu_version.map
> > > > > index f8b9ef2ab..dfac88a37 100644
> > > > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > > > >  	rte_rcu_qsbr_synchronize;
> > > > >  	rte_rcu_qsbr_thread_register;
> > > > >  	rte_rcu_qsbr_thread_unregister;
> > > > > +	rte_rcu_qsbr_dq_create;
> > > > > +	rte_rcu_qsbr_dq_enqueue;
> > > > > +	rte_rcu_qsbr_dq_reclaim;
> > > > > +	rte_rcu_qsbr_dq_delete;
> > > > >
> > > > >  	local: *;
> > > > >  };
> > > > > diff --git a/lib/meson.build b/lib/meson.build index
> > > > > e5ff83893..0e1be8407 100644
> > > > > --- a/lib/meson.build
> > > > > +++ b/lib/meson.build
> > > > > @@ -11,7 +11,9 @@
> > > > >  libraries = [
> > > > >  	'kvargs', # eal depends on kvargs
> > > > >  	'eal', # everything depends on eal
> > > > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > > +	'ring',
> > > > > +	'rcu', # rcu depends on ring
> > > > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > >  	'cmdline',
> > > > >  	'metrics', # bitrate/latency stats depends on this
> > > > >  	'hash',    # efd depends on this
> > > > > @@ -22,7 +24,7 @@ libraries = [
> > > > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > > > >  	'kni', 'latencystats', 'lpm', 'member',
> > > > >  	'power', 'pdump', 'rawdev',
> > > > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > >  	# ipsec lib depends on net, crypto and security
> > > > >  	'ipsec',
> > > > >  	# add pkt framework libs which use other libs from above
> > > > > --
> > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs Honnappa Nagarahalli
                         ` (2 preceding siblings ...)
  2019-10-04 19:01       ` Medvedkin, Vladimir
@ 2019-10-07 13:11       ` Medvedkin, Vladimir
  2019-10-13  3:02         ` Honnappa Nagarahalli
  3 siblings, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-07 13:11 UTC (permalink / raw)
  To: Honnappa Nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, ruifeng.wang, dharmik.thakkar, dev, nd

Hi Honnappa,

On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> Add resource reclamation APIs to make it simple for applications
> and libraries to integrate rte_rcu library.
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>   lib/librte_rcu/meson.build         |   2 +
>   lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>   lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>   lib/librte_rcu/rte_rcu_version.map |   4 +
>   lib/meson.build                    |   6 +-
>   7 files changed, 700 insertions(+), 3 deletions(-)
>   create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
>
> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> index d1b9e46a2..3a6815243 100644
> --- a/app/test/test_rcu_qsbr.c
> +++ b/app/test/test_rcu_qsbr.c
> @@ -1,8 +1,9 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright (c) 2018 Arm Limited
> + * Copyright (c) 2019 Arm Limited
>    */
>   
>   #include <stdio.h>
> +#include <string.h>
>   #include <rte_pause.h>
>   #include <rte_rcu_qsbr.h>
>   #include <rte_hash.h>
> @@ -33,6 +34,7 @@ static uint32_t *keys;
>   #define COUNTER_VALUE 4096
>   static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
>   static uint8_t writer_done;
> +static uint8_t cb_failed;
>   
>   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>   struct rte_hash *h[RTE_MAX_LCORE];
> @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
>   	return 0;
>   }
>   
> +static void
> +rte_rcu_qsbr_test_free_resource(void *p, void *e)
> +{
> +	if (p != NULL && e != NULL) {
> +		printf("%s: Test failed\n", __func__);
> +		cb_failed = 1;
> +	}
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
> + * elements that can be freed later. This queue is referred to as 'defer queue'.
> + */
> +static int
> +test_rcu_qsbr_dq_create(void)
> +{
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> +
> +	/* Pass invalid parameters */
> +	dq = rte_rcu_qsbr_dq_create(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.size = 1;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	params.esize = 3;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid params");
> +
> +	/* Pass all valid parameters */
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	rte_rcu_qsbr_dq_delete(dq);
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_enqueue(void)
> +{
> +	int ret;
> +	uint64_t r;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid params");
> +
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_reclaim(void)
> +{
> +	int ret;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> + */
> +static int
> +test_rcu_qsbr_dq_delete(void)
> +{
> +	int ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> +
> +	/* Pass invalid parameters */
> +	ret = rte_rcu_qsbr_dq_delete(NULL);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid params");
> +
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +	params.v = t[0];
> +	params.size = 1;
> +	params.esize = 16;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	return 0;
> +}
> +
> +/*
> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> + * to be freed later after at least one grace period is over.
> + */
> +static int
> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
> +{
> +	int i, j, ret;
> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params;
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint64_t *e;
> +	uint64_t sc = 200;
> +	int max_entries;
> +
> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> +	printf("Size = %d, esize = %d\n", size, esize);
> +
> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> +	if (e == NULL)
> +		return 0;
> +	cb_failed = 0;
> +
> +	/* Initialize the RCU variable. No threads are registered */
> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> +
> +	/* Create a queue with simple parameters */
> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> +	params.name = rcu_dq_name;
> +	params.f = rte_rcu_qsbr_test_free_resource;
> +	params.v = t[0];
> +	params.size = size;
> +	params.esize = esize;
> +	dq = rte_rcu_qsbr_dq_create(&params);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid params");
> +
> +	/* Given the size and esize, calculate the maximum number of entries
> +	 * that can be stored on the defer queue (look at the logic used
> +	 * in capacity calculation of rte_ring).
> +	 */
> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> +	max_entries = (max_entries - 1)/(esize/8 + 1);
> +
> +	/* Enqueue few counters starting with the value 'sc' */
> +	/* The queue size will be rounded up to 2. The enqueue API also
> +	 * reclaims if the queue size is above certain limit. Since, there
> +	 * are no threads registered, reclamation succedes. Hence, it should
> +	 * be possible to enqueue more than the provided queue size.
> +	 */
> +	for (i = 0; i < 10; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> +	 * succeed. It should not be possible to enqueue more than the size
> +	 * number of resources.
> +	 */
> +	rte_rcu_qsbr_thread_register(t[0], 1);
> +	rte_rcu_qsbr_thread_online(t[0], 1);
> +
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Enqueue fails as queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Delete should fail as there are elements in defer queue which
> +	 * cannot be reclaimed.
> +	 */
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid params");
> +
> +	/* Report quiescent state, enqueue should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	for (i = 0; i < max_entries; i++) {
> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> +			"dq enqueue functional");
> +		for (j = 0; j < esize/8; j++)
> +			e[j] = sc++;
> +	}
> +
> +	/* Queue is full */
> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue functional");
> +
> +	/* Report quiescent state, delete should succeed */
> +	rte_rcu_qsbr_quiescent(t[0], 1);
> +	ret = rte_rcu_qsbr_dq_delete(dq);
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid params");
> +
> +	/* Validate that call back function did not return any error */
> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> +
> +	rte_free(e);
> +	return 0;
> +}
> +
>   /*
>    * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
>    */
> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_thread_offline() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_create() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_reclaim() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_delete() < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_enqueue() < 0)
> +		goto test_fail;
> +
>   	printf("\nFunctional tests\n");
>   
>   	if (test_rcu_qsbr_sw_sv_3qs() < 0)
> @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
>   	if (test_rcu_qsbr_mw_mv_mqs() < 0)
>   		goto test_fail;
>   
> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> +		goto test_fail;
> +
> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> +		goto test_fail;
> +
>   	free_rcu();
>   
>   	printf("\n");
> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> index 62920ba02..e280b29c1 100644
> --- a/lib/librte_rcu/meson.build
> +++ b/lib/librte_rcu/meson.build
> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
>   if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
>   	ext_deps += cc.find_library('atomic')
>   endif
> +
> +deps += ['ring']
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
> index ce7f93dd3..76814f50b 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> @@ -21,6 +21,7 @@
>   #include <rte_errno.h>
>   
>   #include "rte_rcu_qsbr.h"
> +#include "rte_rcu_qsbr_pvt.h"
>   
>   /* Get the memory size of QSBR variable */
>   size_t
> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
>   	return 0;
>   }
>   
> +/* Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + */
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
> +{
> +	struct rte_rcu_qsbr_dq *dq;
> +	uint32_t qs_fifo_size;
> +
> +	if (params == NULL || params->f == NULL ||
> +		params->v == NULL || params->name == NULL ||
> +		params->size == 0 || params->esize == 0 ||
> +		(params->esize % 8 != 0)) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return NULL;
> +	}
> +
> +	dq = rte_zmalloc(NULL,
> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> +		RTE_CACHE_LINE_SIZE);
> +	if (dq == NULL) {
> +		rte_errno = ENOMEM;
> +
> +		return NULL;
> +	}
> +
> +	/* round up qs_fifo_size to next power of two that is not less than
> +	 * max_size.
> +	 */
> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> +					* params->size) + 1);
> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> +					SOCKET_ID_ANY, 0);
> +	if (dq->r == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): defer queue create failed\n", __func__);
> +		rte_free(dq);
> +		return NULL;
> +	}
> +
> +	dq->v = params->v;
> +	dq->size = params->size;
> +	dq->esize = params->esize;
> +	dq->f = params->f;
> +	dq->p = params->p;
> +
> +	return dq;
> +}
> +
> +/* Enqueue one resource to the defer queue to free after the grace
> + * period is over.
> + */
> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> +{
> +	uint64_t token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +	uint32_t cur_size, free_size;
> +
> +	if (dq == NULL || e == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Start the grace period */
> +	token = rte_rcu_qsbr_start(dq->v);
> +
> +	/* Reclaim resources if the queue is 1/8th full. This keeps
> +	 * the queue from growing too large and allows time for reader
> +	 * threads to report their quiescent state.
> +	 */
> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +			"%s(): Triggering reclamation\n", __func__);
> +		rte_rcu_qsbr_dq_reclaim(dq);
> +	}

There are two problems I see:

1. rte_rcu_qsbr_dq_reclaim() reclaims at most 1/16 of the defer queue, while 
it is triggered when the queue is 1/8 full. This means that at least 1/16 of 
the entries will always remain unreclaimed in the queue.

2. The number of entries to reclaim depends on dq->size. So, 
rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For the LPM library 
this means that rte_lpm_delete() sometimes takes a long time.

So, my suggestions here would be:

- trigger rte_rcu_qsbr_dq_reclaim() with every enqueue

- reclaim a small number of entries (could be configurable at creation time)

- provide an API to trigger reclaim from the application manually
(a sketch combining these follows below).
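
A sketch of the first two suggestions combined (illustrative only; 'batch' would
come from a new creation-time parameter, and the body mirrors the loop already
in rte_rcu_qsbr_dq_reclaim()):

	static void
	dq_reclaim_batch(struct rte_rcu_qsbr_dq *dq, uint32_t batch)
	{
		uint32_t i, cnt;
		uint64_t *tmp;
		void *token;

		for (cnt = 0; cnt < batch; cnt++) {
			if (rte_ring_peek(dq->r, &token) != 0 ||
			    rte_rcu_qsbr_check(dq->v,
					(uint64_t)(uintptr_t)token, false) != 1)
				break;	/* oldest entry still in its grace period */
			(void)rte_ring_sc_dequeue(dq->r, &token);
			for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize / 8;
					i++, tmp++)
				(void)rte_ring_sc_dequeue(dq->r,
						(void *)(uintptr_t)tmp);
			dq->f(dq->p, dq->e);
		}
	}

Called with a small constant from every enqueue, this spreads the reclamation
cost evenly; exposing the same function publicly would also cover the third
suggestion.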

> +
> +	/* Check if there is space for at least 1 resource */
> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> +	if (!free_size) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Defer queue is full\n", __func__);
> +		rte_errno = ENOSPC;
> +		return 1;
> +	}
> +
> +	/* Enqueue the resource */
> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> +
> +	/* The resource to enqueue needs to be a multiple of 64b
> +	 * due to the limitation of the rte_ring implementation.
> +	 */
> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> +
> +	return 0;
> +}
> +
> +/* Reclaim resources from the defer queue. */
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> +{
> +	uint32_t max_cnt;
> +	uint32_t cnt;
> +	void *token;
> +	uint64_t *tmp;
> +	uint32_t i;
> +
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Anything to reclaim? */
> +	if (rte_ring_count(dq->r) == 0)
> +		return 0;
> +
> +	/* Reclaim at the max 1/16th the total number of entries. */
> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> +	cnt = 0;
> +
> +	/* Check reader threads quiescent state and reclaim resources */
> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> +			== 1)) {
> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> +		/* The resource to dequeue needs to be a multiple of 64b
> +		 * due to the limitation of the rte_ring implementation.
> +		 */
> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> +			i++, tmp++)
> +			(void)rte_ring_sc_dequeue(dq->r,
> +					(void *)(uintptr_t)tmp);
> +		dq->f(dq->p, dq->e);
> +
> +		cnt++;
> +	}
> +
> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> +
> +	if (cnt == 0) {
> +		/* No resources were reclaimed */
> +		rte_errno = EAGAIN;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Delete a defer queue. */
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
> +{
> +	if (dq == NULL) {
> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> +			"%s(): Invalid input parameter\n", __func__);
> +		rte_errno = EINVAL;
> +
> +		return 1;
> +	}
> +
> +	/* Reclaim all the resources */
> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> +		/* Error number is already set by the reclaim API */
> +		return 1;
> +
> +	rte_ring_free(dq->r);
> +	rte_free(dq);
> +
> +	return 0;
> +}
> +
>   int rte_rcu_log_type;
>   
>   RTE_INIT(rte_rcu_register)
> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> index c80f15c00..185d4b50a 100644
> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> @@ -34,6 +34,7 @@ extern "C" {
>   #include <rte_lcore.h>
>   #include <rte_debug.h>
>   #include <rte_atomic.h>
> +#include <rte_ring.h>
>   
>   extern int rte_rcu_log_type;
>   
> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>   	 */
>   } __rte_cache_aligned;
>   
> +/**
> + * Callback function called to free the resources.
> + *
> + * @param p
> + *   Pointer provided while creating the defer queue
> + * @param e
> + *   Pointer to the resource data stored on the defer queue
> + *
> + * @return
> + *   None
> + */
> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> +
> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> +
> +/**
> + *  Trigger automatic reclamation when 1/8th of the defer queue is full.
> + */
> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> +
> +/**
> + *  Reclaim at most 1/16th of the total number of resources.
> + */
> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> +
> +/**
> + * Parameters used when creating the defer queue.
> + */
> +struct rte_rcu_qsbr_dq_parameters {
> +	const char *name;
> +	/**< Name of the queue. */
> +	uint32_t size;
> +	/**< Number of entries in queue. Typically, this will be
> +	 *   the same as the maximum number of entries supported in the
> +	 *   lock free data structure.
> +	 *   Data structures with an unbounded number of entries
> +	 *   are not currently supported.
> +	 */
> +	uint32_t esize;
> +	/**< Size (in bytes) of each element in the defer queue.
> +	 *   This has to be multiple of 8B as the rte_ring APIs
> +	 *   support 8B element sizes only.
> +	 */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs. This can be NULL.
> +	 */
> +	struct rte_rcu_qsbr *v;
> +	/**< RCU QSBR variable to use for this defer queue */
> +};
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq;
> +
>   /**
>    * @warning
>    * @b EXPERIMENTAL: this API may change without prior notice
> @@ -648,6 +710,113 @@ __rte_experimental
>   int
>   rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a queue used to store the data structure elements that can
> + * be freed later. This queue is referred to as 'defer queue'.
> + *
> + * @param params
> + *   Parameters to create a defer queue.
> + * @return
> + *   On success - Valid pointer to defer queue
> + *   On error - NULL
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOMEM - Not enough memory
> + */
> +__rte_experimental
> +struct rte_rcu_qsbr_dq *
> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue one resource to the defer queue and start the grace period.
> + * The resource will be freed later after at least one grace period
> + * is over.
> + *
> + * If the defer queue is full, it will attempt to reclaim resources.
> + * It will also reclaim resources at regular intervals to keep
> + * the defer queue from growing too big.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to allocate an entry from.
> + * @param e
> + *   Pointer to resource data to copy to the defer queue. The size of
> + *   the data to copy is equal to the element size provided when the
> + *   defer queue was created.
> + * @return
> + *   On success - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - ENOSPC - Defer queue is full. This condition cannot happen
> + *		if the defer queue size is equal to (or larger than)
> + *		the number of elements in the data structure.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Reclaim resources from the defer queue.
> + *
> + * This API is not multi-thread safe. It is expected that the caller
> + * provides multi-thread safety by locking a mutex or some other means.
> + *
> + * A lock free multi-thread writer algorithm could achieve multi-thread
> + * safety by creating and using one defer queue per thread.
> + *
> + * @param dq
> + *   Defer queue to reclaim an entry from.
> + * @return
> + *   On successful reclamation of at least 1 resource - 0
> + *   On error - 1 with rte_errno set to
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - None of the resources have completed at least 1 grace period,
> + *		try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Delete a defer queue.
> + *
> + * It tries to reclaim all the resources on the defer queue.
> + * If any of the resources have not completed the grace period
> + * the reclamation stops and returns immediately. The rest of
> + * the resources are not reclaimed and the defer queue is not
> + * freed.
> + *
> + * @param dq
> + *   Defer queue to delete.
> + * @return
> + *   On success - 0
> + *   On error - 1
> + *   Possible rte_errno codes are:
> + *   - EINVAL - NULL parameters are passed
> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> + *		period, try again.
> + */
> +__rte_experimental
> +int
> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> new file mode 100644
> index 000000000..2122bc36a
> --- /dev/null
> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2019 Arm Limited
> + */
> +
> +#ifndef _RTE_RCU_QSBR_PVT_H_
> +#define _RTE_RCU_QSBR_PVT_H_
> +
> +/**
> + * This file is private to the RCU library. It should not be included
> + * by the user of this library.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "rte_rcu_qsbr.h"
> +
> +/* RTE defer queue structure.
> + * This structure holds the defer queue. The defer queue is used to
> + * hold the deleted entries from the data structure that are not
> + * yet freed.
> + */
> +struct rte_rcu_qsbr_dq {
> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> +	uint32_t size;
> +	/**< Number of elements in the defer queue */
> +	uint32_t esize;
> +	/**< Size (in bytes) of data stored on the defer queue */
> +	rte_rcu_qsbr_free_resource f;
> +	/**< Function to call to free the resource. */
> +	void *p;
> +	/**< Pointer passed to the free function. Typically, this is the
> +	 *   pointer to the data structure to which the resource to free
> +	 *   belongs.
> +	 */
> +	char e[0];
> +	/**< Temporary storage to copy the defer queue element. */
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
> index f8b9ef2ab..dfac88a37 100644
> --- a/lib/librte_rcu/rte_rcu_version.map
> +++ b/lib/librte_rcu/rte_rcu_version.map
> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>   	rte_rcu_qsbr_synchronize;
>   	rte_rcu_qsbr_thread_register;
>   	rte_rcu_qsbr_thread_unregister;
> +	rte_rcu_qsbr_dq_create;
> +	rte_rcu_qsbr_dq_enqueue;
> +	rte_rcu_qsbr_dq_reclaim;
> +	rte_rcu_qsbr_dq_delete;
>   
>   	local: *;
>   };
> diff --git a/lib/meson.build b/lib/meson.build
> index e5ff83893..0e1be8407 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -11,7 +11,9 @@
>   libraries = [
>   	'kvargs', # eal depends on kvargs
>   	'eal', # everything depends on eal
> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> +	'ring',
> +	'rcu', # rcu depends on ring
> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>   	'cmdline',
>   	'metrics', # bitrate/latency stats depends on this
>   	'hash',    # efd depends on this
> @@ -22,7 +24,7 @@ libraries = [
>   	'gro', 'gso', 'ip_frag', 'jobstats',
>   	'kni', 'latencystats', 'lpm', 'member',
>   	'power', 'pdump', 'rawdev',
> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>   	# ipsec lib depends on net, crypto and security
>   	'ipsec',
>   	# add pkt framework libs which use other libs from above

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-04 16:05       ` Medvedkin, Vladimir
@ 2019-10-09  3:48         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  3:48 UTC (permalink / raw)
  To: Medvedkin, Vladimir, bruce.richardson, olivier.matz
  Cc: dev, konstantin.ananyev, stephen, paulmck,
	Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

<snip>

> 
> Hi Honnappa,
> 
> On 01/10/2019 19:28, Honnappa Nagarahalli wrote:
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >   lib/librte_lpm/Makefile            |   3 +-
> >   lib/librte_lpm/meson.build         |   2 +
> >   lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> >   lib/librte_lpm/rte_lpm.h           |  21 ++++++
> >   lib/librte_lpm/rte_lpm_version.map |   6 ++
> >   5 files changed, 122 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index
> > a7946a1c5..ca9e16312 100644
> > --- a/lib/librte_lpm/Makefile
> > +++ b/lib/librte_lpm/Makefile
> > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
> >   # library name
> >   LIB = librte_lpm.a
> >
> > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> >   CFLAGS += -O3
> >   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > -LDLIBS += -lrte_eal -lrte_hash
> > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> >
> >   EXPORT_MAP := rte_lpm_version.map
> >
> > diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> > index a5176d8ae..19a35107f 100644
> > --- a/lib/librte_lpm/meson.build
> > +++ b/lib/librte_lpm/meson.build
> > @@ -2,9 +2,11 @@
> >   # Copyright(c) 2017 Intel Corporation
> >
> >   version = 2
> > +allow_experimental_apis = true
> >   sources = files('rte_lpm.c', 'rte_lpm6.c')
> >   headers = files('rte_lpm.h', 'rte_lpm6.h')
> >   # since header files have different names, we can install all vector headers
> >   # without worrying about which architecture we actually need
> >   headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> >   deps += ['hash']
> > +deps += ['rcu']
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 3a929a1b1..ca58d4b35 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> >    * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >    */
> >
> >   #include <string.h>
> > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> >
> >   	rte_mcfg_tailq_write_unlock();
> >
> > +	if (lpm->dq)
> > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> >   	rte_free(lpm->tbl8);
> >   	rte_free(lpm->rules_tbl);
> >   	rte_free(lpm);
> > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
> >   MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> >   		rte_lpm_free_v1604);
> As a general comment, are you going to add rcu support to the legacy _v20 ?
I do not see a requirement from my side. What's your suggestion?

> >
> > +struct __rte_lpm_rcu_dq_entry {
> > +	uint32_t tbl8_group_index;
> > +	uint32_t pad;
> > +};
> 
> Is this struct necessary? I mean in tbl8_free_v1604() you can pass
> tbl8_group_index as a pointer without "e.pad = 0;".
Agree, that is another way. This structure will go away once the ring library supports storing 32b elements.
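
For illustration only, the alternative could look roughly like this
(assuming the defer queue is created with an 8-byte element size; the
function names are illustrative, not the patch's actual code):

/* Sketch: widen the 32b group index into a single 64b defer queue
 * element instead of using the padded struct.
 */
static void
__lpm_rcu_free(void *p, void *data)
{
	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
	uint32_t idx = (uint32_t)*(uint64_t *)data;

	/* Set tbl8 group invalid */
	__atomic_store(&tbl8[idx], &zero_tbl8_entry, __ATOMIC_RELAXED);
}

static void
tbl8_defer_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
{
	uint64_t e = tbl8_group_start; /* esize == sizeof(uint64_t) */

	(void)rte_rcu_qsbr_dq_enqueue(lpm->dq, &e);
}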

> 
> And what about 32bit environment?
Waiting for rte_ring to support 32b elements (the patch is being discussed).

> 
> > +
> > +static void
> > +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> > +{
> > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry *e =
> > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > +
> > +	/* Set tbl8 group invalid */
> > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > +		__ATOMIC_RELAXED);
> > +}
> > +
> > +/* Associate QSBR variable with an LPM object.
> > + */
> > +int
> > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> > +{
> > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +
> > +	if ((lpm == NULL) || (v == NULL)) {
> > +		rte_errno = EINVAL;
> > +		return 1;
> > +	}
> > +
> > +	if (lpm->dq) {
> > +		rte_errno = EEXIST;
> > +		return 1;
> > +	}
> > +
> > +	/* Init QSBR defer queue. */
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> 
> Consider moving this logic into rte_rcu_qsbr_dq_create(). I think there you
> could prefix the name with just RCU_ . So it would be possible to move
> include <rte_ring.h> into the rte_rcu_qsbr.c from rte_rcu_qsbr.h and get rid
> of RTE_RCU_QSBR_DQ_NAMESIZE macro in rte_rcu_qsbr.h file.
The macro is required to provide a length for the name, similar to what rte_ring does. What would the length of 'name' be if RTE_RCU_QSBR_DQ_NAMESIZE were removed?
If the 'RCU_' prefix has to be added in 'rte_rcu_qsbr_dq_create', then RTE_RCU_QSBR_DQ_NAMESIZE needs to be adjusted in the header file. I am trying to keep it simple by constructing the string in a single function.
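
For comparison, the reviewer's variant would move the prefixing into the
library, roughly as below (a sketch only; the slot count mirrors the
one-token-plus-data layout used by the enqueue path and is illustrative):

/* Sketch: rte_rcu_qsbr_dq_create() builds the ring name itself, so
 * callers never have to size the final string.
 */
char rn[RTE_RING_NAMESIZE];
uint32_t nb_slots;

snprintf(rn, sizeof(rn), "RCU_%s", params->name);
/* 1 token + esize/8 data words per deferred element. */
nb_slots = rte_align32pow2(params->size * (params->esize / 8 + 1) + 1);
dq->r = rte_ring_create(rn, nb_slots, SOCKET_ID_ANY,
			RING_F_SP_ENQ | RING_F_SC_DEQ);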

> 
> > +	params.name = rcu_dq_name;
> > +	params.size = lpm->number_tbl8s;
> > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > +	params.f = __lpm_rcu_qsbr_free_resource;
> > +	params.p = lpm->tbl8;
> > +	params.v = v;
> > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > +	if (lpm->dq == NULL) {
> > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >   /*
> >    * Adds a rule to the rule table.
> >    *
> > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
> >   }
> >
> >   static int32_t
> > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > number_tbl8s)
> > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> >   {
> >   	uint32_t group_idx; /* tbl8 group index. */
> >   	struct rte_lpm_tbl_entry *tbl8_entry;
> >
> >   	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > +		tbl8_entry = &lpm->tbl8[group_idx *
> > +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> >   		/* If a free tbl8 group is found clean it and set as VALID. */
> >   		if (!tbl8_entry->valid_group) {
> >   			struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -
> 712,6 +769,21 @@
> > tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> >   	return -ENOSPC;
> >   }
> >
> > +static int32_t
> > +tbl8_alloc_v1604(struct rte_lpm *lpm)
> > +{
> > +	int32_t group_idx; /* tbl8 group index. */
> > +
> > +	group_idx = __tbl8_alloc_v1604(lpm);
> > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > +		/* If there are no tbl8 groups try to reclaim some. */
> > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > +			group_idx = __tbl8_alloc_v1604(lpm);
> > +	}
> > +
> > +	return group_idx;
> > +}
> > +
> >   static void
> >   tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
> >   {
> > @@ -728,13 +800,21 @@ tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)
> >   }
> >
> >   static void
> > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> >   {
> > -	/* Set tbl8 group invalid*/
> >   	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry e;
> >
> > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > -			__ATOMIC_RELAXED);
> > +	if (lpm->dq != NULL) {
> > +		e.tbl8_group_index = tbl8_group_start;
> > +		e.pad = 0;
> > +		/* Push into QSBR defer queue. */
> > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > +	} else {
> > +		/* Set tbl8 group invalid*/
> > +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> > +				__ATOMIC_RELAXED);
> > +	}
> >   }
> >
> >   static __rte_noinline int32_t
> > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >
> >   	if (!lpm->tbl24[tbl24_index].valid) {
> >   		/* Search for a free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >   		/* Check tbl8 allocation was successful. */
> >   		if (tbl8_group_index < 0) {
> > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >   	} /* If valid entry but not extended calculate the index into Table8. */
> >   	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> >   		/* Search for free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >   		if (tbl8_group_index < 0) {
> >   			return tbl8_group_index;
> > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >   		 */
> >   		lpm->tbl24[tbl24_index].valid = 0;
> >   		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >   	} else if (tbl8_recycle_index > -1) {
> >   		/* Update tbl24 entry. */
> >   		struct rte_lpm_tbl_entry new_tbl24_entry = {
> > @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >   		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> >   				__ATOMIC_RELAXED);
> >   		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >   	}
> >   #undef group_idx
> >   	return 0;
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > 906ec4483..49c12a68d 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> >    * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >    */
> >
> >   #ifndef _RTE_LPM_H_
> > @@ -21,6 +22,7 @@
> >   #include <rte_common.h>
> >   #include <rte_vect.h>
> >   #include <rte_compat.h>
> > +#include <rte_rcu_qsbr.h>
> >
> >   #ifdef __cplusplus
> >   extern "C" {
> > @@ -186,6 +188,7 @@ struct rte_lpm {
> >   			__rte_cache_aligned; /**< LPM tbl24 table. */
> >   	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >   	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> >   };
> >
> >   /**
> > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> >   void
> >   rte_lpm_free_v1604(struct rte_lpm *lpm);
> >
> > +/**
> > + * Associate RCU QSBR variable with an LPM object.
> > + *
> > + * @param lpm
> > + *   the lpm object to add RCU QSBR
> > + * @param v
> > + *   RCU QSBR variable
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with error code set in rte_errno.
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - invalid pointer
> > + *   - EEXIST - already added QSBR
> > + *   - ENOMEM - memory allocation failure
> > + */
> > +__rte_experimental
> > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr
> > +*v);
> > +
> >   /**
> >    * Add a rule to the LPM table.
> >    *
> > diff --git a/lib/librte_lpm/rte_lpm_version.map
> > b/lib/librte_lpm/rte_lpm_version.map
> > index 90beac853..b353aabd2 100644
> > --- a/lib/librte_lpm/rte_lpm_version.map
> > +++ b/lib/librte_lpm/rte_lpm_version.map
> > @@ -44,3 +44,9 @@ DPDK_17.05 {
> >   	rte_lpm6_lookup_bulk_func;
> >
> >   } DPDK_16.04;
> > +
> > +EXPERIMENTAL {
> > +	global:
> > +
> > +	rte_lpm_rcu_qsbr_add;
> > +};
> 
> --
> Regards,
> Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-07  9:01           ` Ananyev, Konstantin
@ 2019-10-09  4:25             ` Honnappa Nagarahalli
  2019-10-10 15:09               ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-09  4:25 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> 
> >
> > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > >
> > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >
> > > > The peek API allows fetching the next available object in the ring
> > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > objects depend on their value.
> > > >
> > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > ---
> > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 30 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > index 2a9f768a1..d3d0d5e18 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > >  				r->cons.single, available);
> > > >  }
> > > >
> > > > +/**
> > > > + * Peek one object from a ring.
> > > > + *
> > > > + * The peek API allows fetching the next available object in the
> > > > +ring
> > > > + * without dequeuing it. This API is not multi-thread safe with
> > > > +respect
> > > > + * to other consumer threads.
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param obj_p
> > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > + * @return
> > > > + *   - 0: Success, object available
> > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline int
> > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > >
> > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > follow other rte_ring functions naming conventions
> > > (rte_ring_sc_peek() or so).
> > Agree
> >
> > >
> > > As a better alternative what do you think about introducing a
> > > serialized versions of DPDK rte_ring dequeue functions?
> > > Something like that:
> > >
> > > /* same as original ring dequeue, but:
> > >  * 1) move cons.head only if cons.head == cons.tail
> > >  * 2) don't update cons.tail
> > >  */
> > > unsigned int
> > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >                 unsigned int n, unsigned int *available);
> > >
> > > /* sets both cons.head and cons.tail to cons.head + num */
> > > void
> > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > >
> > > /* resets cons.head to cons.tail value */
> > > void
> > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > >
> > > Then your dq_reclaim cycle function will look like that:
> > >
> > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > uint32_t avl, n;
> > > uintptr_t elt[nb_elt]; ...
> > >
> > > do {
> > >
> > >   /* read next elem from the queue */
> > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > >   if (n == 0)
> > >       break;
> > >
> > >   /* wrong period, keep elem in the queue */
> > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > >      rte_ring_serial_dequeue_abort(dq->r);
> > >      break;
> > >   }
> > >
> > >   /* can reclaim, remove elem from the queue */
> > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > >
> > >   /* call reclaim function */
> > >   dq->f(dq->p, elt);
> > >
> > > } while (avl >= nb_elt);
> > >
> > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > As long as actual reclamation callback itself is MT safe of course.
> >
> > I think it is a great idea. The other writers would still be polling
> > for the current writer to update the tail or update the head. This makes it a
> blocking solution.
> 
> Yep, it is a blocking one.
> 
> > We can make the other threads not poll i.e. they will quit reclaiming if they
> see that other writers are dequeuing from the queue.
> 
> Actually didn't think about that possibility, but yes should be possible to have
> _try_ semantics too.
> 
> >The other  way is to use per thread queues.
> >
> > The other requirement I see is to support unbounded-size data
> > structures where in the data structures do not have a pre-determined
> > number of entries. Also, currently the defer queue size is equal to the total
> number of entries in a given data structure. There are plans to support
> dynamically resizable defer queue. This means, memory allocation which will
> affect the lock-free-ness of the solution.
> >
> > So, IMO:
> > 1) The API should provide the capability to support different algorithms -
> may be through some flags?
> > 2) The requirements for the ring are pretty unique to the problem we
> > have here (for ex: move the cons-head only if cons-tail is also the same, skip
> polling). So, we should probably implement a ring with-in the RCU library?
> 
> Personally, I think such serialization ring API would be useful for other cases
> too.
> There are few cases when user need to read contents of the queue without
> removing elements from it.
> Let say we do use similar approach inside TLDK to implement TCP transmit
> queue.
> If such API would exist in DPDK we can just use it straightway, without
> maintaining a separate one.
ok

> 
> >
> > From the timeline perspective, adding all these capabilities would be
> > difficult to get done with in 19.11 timeline. What I have here
> > satisfies my current needs. I suggest that we make provisions in APIs now to
> support all these features, but do the implementation in the coming releases.
> Does this sound ok for you?
> 
> Not sure I understand your suggestion here...
> Could you explain it a bit more - how new API will look like and what would
> be left for the future.
For this patch, I suggest we do not add any more complexity. If someone wants a lock-free/block-free mechanism, it is available by creating per-thread defer queues.

We push the following to the future:
1) A dynamically size-adjustable defer queue. IMO, with this, lock-free/block-free reclamation will not be available (memory allocation requires locking). The memory for the defer queue would be allocated/freed in chunks of 'size' elements as the queue grows/shrinks.

2) A constant-size defer queue with lock-free and block-free reclamation (single option). The defer queue will be of fixed length 'size'. If the queue gets full, an error is returned. The user could provide a 'size' equal to the number of elements in the data structure to ensure the queue never gets full.

I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide two #defines, one for the dynamically variable-size defer queue and the other for the constant-size defer queue.

However, IMO, using per-thread defer queues is a much simpler way to achieve (2). It does not add any significant burden to the user either. A rough sketch follows.
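
To make the per-thread alternative concrete (the writer-id plumbing and
names below are illustrative, not part of the patch):

/* Sketch: one defer queue per writer thread; each underlying SP/SC
 * ring is only ever touched by its owning thread, so no locking is
 * needed on either the enqueue or the reclaim path.
 */
#define NB_WRITERS 4

static struct rte_rcu_qsbr_dq *wdq[NB_WRITERS];

static int
writer_delete(unsigned int wid, void *e)
{
	int ret;

	ret = rte_rcu_qsbr_dq_enqueue(wdq[wid], e);
	if (ret != 0 && rte_errno == ENOSPC) {
		/* Queue full: reclaim whatever has completed its
		 * grace period and retry once.
		 */
		(void)rte_rcu_qsbr_dq_reclaim(wdq[wid]);
		ret = rte_rcu_qsbr_dq_enqueue(wdq[wid], e);
	}
	return ret;
}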

> 
> >
> > >
> > > > +{
> > > > +	uint32_t prod_tail = r->prod.tail;
> > > > +	uint32_t cons_head = r->cons.head;
> > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > +	unsigned int n = 1;
> > > > +	if (count) {
> > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > +		return 0;
> > > > +	}
> > > > +	return -ENOENT;
> > > > +}
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-09  4:25             ` Honnappa Nagarahalli
@ 2019-10-10 15:09               ` Ananyev, Konstantin
  2019-10-11  5:03                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-10 15:09 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd


> <snip>
> 
> >
> > >
> > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > >
> > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > >
> > > > > The peek API allows fetching the next available object in the ring
> > > > > without dequeuing it. This helps in scenarios where dequeuing of
> > > > > objects depend on their value.
> > > > >
> > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > >  1 file changed, 30 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > index 2a9f768a1..d3d0d5e18 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > >  				r->cons.single, available);
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * Peek one object from a ring.
> > > > > + *
> > > > > + * The peek API allows fetching the next available object in the
> > > > > +ring
> > > > > + * without dequeuing it. This API is not multi-thread safe with
> > > > > +respect
> > > > > + * to other consumer threads.
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_p
> > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > + * @return
> > > > > + *   - 0: Success, object available
> > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline int
> > > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > >
> > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > follow other rte_ring functions naming conventions
> > > > (rte_ring_sc_peek() or so).
> > > Agree
> > >
> > > >
> > > > As a better alternative what do you think about introducing a
> > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > Something like that:
> > > >
> > > > /* same as original ring dequeue, but:
> > > >  * 1) move cons.head only if cons.head == cons.tail
> > > >  * 2) don't update cons.tail
> > > >  */
> > > > unsigned int
> > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > >                 unsigned int n, unsigned int *available);
> > > >
> > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > void
> > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > >
> > > > /* resets cons.head to cons.tail value */
> > > > void
> > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > >
> > > > Then your dq_reclaim cycle function will look like that:
> > > >
> > > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > > uint32_t avl, n;
> > > > uintptr_t elt[nb_elt]; ...
> > > >
> > > > do {
> > > >
> > > >   /* read next elem from the queue */
> > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > >   if (n == 0)
> > > >       break;
> > > >
> > > >   /* wrong period, keep elem in the queue */
> > > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > >      break;
> > > >   }
> > > >
> > > >   /* can reclaim, remove elem from the queue */
> > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > >
> > > >   /* call reclaim function */
> > > >   dq->f(dq->p, elt);
> > > >
> > > > } while (avl >= nb_elt);
> > > >
> > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > As long as actual reclamation callback itself is MT safe of course.
> > >
> > > I think it is a great idea. The other writers would still be polling
> > > for the current writer to update the tail or update the head. This makes it a
> > blocking solution.
> >
> > Yep, it is a blocking one.
> >
> > > We can make the other threads not poll i.e. they will quit reclaiming if they
> > see that other writers are dequeuing from the queue.
> >
> > Actually didn't think about that possibility, but yes should be possible to have
> > _try_ semantics too.
> >
> > >The other  way is to use per thread queues.
> > >
> > > The other requirement I see is to support unbounded-size data
> > > structures where in the data structures do not have a pre-determined
> > > number of entries. Also, currently the defer queue size is equal to the total
> > number of entries in a given data structure. There are plans to support
> > dynamically resizable defer queue. This means, memory allocation which will
> > affect the lock-free-ness of the solution.
> > >
> > > So, IMO:
> > > 1) The API should provide the capability to support different algorithms -
> > may be through some flags?
> > > 2) The requirements for the ring are pretty unique to the problem we
> > > have here (for ex: move the cons-head only if cons-tail is also the same, skip
> > polling). So, we should probably implement a ring with-in the RCU library?
> >
> > Personally, I think such serialization ring API would be useful for other cases
> > too.
> > There are few cases when user need to read contents of the queue without
> > removing elements from it.
> > Let say we do use similar approach inside TLDK to implement TCP transmit
> > queue.
> > If such API would exist in DPDK we can just use it straightway, without
> > maintaining a separate one.
> ok
> 
> >
> > >
> > > From the timeline perspective, adding all these capabilities would be
> > > difficult to get done with in 19.11 timeline. What I have here
> > > satisfies my current needs. I suggest that we make provisions in APIs now to
> > support all these features, but do the implementation in the coming releases.
> > Does this sound ok for you?
> >
> > Not sure I understand your suggestion here...
> > Could you explain it a bit more - how new API will look like and what would
> > be left for the future.
> For this patch, I suggest we do not add any more complexity. If someone wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
> 
> We push the following to the future:
> 1) Dynamically size adjustable defer queue. IMO, with this, the lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed in chunks of 'size' elements as the queue grows/shrinks.

That one is fine by me.
In fact, I don't know whether there would be a real use-case for a dynamic defer queue for an RCU var...
But I suppose that's a subject for another discussion.

> 
> 2) Constant size defer queue with lock-free and block-free reclamation (single option). The defer queue will be of fixed length 'size'. If the
> queue gets full an error is returned. The user could provide a 'size' equal to the number of elements in a data structure to ensure queue
> never gets full.

Ok, so for 19.11, what enqueue/dequeue model do you plan to support?
- MP/MC
- MP/SC
- SP/SC
- no MT at all (only the same single thread can do enqueue and dequeue)

And a related question:
What additional rte_ring API do you plan to introduce in that case?
- None
- rte_ring_sc_peek()
- rte_ring_serial_dequeue()

> 
> I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide 2 #defines, one for dynamically variable size defer queue and the
> other for constant size defer queue.
> 
> However, IMO, using per thread defer queue is a much simpler way to achieve 2. It does not add any significant burden to the user either.
> 
> >
> > >
> > > >
> > > > > +{
> > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > +	uint32_t cons_head = r->cons.head;
> > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > +	unsigned int n = 1;
> > > > > +	if (count) {
> > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > +		return 0;
> > > > > +	}
> > > > > +	return -ENOENT;
> > > > > +}
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > --
> > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-10 15:09               ` Ananyev, Konstantin
@ 2019-10-11  5:03                 ` Honnappa Nagarahalli
  2019-10-11 14:41                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-11  5:03 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd

> 
> > <snip>
> >
> > >
> > > >
> > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > >
> > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > >
> > > > > > The peek API allows fetching the next available object in the
> > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > dequeuing of objects depend on their value.
> > > > > >
> > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > ++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 30 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring
> > > > > > *r, void
> > > > > **obj_table,
> > > > > >  				r->cons.single, available);  }
> > > > > >
> > > > > > +/**
> > > > > > + * Peek one object from a ring.
> > > > > > + *
> > > > > > + * The peek API allows fetching the next available object in
> > > > > > +the ring
> > > > > > + * without dequeuing it. This API is not multi-thread safe
> > > > > > +with respect
> > > > > > + * to other consumer threads.
> > > > > > + *
> > > > > > + * @param r
> > > > > > + *   A pointer to the ring structure.
> > > > > > + * @param obj_p
> > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > + * @return
> > > > > > + *   - 0: Success, object available
> > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring
> > > > > > +*r, void **obj_p)
> > > > >
> > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > follow other rte_ring functions naming conventions
> > > > > (rte_ring_sc_peek() or so).
> > > > Agree
> > > >
> > > > >
> > > > > As a better alternative what do you think about introducing a
> > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > Something like that:
> > > > >
> > > > > /* same as original ring dequeue, but:
> > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > >  * 2) don't update cons.tail
> > > > >  */
> > > > > unsigned int
> > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > >                 unsigned int n, unsigned int *available);
> > > > >
> > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > void
> > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > >
> > > > > /* resets cons.head to cons.tail value */
> > > > > void
> > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > >
> > > > > Then your dq_reclaim cycle function will look like that:
> > > > >
> > > > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > > > uint32_t avl, n;
> > > > > uintptr_t elt[nb_elt]; ...
> > > > >
> > > > > do {
> > > > >
> > > > >   /* read next elem from the queue */
> > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > >   if (n == 0)
> > > > >       break;
> > > > >
> > > > >   /* wrong period, keep elem in the queue */
> > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > >      break;
> > > > >   }
> > > > >
> > > > >   /* can reclaim, remove elem from the queue */
> > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > >
> > > > >   /* call reclaim function */
> > > > >   dq->f(dq->p, elt);
> > > > >
> > > > > } while (avl >= nb_elt);
> > > > >
> > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > As long as actual reclamation callback itself is MT safe of course.
> > > >
> > > > I think it is a great idea. The other writers would still be
> > > > polling for the current writer to update the tail or update the
> > > > head. This makes it a
> > > blocking solution.
> > >
> > > Yep, it is a blocking one.
> > >
> > > > We can make the other threads not poll i.e. they will quit
> > > > reclaiming if they
> > > see that other writers are dequeuing from the queue.
> > >
> > > Actually didn't think about that possibility, but yes should be
> > > possible to have _try_ semantics too.
> > >
> > > >The other  way is to use per thread queues.
> > > >
> > > > The other requirement I see is to support unbounded-size data
> > > > structures where in the data structures do not have a
> > > > pre-determined number of entries. Also, currently the defer queue
> > > > size is equal to the total
> > > number of entries in a given data structure. There are plans to
> > > support dynamically resizable defer queue. This means, memory
> > > allocation which will affect the lock-free-ness of the solution.
> > > >
> > > > So, IMO:
> > > > 1) The API should provide the capability to support different
> > > > algorithms -
> > > may be through some flags?
> > > > 2) The requirements for the ring are pretty unique to the problem
> > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > the same, skip
> > > polling). So, we should probably implement a ring with-in the RCU library?
> > >
> > > Personally, I think such serialization ring API would be useful for
> > > other cases too.
> > > There are few cases when user need to read contents of the queue
> > > without removing elements from it.
> > > Let say we do use similar approach inside TLDK to implement TCP
> > > transmit queue.
> > > If such API would exist in DPDK we can just use it straightway,
> > > without maintaining a separate one.
> > ok
> >
> > >
> > > >
> > > > From the timeline perspective, adding all these capabilities would
> > > > be difficult to get done with in 19.11 timeline. What I have here
> > > > satisfies my current needs. I suggest that we make provisions in
> > > > APIs now to
> > > support all these features, but do the implementation in the coming
> releases.
> > > Does this sound ok for you?
> > >
> > > Not sure I understand your suggestion here...
> > > Could you explain it a bit more - how new API will look like and
> > > what would be left for the future.
> > For this patch, I suggest we do not add any more complexity. If
> > someone wants a lock-free/block-free mechanism, it is available by creating
> per thread defer queues.
> >
> > We push the following to the future:
> > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > lock-free/block-free reclamation will not be available (memory allocation
> requires locking). The memory for the defer queue will be allocated/freed in
> chunks of 'size' elements as the queue grows/shrinks.
> 
> That one is fine by me.
> In fact I don't know would be there a real use-case for dynamic defer queue
> for rcu var...
> But I suppose that's subject for another discussion.
Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary, as reclamation is done regularly.
If a smaller queue size is used, the queue might get full (even after reclamation), in which case the queue size should be increased. A rough sketch of such growth follows.
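
Illustrative only: rte_ring cannot be resized in place, so growing means
allocating a larger ring and migrating the pending tokens. The function
and ring names are placeholders (unique-name handling omitted), and this
allocation step is exactly what makes the dynamic variant non-lock-free:

static int
dq_grow(struct rte_rcu_qsbr_dq *dq, uint32_t new_slots)
{
	struct rte_ring *nr;
	void *obj;

	nr = rte_ring_create("RCU_DQ_GROW", rte_align32pow2(new_slots),
			SOCKET_ID_ANY, RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (nr == NULL)
		return -ENOMEM;

	/* Migrate pending tokens; safe only while no other thread
	 * touches this defer queue.
	 */
	while (rte_ring_sc_dequeue(dq->r, &obj) == 0)
		(void)rte_ring_sp_enqueue(nr, obj);

	rte_ring_free(dq->r);
	dq->r = nr;
	return 0;
}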

> 
> >
> > 2) Constant size defer queue with lock-free and block-free reclamation
> > (single option). The defer queue will be of fixed length 'size'. If
> > the queue gets full an error is returned. The user could provide a 'size' equal
> to the number of elements in a data structure to ensure queue never gets full.
> 
> Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> - MP/MC
> - MP/SC
> - SP/SC
Just SP/SC

> - non MT at all (only same single thread can do enqueue and dequeue)
If MT safety is required, one should use one defer queue per thread for now.

> 
> And related question:
> What additional rte_ring API you plan to introduce in that case?
> - None
> - rte_ring_sc_peek()
rte_ring_peek will be changed to rte_ring_sc_peek

> - rte_ring_serial_dequeue()
> 
> >
> > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > 2 #defines, one for dynamically variable size defer queue and the other for
> constant size defer queue.
> >
> > However, IMO, using per thread defer queue is a much simpler way to
> achieve 2. It does not add any significant burden to the user either.
> >
> > >
> > > >
> > > > >
> > > > > > +{
> > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > +	unsigned int n = 1;
> > > > > > +	if (count) {
> > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +	return -ENOENT;
> > > > > > +}
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > --
> > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-11  5:03                 ` Honnappa Nagarahalli
@ 2019-10-11 14:41                   ` Ananyev, Konstantin
  2019-10-11 18:28                     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-11 14:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Friday, October 11, 2019 6:04 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; paulmck@linux.ibm.com
> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; Ruifeng Wang (Arm Technology
> China) <Ruifeng.Wang@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; dev@dpdk.org; nd <nd@arm.com>; nd
> <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 1/3] lib/ring: add peek API
> 
> >
> > > <snip>
> > >
> > > >
> > > > >
> > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > >
> > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > >
> > > > > > > The peek API allows fetching the next available object in the
> > > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > > dequeuing of objects depend on their value.
> > > > > > >
> > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > ---
> > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > ++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 30 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644
> > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring
> > > > > > > *r, void
> > > > > > **obj_table,
> > > > > > >  				r->cons.single, available);  }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Peek one object from a ring.
> > > > > > > + *
> > > > > > > + * The peek API allows fetching the next available object in
> > > > > > > +the ring
> > > > > > > + * without dequeuing it. This API is not multi-thread safe
> > > > > > > +with respect
> > > > > > > + * to other consumer threads.
> > > > > > > + *
> > > > > > > + * @param r
> > > > > > > + *   A pointer to the ring structure.
> > > > > > > + * @param obj_p
> > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > + * @return
> > > > > > > + *   - 0: Success, object available
> > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +static __rte_always_inline int rte_ring_peek(struct rte_ring
> > > > > > > +*r, void **obj_p)
> > > > > >
> > > > > > As it is not MT safe, then I think we need _sc_ in the name, to
> > > > > > follow other rte_ring functions naming conventions
> > > > > > (rte_ring_sc_peek() or so).
> > > > > Agree
> > > > >
> > > > > >
> > > > > > As a better alternative what do you think about introducing a
> > > > > > serialized versions of DPDK rte_ring dequeue functions?
> > > > > > Something like that:
> > > > > >
> > > > > > /* same as original ring dequeue, but:
> > > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > > >  * 2) don't update cons.tail
> > > > > >  */
> > > > > > unsigned int
> > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > > >                 unsigned int n, unsigned int *available);
> > > > > >
> > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > > >
> > > > > > /* resets cons.head to cons.tail value */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > >
> > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > >
> > > > > > const uint32_t nb_elt = dq->esize/8 + 1;
> > > > > > uint32_t avl, n;
> > > > > > uintptr_t elt[nb_elt]; ...
> > > > > >
> > > > > > do {
> > > > > >
> > > > > >   /* read next elem from the queue */
> > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > >   if (n == 0)
> > > > > >       break;
> > > > > >
> > > > > >   /* wrong period, keep elem in the queue */
> > > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0], false) != 1) {
> > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > >      break;
> > > > > >   }
> > > > > >
> > > > > >   /* can reclaim, remove elem from the queue */
> > > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > > >
> > > > > >   /* call reclaim function */
> > > > > >   dq->f(dq->p, elt);
> > > > > >
> > > > > > } while (avl >= nb_elt);
> > > > > >
> > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > >
> > > > > I think it is a great idea. The other writers would still be
> > > > > polling for the current writer to update the tail or update the
> > > > > head. This makes it a
> > > > blocking solution.
> > > >
> > > > Yep, it is a blocking one.
> > > >
> > > > > We can make the other threads not poll i.e. they will quit
> > > > > reclaiming if they
> > > > see that other writers are dequeuing from the queue.
> > > >
> > > > Actually didn't think about that possibility, but yes should be
> > > > possible to have _try_ semantics too.
> > > >
> > > > >The other  way is to use per thread queues.
> > > > >
> > > > > The other requirement I see is to support unbounded-size data
> > > > > structures where in the data structures do not have a
> > > > > pre-determined number of entries. Also, currently the defer queue
> > > > > size is equal to the total
> > > > number of entries in a given data structure. There are plans to
> > > > support dynamically resizable defer queue. This means, memory
> > > > allocation which will affect the lock-free-ness of the solution.
> > > > >
> > > > > So, IMO:
> > > > > 1) The API should provide the capability to support different
> > > > > algorithms -
> > > > may be through some flags?
> > > > > 2) The requirements for the ring are pretty unique to the problem
> > > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > > the same, skip
> > > > polling). So, we should probably implement a ring with-in the RCU library?
> > > >
> > > > Personally, I think such serialization ring API would be useful for
> > > > other cases too.
> > > > There are few cases when user need to read contents of the queue
> > > > without removing elements from it.
> > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > transmit queue.
> > > > If such API would exist in DPDK we can just use it straightway,
> > > > without maintaining a separate one.
> > > ok
> > >
> > > >
> > > > >
> > > > > From the timeline perspective, adding all these capabilities would
> > > > > be difficult to get done with in 19.11 timeline. What I have here
> > > > > satisfies my current needs. I suggest that we make provisions in
> > > > > APIs now to
> > > > support all these features, but do the implementation in the coming
> > releases.
> > > > Does this sound ok for you?
> > > >
> > > > Not sure I understand your suggestion here...
> > > > Could you explain it a bit more - how new API will look like and
> > > > what would be left for the future.
> > > For this patch, I suggest we do not add any more complexity. If
> > > someone wants a lock-free/block-free mechanism, it is available by creating
> > per thread defer queues.
> > >
> > > We push the following to the future:
> > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > lock-free/block-free reclamation will not be available (memory allocation
> > requires locking). The memory for the defer queue will be allocated/freed in
> > chunks of 'size' elements as the queue grows/shrinks.
> >
> > That one is fine by me.
> > In fact I don't know would be there a real use-case for dynamic defer queue
> > for rcu var...
> > But I suppose that's subject for another discussion.
> Currently, the defer queue size is equal to the number of resources in the data structure. This is unnecessary as the reclamation is done
> regularly.
> If a smaller queue size is used, the queue might get full (even after reclamation), in which case, the queue size should be increased.

I understand the intention.
Though I am not very happy with an approach where, to free one resource, we first have to allocate another one.
It sounds like a source of deadlocks and, in that case, probably an unnecessary complication.
But again, as it is not for 19.11, we don't have to discuss it now.
 
> >
> > >
> > > 2) Constant size defer queue with lock-free and block-free reclamation
> > > (single option). The defer queue will be of fixed length 'size'. If
> > > the queue gets full an error is returned. The user could provide a 'size' equal
> > to the number of elements in a data structure to ensure queue never gets full.
> >
> > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > - MP/MC
> > - MP/SC
> > - SP/SC
> Just SP/SC

Ok, just to confirm we are on the same page:
would it be possible for one thread to do dq_enqueue() while a second one does dq_reclaim() simultaneously
(assuming, of course, that the actual reclamation function is thread safe)?
 
> > - non MT at all (only same single thread can do enqueue and dequeue)
> If MT safe is required, one should use 1 defer queue per thread for now.
> 
> >
> > And related question:
> > What additional rte_ring API you plan to introduce in that case?
> > - None
> > - rte_ring_sc_peek()
> rte_ring_peek will be changed to rte_ring_sc_peek
> 
> > - rte_ring_serial_dequeue()
> >
> > >
> > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > > 2 #defines, one for dynamically variable size defer queue and the other for
> > constant size defer queue.
> > >
> > > However, IMO, using per thread defer queue is a much simpler way to
> > achieve 2. It does not add any significant burden to the user either.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > +	unsigned int n = 1;
> > > > > > > +	if (count) {
> > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > +		return 0;
> > > > > > > +	}
> > > > > > > +	return -ENOENT;
> > > > > > > +}
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-11 14:41                   ` Ananyev, Konstantin
@ 2019-10-11 18:28                     ` Honnappa Nagarahalli
  2019-10-13 20:09                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-11 18:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> > > >
> > > > >
> > > > > >
> > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > >
> > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > >
> > > > > > > > The peek API allows fetching the next available object in
> > > > > > > > the ring without dequeuing it. This helps in scenarios
> > > > > > > > where dequeuing of objects depend on their value.
> > > > > > > >
> > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > ---
> > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18
> > > > > > > > 100644
> > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > rte_ring *r, void
> > > > > > > **obj_table,
> > > > > > > >  				r->cons.single, available);  }
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * Peek one object from a ring.
> > > > > > > > + *
> > > > > > > > + * The peek API allows fetching the next available object
> > > > > > > > +in the ring
> > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > +safe with respect
> > > > > > > > + * to other consumer threads.
> > > > > > > > + *
> > > > > > > > + * @param r
> > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > + * @param obj_p
> > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > + * @return
> > > > > > > > + *   - 0: Success, object available
> > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > >
> > > > > > > As it is not MT safe, then I think we need _sc_ in the name,
> > > > > > > to follow other rte_ring functions naming conventions
> > > > > > > (rte_ring_sc_peek() or so).
> > > > > > Agree
> > > > > >
> > > > > > >
> > > > > > > As a better alternative what do you think about introducing
> > > > > > > a serialized versions of DPDK rte_ring dequeue functions?
> > > > > > > Something like that:
> > > > > > >
> > > > > > > /* same as original ring dequeue, but:
> > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > >   * 2) don't update cons.tail
> > > > > > >   */
> > > > > > > unsigned int
> > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > **obj_table, unsigned int n,
> > > > > > >                 unsigned int *available);
> > > > > > >
> > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r,
> > > > > > > uint32_t num);
> > > > > > >
> > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > >
> > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > >
> > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl,
> > > > > > > n; uintptr_t elt[nb_elt]; ...
> > > > > > >
> > > > > > > do {
> > > > > > >
> > > > > > >   /* read next elem from the queue */
> > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > >   if (n == 0)
> > > > > > >       break;
> > > > > > >
> > > > > > >   /* wrong period, keep elem in the queue */
> > > > > > >   if (rte_rcu_qsbr_check(dq->v, elt[0]) != 1) {
> > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > >      break;
> > > > > > >   }
> > > > > > >
> > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > >   rte_ring_serial_dequeue_finish(dq->r, nb_elt);
> > > > > > >
> > > > > > >    /*call reclaim function */
> > > > > > >   dq->f(dq->p, elt);
> > > > > > >
> > > > > > > } while (avl >= nb_elt);
> > > > > > >
> > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > > >
> > > > > > I think it is a great idea. The other writers would still be
> > > > > > polling for the current writer to update the tail or update
> > > > > > the head. This makes it a
> > > > > blocking solution.
> > > > >
> > > > > Yep, it is a blocking one.
> > > > >
> > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > reclaiming if they
> > > > > see that other writers are dequeuing from the queue.
> > > > >
> > > > > Actually didn't think about that possibility, but yes should be
> > > > > possible to have _try_ semantics too.
> > > > >
> > > > > > The other way is to use per thread queues.
> > > > > >
> > > > > > The other requirement I see is to support unbounded-size data
> > > > > > structures where in the data structures do not have a
> > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > queue size is equal to the total
> > > > > number of entries in a given data structure. There are plans to
> > > > > support dynamically resizable defer queue. This means, memory
> > > > > allocation which will affect the lock-free-ness of the solution.
> > > > > >
> > > > > > So, IMO:
> > > > > > 1) The API should provide the capability to support different
> > > > > > algorithms -
> > > > > may be through some flags?
> > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > cons-tail is also the same, skip
> > > > > polling). So, we should probably implement a ring with-in the RCU
> library?
> > > > >
> > > > > Personally, I think such serialization ring API would be useful
> > > > > for other cases too.
> > > > > There are few cases when user need to read contents of the queue
> > > > > without removing elements from it.
> > > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > > transmit queue.
> > > > > If such API would exist in DPDK we can just use it straightway,
> > > > > without maintaining a separate one.
> > > > ok
> > > >
> > > > >
> > > > > >
> > > > > > From the timeline perspective, adding all these capabilities
> > > > > > would be difficult to get done with in 19.11 timeline. What I
> > > > > > have here satisfies my current needs. I suggest that we make
> > > > > > provisions in APIs now to
> > > > > support all these features, but do the implementation in the
> > > > > coming
> > > releases.
> > > > > Does this sound ok for you?
> > > > >
> > > > > Not sure I understand your suggestion here...
> > > > > Could you explain it a bit more - how new API will look like and
> > > > > what would be left for the future.
> > > > For this patch, I suggest we do not add any more complexity. If
> > > > someone wants a lock-free/block-free mechanism, it is available by
> > > > creating
> > > per thread defer queues.
> > > >
> > > > We push the following to the future:
> > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > lock-free/block-free reclamation will not be available (memory
> > > > allocation
> > > requires locking). The memory for the defer queue will be
> > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> > >
> > > That one is fine by me.
> > > In fact I don't know whether there would be a real use-case for a dynamic
> > > defer queue for an rcu var...
> > > But I suppose that's subject for another discussion.
> > Currently, the defer queue size is equal to the number of resources in
> > the data structure. This is unnecessary as the reclamation is done regularly.
> > If a smaller queue size is used, the queue might get full (even after
> reclamation), in which case, the queue size should be increased.
> 
> I understand the intention.
> Though I am not very happy with an approach where, to free one resource, we
> first have to allocate another one.
> That sounds like a source of deadlocks and, for this case, probably an
> unnecessary complication.
It depends on the use case. For some use cases, lock-free reader-writer concurrency is enough; there is no need for a queue large enough to hold all the resources. Other use cases require lock-free reader-writer as well as writer-writer concurrency; there, theoretically, a queue large enough to hold all the resources would be required.
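
For example, a single writer serialized by a mutex only needs the queue to absorb the deletions issued within one grace period, whereas with multiple lock-free writers, in the worst case, every resource in the data structure could be pending reclamation at the same time.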

> But again, as it is not for 19.11, we don't have to discuss it now.
> 
> > >
> > > >
> > > > 2) Constant size defer queue with lock-free and block-free
> > > > reclamation (single option). The defer queue will be of fixed
> > > > length 'size'. If the queue gets full an error is returned. The
> > > > user could provide a 'size' equal
> > > to the number of elements in a data structure to ensure queue never gets
> full.
> > >
> > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > - MP/MC
> > > - MP/SC
> > > - SP/SC
> > Just SP/SC
> 
> Ok, just to confirm we are on the same page:
> would it be possible for one thread to do dq_enqueue() while a second one does
> dq_reclaim() simultaneously (assuming, of course, that the actual reclamation
> function is thread safe)?
Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.
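
To make the model concrete, here is a minimal sketch of how such a split could look (assuming the dq APIs from this patch; the function names and the spinlock are illustrative, not part of the API):

#include <rte_spinlock.h>
#include <rte_rcu_qsbr.h>

static rte_spinlock_t dq_reclaim_lock = RTE_SPINLOCK_INITIALIZER;

/* The single enqueuing thread (the underlying ring is SP/SC):
 * dq_enqueue() starts the grace period for 'e' and queues it. */
static int
writer_free_resource(struct rte_rcu_qsbr_dq *dq, void *e)
{
	return rte_rcu_qsbr_dq_enqueue(dq, e);
}

/* A second thread may reclaim concurrently with the enqueuer, but
 * multiple reclaiming threads must serialize among themselves. */
static int
housekeeping_reclaim(struct rte_rcu_qsbr_dq *dq)
{
	int ret;

	rte_spinlock_lock(&dq_reclaim_lock);
	ret = rte_rcu_qsbr_dq_reclaim(dq);
	rte_spinlock_unlock(&dq_reclaim_lock);

	return ret;
}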

> 
> > > - non MT at all (only same single thread can do enqueue and dequeue)
> > If MT safe is required, one should use 1 defer queue per thread for now.
> >
> > >
> > > And related question:
> > > What additional rte_ring API you plan to introduce in that case?
> > > - None
> > > - rte_ring_sc_peek()
> > rte_ring_peek will be changed to rte_ring_sc_peek
> >
> > > - rte_ring_serial_dequeue()
> > >
> > > >
> > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > provide
> > > > 2 #defines, one for dynamically variable size defer queue and the
> > > > other for
> > > constant size defer queue.
> > > >
> > > > However, IMO, using per thread defer queue is a much simpler way
> > > > to
> > > achieve 2. It does not add any significant burden to the user either.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > +{
> > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > +	unsigned int n = 1;
> > > > > > > > +	if (count) {
> > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > +		return 0;
> > > > > > > > +	}
> > > > > > > > +	return -ENOENT;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  #ifdef __cplusplus
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > > --
> > > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-07 13:11       ` Medvedkin, Vladimir
@ 2019-10-13  3:02         ` Honnappa Nagarahalli
  2019-10-15 16:48           ` Medvedkin, Vladimir
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-13  3:02 UTC (permalink / raw)
  To: Medvedkin, Vladimir, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

Hi Vladimir,
	Apologies for the delayed response, I had to run a few experiments.

<snip>

> 
> Hi Honnappa,
> 
> On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> > Add resource reclamation APIs to make it simple for applications and
> > libraries to integrate rte_rcu library.
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >   app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> >   lib/librte_rcu/meson.build         |   2 +
> >   lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> >   lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> >   lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> >   lib/librte_rcu/rte_rcu_version.map |   4 +
> >   lib/meson.build                    |   6 +-
> >   7 files changed, 700 insertions(+), 3 deletions(-)
> >   create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c index
> > d1b9e46a2..3a6815243 100644
> > --- a/app/test/test_rcu_qsbr.c
> > +++ b/app/test/test_rcu_qsbr.c
> > @@ -1,8 +1,9 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright (c) 2018 Arm Limited
> > + * Copyright (c) 2019 Arm Limited
> >    */
> >
> >   #include <stdio.h>
> > +#include <string.h>
> >   #include <rte_pause.h>
> >   #include <rte_rcu_qsbr.h>
> >   #include <rte_hash.h>
> > @@ -33,6 +34,7 @@ static uint32_t *keys;
> >   #define COUNTER_VALUE 4096
> >   static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
> >   static uint8_t writer_done;
> > +static uint8_t cb_failed;
> >
> >   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> >   struct rte_hash *h[RTE_MAX_LCORE];
> > @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
> >   	return 0;
> >   }
> >
> > +static void
> > +rte_rcu_qsbr_test_free_resource(void *p, void *e) {
> > +	if (p != NULL && e != NULL) {
> > +		printf("%s: Test failed\n", __func__);
> > +		cb_failed = 1;
> > +	}
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_create: create a queue used to store the data
> > +structure
> > + * elements that can be freed later. This queue is referred to as 'defer
> queue'.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_create(void)
> > +{
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> > +
> > +	/* Pass invalid parameters */
> > +	dq = rte_rcu_qsbr_dq_create(NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +	params.v = t[0];
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	params.size = 1;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	params.esize = 3;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> > +params");
> > +
> > +	/* Pass all valid parameters */
> > +	params.esize = 16;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> params");
> > +	rte_rcu_qsbr_dq_delete(dq);
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> > > + * to be freed later after at least one grace period is over.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_enqueue(void)
> > +{
> > +	int ret;
> > +	uint64_t r;
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> > +
> > +	/* Create a queue with simple parameters */
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +	params.v = t[0];
> > +	params.size = 1;
> > +	params.esize = 16;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> > +params");
> > +
> > +	/* Pass invalid parameters */
> > +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> > +params");
> > +
> > +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> > +params");
> > +
> > +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> > +params");
> > +
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid
> params");
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_reclaim(void)
> > +{
> > +	int ret;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> > +
> > +	/* Pass invalid parameters */
> > +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid
> > +params");
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_delete(void)
> > +{
> > +	int ret;
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> > +
> > +	/* Pass invalid parameters */
> > +	ret = rte_rcu_qsbr_dq_delete(NULL);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid
> > +params");
> > +
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +	params.v = t[0];
> > +	params.size = 1;
> > +	params.esize = 16;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> params");
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> params");
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
> > > + * to be freed later after at least one grace period is over.
> > + */
> > +static int
> > +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize) {
> > +	int i, j, ret;
> > +	char rcu_dq_name[RTE_RING_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +	struct rte_rcu_qsbr_dq *dq;
> > +	uint64_t *e;
> > +	uint64_t sc = 200;
> > +	int max_entries;
> > +
> > +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> > +	printf("Size = %d, esize = %d\n", size, esize);
> > +
> > +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> > +	if (e == NULL)
> > +		return 0;
> > +	cb_failed = 0;
> > +
> > +	/* Initialize the RCU variable. No threads are registered */
> > +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> > +
> > +	/* Create a queue with simple parameters */
> > +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> > +	params.name = rcu_dq_name;
> > +	params.f = rte_rcu_qsbr_test_free_resource;
> > +	params.v = t[0];
> > +	params.size = size;
> > +	params.esize = esize;
> > +	dq = rte_rcu_qsbr_dq_create(&params);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> > +params");
> > +
> > +	/* Given the size and esize, calculate the maximum number of entries
> > +	 * that can be stored on the defer queue (look at the logic used
> > +	 * in capacity calculation of rte_ring).
> > +	 */
> > +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> > +	max_entries = (max_entries - 1)/(esize/8 + 1);
> > +
> > +	/* Enqueue few counters starting with the value 'sc' */
> > +	/* The queue size will be rounded up to 2. The enqueue API also
> > +	 * reclaims if the queue size is above a certain limit. Since there
> > +	 * are no threads registered, reclamation succeeds. Hence, it should
> > +	 * be possible to enqueue more than the provided queue size.
> > +	 */
> > +	for (i = 0; i < 10; i++) {
> > +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> > +			"dq enqueue functional");
> > +		for (j = 0; j < esize/8; j++)
> > +			e[j] = sc++;
> > +	}
> > +
> > +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> > +	 * succeed. It should not be possible to enqueue more than the size
> > +	 * number of resources.
> > +	 */
> > +	rte_rcu_qsbr_thread_register(t[0], 1);
> > +	rte_rcu_qsbr_thread_online(t[0], 1);
> > +
> > +	for (i = 0; i < max_entries; i++) {
> > +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> > +			"dq enqueue functional");
> > +		for (j = 0; j < esize/8; j++)
> > +			e[j] = sc++;
> > +	}
> > +
> > +	/* Enqueue fails as queue is full */
> > +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> functional");
> > +
> > +	/* Delete should fail as there are elements in defer queue which
> > +	 * cannot be reclaimed.
> > +	 */
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid
> params");
> > +
> > +	/* Report quiescent state, enqueue should succeed */
> > +	rte_rcu_qsbr_quiescent(t[0], 1);
> > +	for (i = 0; i < max_entries; i++) {
> > +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> > +			"dq enqueue functional");
> > +		for (j = 0; j < esize/8; j++)
> > +			e[j] = sc++;
> > +	}
> > +
> > +	/* Queue is full */
> > +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> functional");
> > +
> > +	/* Report quiescent state, delete should succeed */
> > +	rte_rcu_qsbr_quiescent(t[0], 1);
> > +	ret = rte_rcu_qsbr_dq_delete(dq);
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> params");
> > +
> > +	/* Validate that call back function did not return any error */
> > +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> > +
> > +	rte_free(e);
> > +	return 0;
> > +}
> > +
> >   /*
> >    * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
> >    */
> > @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
> >   	if (test_rcu_qsbr_thread_offline() < 0)
> >   		goto test_fail;
> >
> > +	if (test_rcu_qsbr_dq_create() < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_reclaim() < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_delete() < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_enqueue() < 0)
> > +		goto test_fail;
> > +
> >   	printf("\nFunctional tests\n");
> >
> >   	if (test_rcu_qsbr_sw_sv_3qs() < 0)
> > @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
> >   	if (test_rcu_qsbr_mw_mv_mqs() < 0)
> >   		goto test_fail;
> >
> > +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> > +		goto test_fail;
> > +
> > +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> > +		goto test_fail;
> > +
> >   	free_rcu();
> >
> >   	printf("\n");
> > diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> > index 62920ba02..e280b29c1 100644
> > --- a/lib/librte_rcu/meson.build
> > +++ b/lib/librte_rcu/meson.build
> > @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
> >   if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
> >   	ext_deps += cc.find_library('atomic')
> >   endif
> > +
> > +deps += ['ring']
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > @@ -21,6 +21,7 @@
> >   #include <rte_errno.h>
> >
> >   #include "rte_rcu_qsbr.h"
> > +#include "rte_rcu_qsbr_pvt.h"
> >
> >   /* Get the memory size of QSBR variable */
> >   size_t
> > @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr
> *v)
> >   	return 0;
> >   }
> >
> > +/* Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + */
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params) {
> > +	struct rte_rcu_qsbr_dq *dq;
> > +	uint32_t qs_fifo_size;
> > +
> > +	if (params == NULL || params->f == NULL ||
> > +		params->v == NULL || params->name == NULL ||
> > +		params->size == 0 || params->esize == 0 ||
> > +		(params->esize % 8 != 0)) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	dq = rte_zmalloc(NULL,
> > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > +		RTE_CACHE_LINE_SIZE);
> > +	if (dq == NULL) {
> > +		rte_errno = ENOMEM;
> > +
> > +		return NULL;
> > +	}
> > +
> > +	/* round up qs_fifo_size to next power of two that is not less than
> > +	 * max_size.
> > +	 */
> > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > +					* params->size) + 1);
> > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > +					SOCKET_ID_ANY, 0);
> > +	if (dq->r == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): defer queue create failed\n", __func__);
> > +		rte_free(dq);
> > +		return NULL;
> > +	}
> > +
> > +	dq->v = params->v;
> > +	dq->size = params->size;
> > +	dq->esize = params->esize;
> > +	dq->f = params->f;
> > +	dq->p = params->p;
> > +
> > +	return dq;
> > +}
> > +
> > +/* Enqueue one resource to the defer queue to free after the grace
> > + * period is over.
> > + */
> > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > +	uint64_t token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +	uint32_t cur_size, free_size;
> > +
> > +	if (dq == NULL || e == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Start the grace period */
> > +	token = rte_rcu_qsbr_start(dq->v);
> > +
> > +	/* Reclaim resources if the queue is 1/8th full. This helps keep
> > +	 * the queue from growing too large and allows time for reader
> > +	 * threads to report their quiescent state.
> > +	 */
> > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +			"%s(): Triggering reclamation\n", __func__);
> > +		rte_rcu_qsbr_dq_reclaim(dq);
> > +	}
> 
> There are two problems I see:
> 
> 1. rte_rcu_qsbr_dq_reclaim() reclaims only 1/16 of the defer queue while it
> triggers on 1/8. This means that there will always be 1/16 of non reclaimed
> entries in the queue.
There will be 'at least' 1/16 of the entries left non-reclaimed. It could be more, depending on the length of the grace period and the rate of deletion.
The 1/8 trigger is used to give the readers sufficient time to report their quiescent state. The 1/16 limit is used to spread the load of reclamation across multiple calls and to provide an upper bound on the cycles consumed.
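
For concreteness: with a defer queue created with size = 1024, enqueue starts triggering reclamation once more than 1024 >> 3 = 128 entries are pending, and each reclaim pass then frees at most 1024 >> 4 = 64 of them. So a single pass can leave entries behind when deletions outpace the readers reporting quiescence.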

> 
> 2. Number of entries to reclaim depend on dq->size. So,
> rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For LPM library this
That is true. It depends on dq->size (the number of tbl8 groups). However, note that patch [1] provides batch-reclamation behavior, which significantly reduces the cycles consumed by reclamation.

[1] https://patches.dpdk.org/patch/58960/

> means that rte_lpm_delete() sometimes takes a long time.
Agree, it sometimes takes additional time. It is good to spread that cost over multiple calls.

> 
> So, my suggestions here would be
> 
> - trigger rte_rcu_qsbr_dq_reclaim() with every enqueue
Given that the LPM APIs are mainly for the control plane, I would expect that, by the next time an LPM API is called, the readers have completed the grace period. But if there are frequent updates, we might end up with empty reclaims, which waste cycles. IMO, this trigger should happen only after at least a few entries are in the queue.

> 
> - reclaim small amount of entries (could be configurable of creation time)
Agree. I would keep it smaller than the trigger amount, knowing that the elements added right before the trigger might not have completed the grace period.

> 
> - provide API to trigger reclaim from the application manually.
IMO, this will add complexity to the application. I agree that some applications will have special needs; those applications might have to implement their own methods using the base RCU APIs.
Instead, as agreed in other threads, I suggest we expose the parameters (when to trigger and how much to reclaim) to the application as optional configurable parameters, i.e. if the application does not provide them, we use default values. I think this should give the application enough flexibility. A sketch of what that could look like is below.
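
Something along these lines (a sketch only; the two new field names are hypothetical and would be settled in the next revision, with 0 selecting the current defaults):

struct rte_rcu_qsbr_dq_parameters {
	const char *name;       /**< Name of the defer queue. */
	uint32_t size;          /**< Number of entries in the queue. */
	uint32_t esize;         /**< Size (in bytes) of each element. */
	uint32_t trigger_reclaim_limit;
	/**< Hypothetical: number of pending entries at which enqueue
	 *   triggers reclamation; 0 selects the default of size/8. */
	uint32_t max_reclaim_size;
	/**< Hypothetical: maximum number of entries reclaimed per
	 *   pass; 0 selects the default of size/16. */
	rte_rcu_qsbr_free_resource f;
	/**< Function to call to free the resource. */
	void *p;                /**< Pointer passed to the free function. */
	struct rte_rcu_qsbr *v; /**< RCU QSBR variable to use. */
};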

> 
> > +
> > > > +	/* Check if there is space for at least 1 resource */
> > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > +	if (!free_size) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Defer queue is full\n", __func__);
> > +		rte_errno = ENOSPC;
> > +		return 1;
> > +	}
> > +
> > +	/* Enqueue the resource */
> > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > +
> > +	/* The resource to enqueue needs to be a multiple of 64b
> > +	 * due to the limitation of the rte_ring implementation.
> > +	 */
> > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > +
> > +	return 0;
> > +}
> > +
> > +/* Reclaim resources from the defer queue. */ int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > +	uint32_t max_cnt;
> > +	uint32_t cnt;
> > +	void *token;
> > +	uint64_t *tmp;
> > +	uint32_t i;
> > +
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Anything to reclaim? */
> > +	if (rte_ring_count(dq->r) == 0)
> > +		return 0;
> > +
> > +	/* Reclaim at the max 1/16th the total number of entries. */
> > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > +	cnt = 0;
> > +
> > +	/* Check reader threads quiescent state and reclaim resources */
> > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > +			== 1)) {
> > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > +		/* The resource to dequeue needs to be a multiple of 64b
> > +		 * due to the limitation of the rte_ring implementation.
> > +		 */
> > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > +			i++, tmp++)
> > +			(void)rte_ring_sc_dequeue(dq->r,
> > +					(void *)(uintptr_t)tmp);
> > +		dq->f(dq->p, dq->e);
> > +
> > +		cnt++;
> > +	}
> > +
> > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > +
> > +	if (cnt == 0) {
> > +		/* No resources were reclaimed */
> > +		rte_errno = EAGAIN;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/* Delete a defer queue. */
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > +	if (dq == NULL) {
> > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > +			"%s(): Invalid input parameter\n", __func__);
> > +		rte_errno = EINVAL;
> > +
> > +		return 1;
> > +	}
> > +
> > +	/* Reclaim all the resources */
> > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > +		/* Error number is already set by the reclaim API */
> > +		return 1;
> > +
> > +	rte_ring_free(dq->r);
> > +	rte_free(dq);
> > +
> > +	return 0;
> > +}
> > +
> >   int rte_rcu_log_type;
> >
> >   RTE_INIT(rte_rcu_register)
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > @@ -34,6 +34,7 @@ extern "C" {
> >   #include <rte_lcore.h>
> >   #include <rte_debug.h>
> >   #include <rte_atomic.h>
> > +#include <rte_ring.h>
> >
> >   extern int rte_rcu_log_type;
> >
> > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >   	 */
> >   } __rte_cache_aligned;
> >
> > +/**
> > + * Call back function called to free the resources.
> > + *
> > + * @param p
> > + *   Pointer provided while creating the defer queue
> > + * @param e
> > + *   Pointer to the resource data stored on the defer queue
> > + *
> > + * @return
> > + *   None
> > + */
> > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > +
> > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > +
> > +/**
> > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > + */
> > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > +
> > +/**
> > + *  Reclaim at the max 1/16th the total number of resources.
> > + */
> > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > +
> > +/**
> > + * Parameters used when creating the defer queue.
> > + */
> > +struct rte_rcu_qsbr_dq_parameters {
> > +	const char *name;
> > +	/**< Name of the queue. */
> > +	uint32_t size;
> > +	/**< Number of entries in queue. Typically, this will be
> > +	 *   the same as the maximum number of entries supported in the
> > +	 *   lock free data structure.
> > +	 *   Data structures with unbounded number of entries is not
> > +	 *   supported currently.
> > +	 */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of each element in the defer queue.
> > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > +	 *   support 8B element sizes only.
> > +	 */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs. This can be NULL.
> > +	 */
> > +	struct rte_rcu_qsbr *v;
> > +	/**< RCU QSBR variable to use for this defer queue */ };
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq;
> > +
> >   /**
> >    * @warning
> >    * @b EXPERIMENTAL: this API may change without prior notice @@
> > -648,6 +710,113 @@ __rte_experimental
> >   int
> >   rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a queue used to store the data structure elements that can
> > + * be freed later. This queue is referred to as 'defer queue'.
> > + *
> > + * @param params
> > + *   Parameters to create a defer queue.
> > + * @return
> > + *   On success - Valid pointer to defer queue
> > + *   On error - NULL
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOMEM - Not enough memory
> > + */
> > +__rte_experimental
> > +struct rte_rcu_qsbr_dq *
> > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > +*params);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Enqueue one resource to the defer queue and start the grace period.
> > + * The resource will be freed later after at least one grace period
> > + * is over.
> > + *
> > + * If the defer queue is full, it will attempt to reclaim resources.
> > + * It will also reclaim resources at regular intervals to avoid
> > + * the defer queue from growing too big.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to allocate an entry from.
> > + * @param e
> > + *   Pointer to resource data to copy to the defer queue. The size of
> > + *   the data to copy is equal to the element size provided when the
> > + *   defer queue was created.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > + *		if the defer queue size is equal (or larger) than the
> > + *		number of elements in the data structure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Reclaim resources from the defer queue.
> > + *
> > + * This API is not multi-thread safe. It is expected that the caller
> > + * provides multi-thread safety by locking a mutex or some other means.
> > + *
> > + * A lock free multi-thread writer algorithm could achieve
> > +multi-thread
> > + * safety by creating and using one defer queue per thread.
> > + *
> > + * @param dq
> > + *   Defer queue to reclaim an entry from.
> > + * @return
> > + *   On successful reclamation of at least 1 resource - 0
> > + *   On error - 1 with rte_errno set to
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - None of the resources have completed at least 1 grace
> period,
> > + *		try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Delete a defer queue.
> > + *
> > + * It tries to reclaim all the resources on the defer queue.
> > + * If any of the resources have not completed the grace period
> > + * the reclamation stops and returns immediately. The rest of
> > + * the resources are not reclaimed and the defer queue is not
> > + * freed.
> > + *
> > + * @param dq
> > + *   Defer queue to delete.
> > + * @return
> > + *   On success - 0
> > + *   On error - 1
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - NULL parameters are passed
> > + *   - EAGAIN - Some of the resources have not completed at least 1 grace
> > + *		period, try again.
> > + */
> > +__rte_experimental
> > +int
> > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > +
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > new file mode 100644
> > index 000000000..2122bc36a
> > --- /dev/null
> > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2019 Arm Limited
> > + */
> > +
> > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > +#define _RTE_RCU_QSBR_PVT_H_
> > +
> > +/**
> > + * This file is private to the RCU library. It should not be included
> > + * by the user of this library.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include "rte_rcu_qsbr.h"
> > +
> > +/* RTE defer queue structure.
> > + * This structure holds the defer queue. The defer queue is used to
> > + * hold the deleted entries from the data structure that are not
> > + * yet freed.
> > + */
> > +struct rte_rcu_qsbr_dq {
> > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > +	uint32_t size;
> > +	/**< Number of elements in the defer queue */
> > +	uint32_t esize;
> > +	/**< Size (in bytes) of data stored on the defer queue */
> > +	rte_rcu_qsbr_free_resource f;
> > +	/**< Function to call to free the resource. */
> > +	void *p;
> > +	/**< Pointer passed to the free function. Typically, this is the
> > +	 *   pointer to the data structure to which the resource to free
> > +	 *   belongs.
> > +	 */
> > +	char e[0];
> > +	/**< Temporary storage to copy the defer queue element. */ };
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > b/lib/librte_rcu/rte_rcu_version.map
> > index f8b9ef2ab..dfac88a37 100644
> > --- a/lib/librte_rcu/rte_rcu_version.map
> > +++ b/lib/librte_rcu/rte_rcu_version.map
> > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >   	rte_rcu_qsbr_synchronize;
> >   	rte_rcu_qsbr_thread_register;
> >   	rte_rcu_qsbr_thread_unregister;
> > +	rte_rcu_qsbr_dq_create;
> > +	rte_rcu_qsbr_dq_enqueue;
> > +	rte_rcu_qsbr_dq_reclaim;
> > +	rte_rcu_qsbr_dq_delete;
> >
> >   	local: *;
> >   };
> > diff --git a/lib/meson.build b/lib/meson.build index
> > e5ff83893..0e1be8407 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -11,7 +11,9 @@
> >   libraries = [
> >   	'kvargs', # eal depends on kvargs
> >   	'eal', # everything depends on eal
> > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > +	'ring',
> > +	'rcu', # rcu depends on ring
> > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >   	'cmdline',
> >   	'metrics', # bitrate/latency stats depends on this
> >   	'hash',    # efd depends on this
> > @@ -22,7 +24,7 @@ libraries = [
> >   	'gro', 'gso', 'ip_frag', 'jobstats',
> >   	'kni', 'latencystats', 'lpm', 'member',
> >   	'power', 'pdump', 'rawdev',
> > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >   	# ipsec lib depends on net, crypto and security
> >   	'ipsec',
> >   	# add pkt framework libs which use other libs from above
> 
> --
> Regards,
> Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-07 10:46               ` Ananyev, Konstantin
@ 2019-10-13  4:35                 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-13  4:35 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> > > > > > Add resource reclamation APIs to make it simple for
> > > > > > applications and libraries to integrate rte_rcu library.
> > > > > >
> > > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > ---
> > > > > >  app/test/test_rcu_qsbr.c           | 291
> ++++++++++++++++++++++++++++-
> > > > > >  lib/librte_rcu/meson.build         |   2 +
> > > > > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > > > > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > > > > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > > > > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > > > > >  lib/meson.build                    |   6 +-
> > > > > >  7 files changed, 700 insertions(+), 3 deletions(-)  create
> > > > > > mode
> > > > > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > >
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b
> > > > > > 100644
> > > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > > > > @@ -21,6 +21,7 @@
> > > > > >  #include <rte_errno.h>
> > > > > >
> > > > > >  #include "rte_rcu_qsbr.h"
> > > > > > +#include "rte_rcu_qsbr_pvt.h"
> > > > > >
> > > > > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6
> > > > > > +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > +/* Create a queue used to store the data structure elements
> > > > > > +that can
> > > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq *
> > > > > > +rte_rcu_qsbr_dq_create(const struct
> > > > > > +rte_rcu_qsbr_dq_parameters
> > > > > > +*params) {
> > > > > > +	struct rte_rcu_qsbr_dq *dq;
> > > > > > +	uint32_t qs_fifo_size;
> > > > > > +
> > > > > > +	if (params == NULL || params->f == NULL ||
> > > > > > +		params->v == NULL || params->name == NULL ||
> > > > > > +		params->size == 0 || params->esize == 0 ||
> > > > > > +		(params->esize % 8 != 0)) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	dq = rte_zmalloc(NULL,
> > > > > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > > > > +		RTE_CACHE_LINE_SIZE);
> > > > > > +	if (dq == NULL) {
> > > > > > +		rte_errno = ENOMEM;
> > > > > > +
> > > > > > +		return NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* round up qs_fifo_size to next power of two that is not less
> than
> > > > > > +	 * max_size.
> > > > > > +	 */
> > > > > > +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> > > > > > +					* params->size) + 1);
> > > > > > +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> > > > > > +					SOCKET_ID_ANY, 0);
> > > > >
> > > > > If it is going to be not MT safe, then why not to create the
> > > > > ring with (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> > > > Agree.
> > > >
> > > > > Though I think it could be changed to allow MT safe multiple
> > > > > enqeue/single dequeue, see below.
> > > > The MT safe issue is due to reclaim code. The reclaim code has the
> > > > following
> > > sequence:
> > > >
> > > > rte_ring_peek
> > > > rte_rcu_qsbr_check
> > > > rte_ring_dequeue
> > > >
> > > > This entire sequence needs to be atomic as the entry cannot be
> > > > dequeued
> > > without knowing that the grace period for that entry is over.
> > >
> > > I understand that, though I believe at least it should be possible
> > > to support multiple-enqueue/single dequeuer and reclaim mode.
> > > With serialized dequeue() even multiple dequeue should be possible.
> > Agreed. Please see the response on the other thread.
> >
> > >
> > > > Note that due to optimizations in rte_rcu_qsbr_check API, this
> > > > sequence should not be large in most cases. I do not have ideas on
> > > > how to
> > > make this sequence lock-free.
> > > >
> > > > If the writer is on the control plane, most use cases will use
> > > > mutex locks for synchronization if they are multi-threaded. That
> > > > lock should be
> > > enough to provide the thread safety for these APIs.
> > >
> > > In that is case, why do we need ring at all?
> > > For sure people can create their own queue quite easily with mutex and
> TAILQ.
> > > If performance is not an issue, they can even add pthread_cond to
> > > it, and have an ability for the consumer to sleep/wakeup on empty/full
> queue.
> > >
> > > >
> > > > If the writer is multi-threaded and lock-free, then one should use
> > > > per thread
> > > defer queue.
> > >
> > > If that's the only working model, then the question is why do we
> > > need that API at all?
> > > Just simple array with counter or linked-list should do for majority of
> cases.
> > Please see the other thread.
> >
> > >
> > > >
> > > > >
> > > > > > +	if (dq->r == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): defer queue create failed\n",
> __func__);
> > > > > > +		rte_free(dq);
> > > > > > +		return NULL;
> > > > > > +	}
> > > > > > +
> > > > > > +	dq->v = params->v;
> > > > > > +	dq->size = params->size;
> > > > > > +	dq->esize = params->esize;
> > > > > > +	dq->f = params->f;
> > > > > > +	dq->p = params->p;
> > > > > > +
> > > > > > +	return dq;
> > > > > > +}
> > > > > > +
> > > > > > +/* Enqueue one resource to the defer queue to free after the
> > > > > > +grace
> > > > > > + * period is over.
> > > > > > + */
> > > > > > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
> {
> > > > > > +	uint64_t token;
> > > > > > +	uint64_t *tmp;
> > > > > > +	uint32_t i;
> > > > > > +	uint32_t cur_size, free_size;
> > > > > > +
> > > > > > +	if (dq == NULL || e == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > >
> > > > > Why just not to return -EINVAL straightway?
> > > > > I think there is no much point to set rte_errno in that function
> > > > > at all, just return value should do.
> > > > I am trying to keep these consistent with the existing APIs. They
> > > > return 0 or 1
> > > and set the rte_errno.
> > >
> > > A lot of public DPDK API functions do use return value to return
> > > status code (0, or some positive numbers of success, negative errno
> > > values on failure), I am not inventing anything new here.
> > Agree, you are not proposing a new thing here. May be I was not clear.
> > I really do not have an opinion on how this should be done. But, I do have
> an opinion on consistency. These new APIs follow what has been done in the
> existing RCU APIs. I think we have 2 options here.
> > 1) Either we change existing RCU APIs to get rid of rte_errno (is it
> > an ABI change?) or
> > 2) The new APIs follow what has been done in the existing RCU APIs.
> > I want to make sure we are consistent at least within RCU APIs.
> 
> But as I can see right now rcu API sets rte_errno only for control-path
> functions (get_memsize, init, register, unregister, dump).
> All fast-path (inline) function don't set/use it.
> So from perspective that is consistent behavior, no?
Agree. I am treating this mainly as a control-plane function (hence it is a non-inline function as well).

> 
> >
> > >
> > > >
> > > > >
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Start the grace period */
> > > > > > +	token = rte_rcu_qsbr_start(dq->v);
> > > > > > +
> > > > > > +	/* Reclaim resources if the queue is 1/8th full. This helps keep
> > > > > > +	 * the queue from growing too large and allows time for reader
> > > > > > +	 * threads to report their quiescent state.
> > > > > > +	 */
> > > > > > +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> > > > >
> > > > > Probably would be a bit easier if you just store in dq->esize
> > > > > (elt size + token
> > > > > size) / 8.
> > > > Agree
> > > >
> > > > >
> > > > > > +	if (cur_size > (dq->size >>
> > > > > > +RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> > > > >
> > > > > Why to make this threshold value hard-coded?
> > > > > Why either not to put it into create parameter, or just return a
> > > > > special return value, to indicate that threshold is reached?
> > > > My thinking was to keep the programming interface easy to use. The
> > > > more the parameters, the more painful it is for the user. IMO, the
> > > > constants chosen should be good enough for most cases. More
> > > > advanced
> > > users could modify the constants. However, we could make these as
> > > part of the parameters, but make them optional for the user. For ex:
> > > if they set them to 0, default values can be used.
> > > >
> > > > > Or even return number of filled/free entroes on success, so
> > > > > caller can decide to reclaim or not based on that information on his
> own?
> > > > This means more code on the user side.
> > >
> > > I personally think it it really wouldn't be that big problem to the
> > > user to pass extra parameter to the function.
> > I will convert the 2 constants into optional parameters (user can set
> > them to 0 to make the algorithm use default values)
> >
> > > Again what if user doesn't want to reclaim() in enqueue() thread at all?
> > 'enqueue' has to do reclamation if the defer queue is full. I do not think this
> is trivial.
> >
> > In the current design, reclamation in enqueue is also done on regular
> > basis (automatic triggering of reclamation when the queue reaches
> > certain limit) to keep the queue from growing too large. This is
> > required when we implement a dynamically adjusting defer queue. The
> current algorithm keeps the cost of reclamation spread across multiple calls
> and puts an upper bound on cycles for delete API by reclaiming a fixed
> number of entries.
> >
> > This algorithm is proven to work in the LPM integration performance
> > tests at a very low performance over head (~1%). So, I do not know why a
> user would not want to use this.
> 
> Yeh, I looked at LPM implementation and one thing I found strange -
> defer_queue is hidden inside LPM struct and all reclamations are done
> internally.
> Yes for sure it allows to defer and group actual reclaim(), which hopefully will
> lead to better performance.
> But why not to allow user to call reclaim() for it directly too?
> In that way user might avoid/(minimize) doing reclaim() in LPM write() at all.
> And let say do it somewhere later in the same thread (when no other tasks to
> do), or even leave it to some other house-keeping thread to do (sort of
> garbage collector).
> Or such mode is not supported/planned?
The goal of integrating the RCU defer APIs with libraries is to take away the complexity the writer faces in adopting lock-free algorithms. I am looking to address the most common use cases. There will be use cases that are less common; I think those should be addressed by the application using the base RCU APIs. Let us discuss this more in the other thread, where you have similar questions.
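
For reference, the writer-side flow with the LPM integration is intended to look roughly like this (a sketch assuming the rte_lpm_rcu_qsbr_add() API from this patch set; the wrapper function is illustrative):

#include <rte_lpm.h>
#include <rte_rcu_qsbr.h>

/* Attach an initialized RCU QSBR variable to the LPM table once.
 * From then on, tbl8 groups freed by rte_lpm_delete() go through
 * the internal defer queue and are reclaimed on later add/delete
 * calls - no explicit reclaim step in the application. */
static int
lpm_attach_rcu_and_delete(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
		uint32_t ip, uint8_t depth)
{
	if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
		return -1;

	return rte_lpm_delete(lpm, ip, depth);
}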

> 
> > The 2 additional parameters should give the user more flexibility.
> 
> Ok, let's keep it as config params.
> After another thought - I think you're right, it should be good enough.
> 
> >
> > However, if the user wants his own algorithm, he can create one with the
> base APIs provided.
> >
> > >
> > > > I think adding these to parameters seems like a better option.
> > > >
> > > > >
> > > > > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > > +			"%s(): Triggering reclamation\n", __func__);
> > > > > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Check if there is space for at least 1 resource */
> > > > > > +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > > > > +	if (!free_size) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Defer queue is full\n", __func__);
> > > > > > +		rte_errno = ENOSPC;
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Enqueue the resource */
> > > > > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > > > > +
> > > > > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > > > > +	 * due to the limitation of the rte_ring implementation.
> > > > > > +	 */
> > > > > > +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > > > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> > > > >
> > > > >
> > > > > That whole construction above looks a bit clumsy and error prone...
> > > > > I suppose just:
> > > > >
> > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t free, n; ...
> > > > > n = rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free); if (n == 0)
> > > > Yes, bulk enqueue can be used. But note that once the flexible
> > > > element size
> > > ring patch is done, this code will use that.
> > >
> > > Well, when it is in the mainline, and it provides a better way, for
> > > sure this code can be updated to use the new API (if it provides some
> > > improvements).
> > > But as I understand, right now it is not there, while bulk
> > > enqueue/dequeue are.
> > Apologies, I was not clear. I agree we can go with bulk APIs for now.
> >
> > >
> > > >
> > > > >   return -ENOSPC;
> > > > > return free;
> > > > >
> > > > > That way I think you can have MT-safe version of that function.
> > > > Please see the description of MT safe issue above.
> > > >
> > > > >
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/* Reclaim resources from the defer queue. */
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
> > > > > > +{
> > > > > > +	uint32_t max_cnt;
> > > > > > +	uint32_t cnt;
> > > > > > +	void *token;
> > > > > > +	uint64_t *tmp;
> > > > > > +	uint32_t i;
> > > > > > +
> > > > > > +	if (dq == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > >
> > > > > Same story as above - I think rte_errno is excessive in this function.
> > > > > Just return value should be enough.
> > > > >
> > > > >
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Anything to reclaim? */
> > > > > > +	if (rte_ring_count(dq->r) == 0)
> > > > > > +		return 0;
> > > > >
> > > > > Not sure you need that, see below.
> > > > >
> > > > > > +
> > > > > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > > > > +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > > > > +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> > > > >
> > > > > Again, why not make max_cnt configurable as a create() parameter?
> > > > I think making this as an optional parameter for creating defer
> > > > queue is a
> > > better option.
> > > >
> > > > > Or even a parameter for that function?
> > > > >
> > > > > > +	cnt = 0;
> > > > > > +
> > > > > > +	/* Check reader threads quiescent state and reclaim resources */
> > > > > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> > > > > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > > > > +			== 1)) {
> > > > >
> > > > >
> > > > > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > > > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > > > > +		 * due to the limitation of the rte_ring implementation.
> > > > > > +		 */
> > > > > > +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> > > > > > +			i++, tmp++)
> > > > > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > > > > +					(void *)(uintptr_t)tmp);
> > > > >
> > > > > Again, no need for such constructs with multiple dequeuer I believe.
> > > > > Just:
> > > > >
> > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t n;
> > > > > uintptr_t elt[nb_elt]; ...
> > > > > n = rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL); if (n != 0)
> > > > > {dq->f(dq->p, elt);}
> > > > Agree on bulk API use.
> > > >
> > > > >
> > > > > Seems enough.
> > > > > Again in that case you can have enqueue/reclaim running in
> > > > > different threads simultaneously, plus you don't need dq->e at all.
> > > > Will check on dq->e
> > > >
> > > > >
> > > > > > +		dq->f(dq->p, dq->e);
> > > > > > +
> > > > > > +		cnt++;
> > > > > > +	}
> > > > > > +
> > > > > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > > > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > > > > +
> > > > > > +	if (cnt == 0) {
> > > > > > +		/* No resources were reclaimed */
> > > > > > +		rte_errno = EAGAIN;
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	return 0;
> > > > >
> > > > > I'd suggest to return cnt on success.
> > > > I am trying to keep the APIs simple. I do not see much use for 'cnt'
> > > > as a return value to the user. It exposes more details which I think
> > > > are internal to the library.
> > >
> > > Not sure what the hassle is in returning the number of completed reclamations?
> > > If user doesn't need that information, he simply wouldn't use it.
> > > But it might be useful - he can decide whether he should try another
> > > attempt of reclaim() immediately or whether it is ok to do something else.
> > There is no hassle to return that information.
> >
> > As per the current design, the user calls 'reclaim' when he is out of
> > resources while adding an entry to the data structure. At that point
> > the user wants to know if at least 1 resource was reclaimed because the
> > user has to allocate 1 resource. He does not have a use for the number
> > of resources reclaimed.
> 
> Ok, but why can't the user decide to do reclaim in advance, let's say when
> he foresees that he will need a lot of allocations in the near future?
> Or when there is some idle time? Or some combination of these things?
> And he would like to free some extra resources in that case to minimize
> the number of reclaims in a future peak interval?
If the user has free time he can call the reclaim API. By making the parameters configurable, he should be able to control how much he can reclaim.
If the user wants to make sure that he has enough free resources for the future, he should be able to do it by knowing how many free resources are available in his data structure currently.
But, I do not see it as a problem to return the number of resources reclaimed. I will add that.
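
For example, assuming the reworked rte_rcu_qsbr_dq_reclaim() returns the reclaimed count as agreed above, a proactive idle-time loop could look roughly like this (app_has_idle_time() is a hypothetical hook, dq an existing defer queue):

	while (app_has_idle_time()) {
		/* Stop once a pass frees nothing - the remaining entries
		 * have not completed a grace period yet.
		 */
		if (rte_rcu_qsbr_dq_reclaim(dq) <= 0)
			break;
	}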

> 
> >
> > If this API returns 0, then the user can decide to repeat the call or
> > return failure. But that decision depends on the length of the grace period
> which is under user's control.
> >
> > >
> > > >
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > +/* Delete a defer queue. */
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > > > > +	if (dq == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	/* Reclaim all the resources */
> > > > > > +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> > > > > > +		/* Error number is already set by the reclaim API */
> > > > > > +		return 1;
> > > > >
> > > > > How do you know that you have reclaimed everything?
> > > > Good point, will come back with a different solution.
> > > >
> > > > >
> > > > > > +
> > > > > > +	rte_ring_free(dq->r);
> > > > > > +	rte_free(dq);
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > >  int rte_rcu_log_type;
> > > > > >
> > > > > >  RTE_INIT(rte_rcu_register)
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a
> > > > > > 100644
> > > > > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > @@ -34,6 +34,7 @@ extern "C" {
> > > > > >  #include <rte_lcore.h>
> > > > > >  #include <rte_debug.h>
> > > > > >  #include <rte_atomic.h>
> > > > > > +#include <rte_ring.h>
> > > > > >
> > > > > >  extern int rte_rcu_log_type;
> > > > > >
> > > > > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > > > > >  	 */
> > > > > >  } __rte_cache_aligned;
> > > > > >
> > > > > > +/**
> > > > > > + * Call back function called to free the resources.
> > > > > > + *
> > > > > > + * @param p
> > > > > > + *   Pointer provided while creating the defer queue
> > > > > > + * @param e
> > > > > > + *   Pointer to the resource data stored on the defer queue
> > > > > > + *
> > > > > > + * @return
> > > > > > + *   None
> > > > > > + */
> > > > > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> > > > >
> > > > > Stylish thing - usually in DPDK we have typedef newtype_t ...
> > > > > Though I am not sure you need a new typedef at all - just a
> > > > > function pointer inside the struct seems enough.
> > > > Other libraries (for ex: rte_hash) use this approach. I think it
> > > > is better to keep
> > > it out of the structure to allow for better commenting.
> > >
> > > I am saying the majority of DPDK code uses the _t suffix for typedefs:
> > > typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);
> > Apologies, got it, will change.
> >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > > > > +
> > > > > > +/**
> > > > > > + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> > > > > > + */
> > > > > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > > > > +
> > > > > > +/**
> > > > > > + *  Reclaim at the max 1/16th the total number of resources.
> > > > > > + */
> > > > > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> > > > >
> > > > >
> > > > > As I said above, I don't think these thresholds need to be hardcoded.
> > > > > In any case, there seems to be little point in putting them in the
> > > > > public header file.
> > > > >
> > > > > > +
> > > > > > +/**
> > > > > > + * Parameters used when creating the defer queue.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq_parameters {
> > > > > > +	const char *name;
> > > > > > +	/**< Name of the queue. */
> > > > > > +	uint32_t size;
> > > > > > +	/**< Number of entries in queue. Typically, this will be
> > > > > > +	 *   the same as the maximum number of entries supported in
> the
> > > > > > +	 *   lock free data structure.
> > > > > > +	 *   Data structures with unbounded number of entries is not
> > > > > > +	 *   supported currently.
> > > > > > +	 */
> > > > > > +	uint32_t esize;
> > > > > > +	/**< Size (in bytes) of each element in the defer queue.
> > > > > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > > > > +	 *   support 8B element sizes only.
> > > > > > +	 */
> > > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > > +	/**< Function to call to free the resource. */
> > > > > > +	void *p;
> > > > >
> > > > > Style nit again - I like short names myself, but that seems a
> > > > > bit extreme... :) Might be at least:
> > > > > void (*reclaim)(void *, void *);
> > > > May be 'free_fn'?
> > > >
> > > > > void * reclaim_data;
> > > > > ?
> > > > This is the pointer to the data structure to free the resource
> > > > into. For ex: In
> > > LPM data structure, it will be pointer to LPM. 'reclaim_data'
> > > > does not convey the meaning correctly.
> > >
> > > Ok, please feel free to come up with your own names.
> > > I just wanted to say that 'f' and 'p' are a bit extreme for a public API.
> > ok, this is the hardest thing to do 😊
> >
> > >
> > > >
> > > > >
> > > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > > +	 *   pointer to the data structure to which the resource to
> free
> > > > > > +	 *   belongs. This can be NULL.
> > > > > > +	 */
> > > > > > +	struct rte_rcu_qsbr *v;
> > > > >
> > > > > Does it need to be inside that struct?
> > > > > Might be better:
> > > > > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > > > > rte_rcu_qsbr_dq_parameters *params);
> > > > The API takes a parameter structure as input anyway, why add
> > > > another argument to the function? The QSBR variable is just another
> > > > parameter.
> > > >
> > > > >
> > > > > Another alternative: make both reclaim() and enqueue() to take v
> > > > > as a parameter.
> > > > But both of them need access to some of the parameters provided in
> > > > rte_rcu_qsbr_dq_create API. We would end up passing 2 arguments to
> > > > the
> > > functions.
> > >
> > > Purely a stylistic thing.
> > > From my perspective it just provides better visibility into what is
> > > going on in the code:
> > > For QSBR var 'v' create a new deferred queue.
> > > But no strong opinion here.
> > >
> > > >
> > > > >
> > > > > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > > > > +
> > > > > > +/* RTE defer queue structure.
> > > > > > + * This structure holds the defer queue. The defer queue is
> > > > > > +used to
> > > > > > + * hold the deleted entries from the data structure that are
> > > > > > +not
> > > > > > + * yet freed.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq;
> > > > > > +
> > > > > >  /**
> > > > > >   * @warning
> > > > > >   * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > @@
> > > > > > -648,6 +710,113 @@ __rte_experimental  int
> > > > > > rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> > > > > >
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Create a queue used to store the data structure elements
> > > > > > +that can
> > > > > > + * be freed later. This queue is referred to as 'defer queue'.
> > > > > > + *
> > > > > > + * @param params
> > > > > > + *   Parameters to create a defer queue.
> > > > > > + * @return
> > > > > > + *   On success - Valid pointer to defer queue
> > > > > > + *   On error - NULL
> > > > > > + *   Possible rte_errno codes are:
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - ENOMEM - Not enough memory
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +struct rte_rcu_qsbr_dq *
> > > > > > +rte_rcu_qsbr_dq_create(const struct
> > > > > > +rte_rcu_qsbr_dq_parameters *params);
> > > > > > +
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Enqueue one resource to the defer queue and start the grace
> period.
> > > > > > + * The resource will be freed later after at least one grace
> > > > > > +period
> > > > > > + * is over.
> > > > > > + *
> > > > > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > > > > + * It will also reclaim resources at regular intervals to
> > > > > > +avoid
> > > > > > + * the defer queue from growing too big.
> > > > > > + *
> > > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > > +caller
> > > > > > + * provides multi-thread safety by locking a mutex or some other
> means.
> > > > > > + *
> > > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > > +multi-thread
> > > > > > + * safety by creating and using one defer queue per thread.
> > > > > > + *
> > > > > > + * @param dq
> > > > > > + *   Defer queue to allocate an entry from.
> > > > > > + * @param e
> > > > > > + *   Pointer to resource data to copy to the defer queue. The size of
> > > > > > + *   the data to copy is equal to the element size provided when the
> > > > > > + *   defer queue was created.
> > > > > > + * @return
> > > > > > + *   On success - 0
> > > > > > + *   On error - 1 with rte_errno set to
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > > > > + *		if the defer queue size is equal (or larger) than the
> > > > > > + *		number of elements in the data structure.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > > > > +
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Reclaim resources from the defer queue.
> > > > > > + *
> > > > > > + * This API is not multi-thread safe. It is expected that the
> > > > > > +caller
> > > > > > + * provides multi-thread safety by locking a mutex or some other
> means.
> > > > > > + *
> > > > > > + * A lock free multi-thread writer algorithm could achieve
> > > > > > +multi-thread
> > > > > > + * safety by creating and using one defer queue per thread.
> > > > > > + *
> > > > > > + * @param dq
> > > > > > + *   Defer queue to reclaim an entry from.
> > > > > > + * @return
> > > > > > + *   On successful reclamation of at least 1 resource - 0
> > > > > > + *   On error - 1 with rte_errno set to
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - EAGAIN - None of the resources have completed at least 1
> grace
> > > > > period,
> > > > > > + *		try again.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > > > > +
> > > > > > +/**
> > > > > > + * @warning
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + *
> > > > > > + * Delete a defer queue.
> > > > > > + *
> > > > > > + * It tries to reclaim all the resources on the defer queue.
> > > > > > + * If any of the resources have not completed the grace
> > > > > > +period
> > > > > > + * the reclamation stops and returns immediately. The rest of
> > > > > > + * the resources are not reclaimed and the defer queue is not
> > > > > > + * freed.
> > > > > > + *
> > > > > > + * @param dq
> > > > > > + *   Defer queue to delete.
> > > > > > + * @return
> > > > > > + *   On success - 0
> > > > > > + *   On error - 1
> > > > > > + *   Possible rte_errno codes are:
> > > > > > + *   - EINVAL - NULL parameters are passed
> > > > > > + *   - EAGAIN - Some of the resources have not completed at least 1
> > > grace
> > > > > > + *		period, try again.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > > > new file mode 100644
> > > > > > index 000000000..2122bc36a
> > > > > > --- /dev/null
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > > >
> > > > > Again style suggestion: as it is not public header - don't use
> > > > > rte_ prefix for naming.
> > > From my perspective - it is easier for the reader to realize what is
> > > a public header and what is not.
> > > > Looks like the guidelines are not defined very well. I see one
> > > > private file with rte_ prefix. I see Stephen not using rte_
> > > > prefix. I do not have any
> > > preference. But, a consistent approach is required.
> > >
> > > That's just a suggestion.
> > > For me (and I hope for others) it would be a bit easier.
> > > When looking at the code for the first time I had to look at
> > > meson.build to check whether it is a public header or not.
> > > If the file doesn't have 'rte_' prefix, I assume that it is an
> > > internal one straightway.
> > > But , as you said, there is no exact guidelines here, so up to you to decide.
> > I think it makes sense to remove 'rte_' prefix. I will also change the file
> name to have '_private' suffix.
> > There are some inconsistencies in the existing code, will send a patch to
> correct them to follow this approach.
> >
> > >
> > > >
> > > > >
> > > > > > @@ -0,0 +1,46 @@
> > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > + * Copyright (c) 2019 Arm Limited  */
> > > > > > +
> > > > > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > > > > +#define _RTE_RCU_QSBR_PVT_H_
> > > > > > +
> > > > > > +/**
> > > > > > + * This file is private to the RCU library. It should not be
> > > > > > +included
> > > > > > + * by the user of this library.
> > > > > > + */
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +#include "rte_rcu_qsbr.h"
> > > > > > +
> > > > > > +/* RTE defer queue structure.
> > > > > > + * This structure holds the defer queue. The defer queue is
> > > > > > +used to
> > > > > > + * hold the deleted entries from the data structure that are
> > > > > > +not
> > > > > > + * yet freed.
> > > > > > + */
> > > > > > +struct rte_rcu_qsbr_dq {
> > > > > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this
> queue.*/
> > > > > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > > > > +	uint32_t size;
> > > > > > +	/**< Number of elements in the defer queue */
> > > > > > +	uint32_t esize;
> > > > > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > > > > +	rte_rcu_qsbr_free_resource f;
> > > > > > +	/**< Function to call to free the resource. */
> > > > > > +	void *p;
> > > > > > +	/**< Pointer passed to the free function. Typically, this is the
> > > > > > +	 *   pointer to the data structure to which the resource to
> free
> > > > > > +	 *   belongs.
> > > > > > +	 */
> > > > > > +	char e[0];
> > > > > > +	/**< Temporary storage to copy the defer queue element. */
> > > > >
> > > > > Do you really need 'e' at all?
> > > > > Can't it be just temporary stack variable?
> > > > Ok, will check.
> > > >
> > > > >
> > > > > > +};
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > > > > b/lib/librte_rcu/rte_rcu_version.map
> > > > > > index f8b9ef2ab..dfac88a37 100644
> > > > > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > > > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > > > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > > > > >  	rte_rcu_qsbr_synchronize;
> > > > > >  	rte_rcu_qsbr_thread_register;
> > > > > >  	rte_rcu_qsbr_thread_unregister;
> > > > > > +	rte_rcu_qsbr_dq_create;
> > > > > > +	rte_rcu_qsbr_dq_enqueue;
> > > > > > +	rte_rcu_qsbr_dq_reclaim;
> > > > > > +	rte_rcu_qsbr_dq_delete;
> > > > > >
> > > > > >  	local: *;
> > > > > >  };
> > > > > > diff --git a/lib/meson.build b/lib/meson.build index
> > > > > > e5ff83893..0e1be8407 100644
> > > > > > --- a/lib/meson.build
> > > > > > +++ b/lib/meson.build
> > > > > > @@ -11,7 +11,9 @@
> > > > > >  libraries = [
> > > > > >  	'kvargs', # eal depends on kvargs
> > > > > >  	'eal', # everything depends on eal
> > > > > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > > > +	'ring',
> > > > > > +	'rcu', # rcu depends on ring
> > > > > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > > > >  	'cmdline',
> > > > > >  	'metrics', # bitrate/latency stats depends on this
> > > > > >  	'hash',    # efd depends on this
> > > > > > @@ -22,7 +24,7 @@ libraries = [
> > > > > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > > > > >  	'kni', 'latencystats', 'lpm', 'member',
> > > > > >  	'power', 'pdump', 'rawdev',
> > > > > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > > > > >  	# ipsec lib depends on net, crypto and security
> > > > > >  	'ipsec',
> > > > > >  	# add pkt framework libs which use other libs from above
> > > > > > --
> > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-07  9:21       ` Ananyev, Konstantin
@ 2019-10-13  4:36         ` Honnappa Nagarahalli
  2019-10-15 11:15           ` Ananyev, Konstantin
  0 siblings, 1 reply; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-13  4:36 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	nd, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd

<snip>

> Hi guys,
I have tried to consolidate design related questions here. If I have missed anything, please add.

> 
> >
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> >
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  lib/librte_lpm/Makefile            |   3 +-
> >  lib/librte_lpm/meson.build         |   2 +
> >  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> >  lib/librte_lpm/rte_lpm.h           |  21 ++++++
> >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> >  5 files changed, 122 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index
> > a7946a1c5..ca9e16312 100644
> > --- a/lib/librte_lpm/Makefile
> > +++ b/lib/librte_lpm/Makefile
> > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk  # library name
> > LIB = librte_lpm.a
> >
> > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> >  CFLAGS += -O3
> >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > -LDLIBS += -lrte_eal -lrte_hash
> > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> >
> >  EXPORT_MAP := rte_lpm_version.map
> >
> > diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> > index a5176d8ae..19a35107f 100644
> > --- a/lib/librte_lpm/meson.build
> > +++ b/lib/librte_lpm/meson.build
> > @@ -2,9 +2,11 @@
> >  # Copyright(c) 2017 Intel Corporation
> >
> >  version = 2
> > +allow_experimental_apis = true
> >  sources = files('rte_lpm.c', 'rte_lpm6.c')
> >  headers = files('rte_lpm.h', 'rte_lpm6.h')
> >  # since header files have different names, we can install all vector headers
> >  # without worrying about which architecture we actually need
> >  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> >  deps += ['hash']
> > +deps += ['rcu']
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 3a929a1b1..ca58d4b35 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >   */
> >
> >  #include <string.h>
> > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> >
> >  	rte_mcfg_tailq_write_unlock();
> >
> > +	if (lpm->dq)
> > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> >  	rte_free(lpm->tbl8);
> >  	rte_free(lpm->rules_tbl);
> >  	rte_free(lpm);
> > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604,
> 16.04);
> > MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> >  		rte_lpm_free_v1604);
> >
> > +struct __rte_lpm_rcu_dq_entry {
> > +	uint32_t tbl8_group_index;
> > +	uint32_t pad;
> > +};
> > +
> > +static void
> > +__lpm_rcu_qsbr_free_resource(void *p, void *data) {
> > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry *e =
> > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > +
> > +	/* Set tbl8 group invalid */
> > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > +		__ATOMIC_RELAXED);
> > +}
> > +
> > +/* Associate QSBR variable with an LPM object.
> > + */
> > +int
> > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v) {
> > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > +	struct rte_rcu_qsbr_dq_parameters params;
> > +
> > +	if ((lpm == NULL) || (v == NULL)) {
> > +		rte_errno = EINVAL;
> > +		return 1;
> > +	}
> > +
> > +	if (lpm->dq) {
> > +		rte_errno = EEXIST;
> > +		return 1;
> > +	}
> > +
> > +	/* Init QSBR defer queue. */
> > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> > +	params.name = rcu_dq_name;
> > +	params.size = lpm->number_tbl8s;
> > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > +	params.f = __lpm_rcu_qsbr_free_resource;
> > +	params.p = lpm->tbl8;
> > +	params.v = v;
> > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > +	if (lpm->dq == NULL) {
> > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > +		return 1;
> > +	}
> 
> Few thoughts about that function:
Few things to keep in mind, the goal of the design is to make it easy for the applications to adopt lock-free algorithms. The reclamation process in the writer is a major portion of code one has to write for using lock-free algorithms. The current design is such that the writer does not have to change any code or write additional code other than calling 'rte_lpm_rcu_qsbr_add'.
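
To make that concrete, a rough writer/reader usage sketch under this series (error handling trimmed; assumes an existing struct rte_lpm *lpm and a single reader with thread_id 0):

#include <rte_malloc.h>
#include <rte_rcu_qsbr.h>
#include <rte_lpm.h>

/* Writer-side setup */
size_t sz = rte_rcu_qsbr_get_memsize(1);
struct rte_rcu_qsbr *v = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);

rte_rcu_qsbr_init(v, 1);
if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
	rte_panic("cannot attach RCU QSBR variable to LPM\n");

/* Reader thread: register once, then report quiescent state regularly */
rte_rcu_qsbr_thread_register(v, 0);
rte_rcu_qsbr_thread_online(v, 0);
while (run) {
	/* ... rte_lpm_lookup() based forwarding ... */
	rte_rcu_qsbr_quiescent(v, 0);
}

After this, rte_lpm_delete() defers the tbl8 group free to the queue and tbl8_alloc() reclaims groups whose grace period has passed, with no further changes on the writer side.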

> It is named rcu_qsbr_add() but in fact it allocates a defer queue for a given rcu var.
> So first thought - is it always necessary?
This is part of the design. If the application does not want to use this integrated logic then, it does not have to call this API. It can use the RCU defer APIs to implement its own logic. But, if I ask the question, does this integrated logic address most of the use cases of the LPM library, I think the answer is yes.

> For some use-cases I suppose the user might be ok with waiting for the
> quiescent state change inside tbl8_free()?
Yes, that is a possibility (for ex: no frequent route changes). But, I think that is fairly trivial for the application to implement. Though, the LPM library has to separate the 'delete' and 'free' operations. Similar operations are provided in the rte_hash library (see the sketch below). IMO, we should follow a consistent approach.
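
For reference, the existing rte_hash split looks roughly like this (given an existing rte_hash *h and a key); an LPM equivalent would separate tbl8 'delete' from 'free' the same way:

	int32_t pos = rte_hash_del_key(h, &key);      /* detach the key only */

	/* ... wait for readers, e.g. via rte_rcu_qsbr_check() ... */

	if (pos >= 0)
		rte_hash_free_key_with_position(h, pos);  /* now reclaim */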

> Another thing: you do allocate the defer queue, but it is internal, so the
> user can't call reclaim() manually, which looks strange.
> Why not return the defer_queue pointer to the user, so he can call
> reclaim() himself at an appropriate time?
The intention of the design is to take the complexity away from the user of the LPM library. IMO, the current design will address most use cases of the LPM library. If we expose the 2 parameters (when to trigger reclamation and how much to reclaim) in the 'rte_lpm_rcu_qsbr_add' API, it should provide enough flexibility to the application; a possible shape for this is sketched below.
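
A hypothetical shape for that extension (names illustrative only, not part of this patch):

int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
		uint32_t reclaim_thd,	/* 0 = default trigger, size/8 */
		uint32_t reclaim_max);	/* 0 = default batch, size/16 */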

> Third thing - you always allocate the defer queue with a size equal to the
> number of tbl8s.
> Though I understand there could be up to 16M tbl8 groups inside the LPM.
> Do we really need a defer queue that long?
No, we do not need it to be this long. It is this long today to avoid returning a no-space error on the defer queue.
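
As a worked example with the constants in this series: for an LPM created with number_tbl8s = 256, the defer queue is sized for all 256 groups, but automatic reclamation already triggers at 256 >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT (3) = 32 pending entries, and each pass frees at most 256 >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT (4) = 16 of them. So in steady state only a small fraction of the queue is ever occupied.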

> Especially considering that the current rcu_defer_queue will start
> reclamation when 1/8 of the defer_queue becomes full and wouldn't reclaim
> more than 1/16 of it.
> Probably better to let the user decide himself how long a defer_queue he
> needs for that LPM?
It makes sense to expose it to the user if the writer-writer concurrency is lock-free (no memory allocation allowed to expand the defer queue size when the queue is full). However, LPM is not lock-free on the writer side. If we think the writer could be lock-free in the future, it has to be exposed to the user. 

> 
> Konstantin
Pulling questions/comments from other threads:
Can we leave reclamation to some other house-keeping thread to do (a sort of garbage collector)? Or is such a mode not supported/planned?

[Honnappa] If the reclamation cost is small, the current method provides advantages over having a separate thread to do reclamation. I did not plan to provide such an option. But maybe it makes sense to keep the options open (especially from an ABI perspective). Maybe we should add a flags field which will allow us to implement different methods in the future?
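
A hypothetical sketch of such a parameter block with a flags field (purely illustrative - none of these names exist in this series):

/* Hypothetical: let the application opt out of automatic reclamation and
 * leave it to a house-keeping (garbage collector) thread.
 */
#define RTE_RCU_QSBR_DQ_F_NO_AUTO_RECLAIM (1u << 0)

struct rte_rcu_qsbr_dq_parameters {	/* possible future layout */
	const char *name;
	uint32_t size;
	uint32_t esize;
	uint32_t trigger_reclaim_limit;	/* optional, 0 = default (size/8) */
	uint32_t max_reclaim_size;	/* optional, 0 = default (size/16) */
	uint32_t flags;			/* RTE_RCU_QSBR_DQ_F_* */
	rte_rcu_qsbr_free_resource_t free_fn;	/* '_t' suffix as discussed */
	void *p;			/* passed back to free_fn */
	struct rte_rcu_qsbr *v;		/* QSBR variable to use */
};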

> 
> 
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Adds a rule to the rule table.
> >   *
> > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20
> > *tbl8)  }
> >
> >  static int32_t
> > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > number_tbl8s)
> > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> >  {
> >  	uint32_t group_idx; /* tbl8 group index. */
> >  	struct rte_lpm_tbl_entry *tbl8_entry;
> >
> >  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > +		tbl8_entry = &lpm->tbl8[group_idx *
> > +				RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> >  		/* If a free tbl8 group is found clean it and set as VALID. */
> >  		if (!tbl8_entry->valid_group) {
> >  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> > @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> >  	return -ENOSPC;
> >  }
> >
> > +static int32_t
> > +tbl8_alloc_v1604(struct rte_lpm *lpm) {
> > +	int32_t group_idx; /* tbl8 group index. */
> > +
> > +	group_idx = __tbl8_alloc_v1604(lpm);
> > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > +		/* If there are no tbl8 groups try to reclaim some. */
> > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > +			group_idx = __tbl8_alloc_v1604(lpm);
> > +	}
> > +
> > +	return group_idx;
> > +}
> > +
> >  static void
> >  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t
> > tbl8_group_start)  { @@ -728,13 +800,21 @@ tbl8_free_v20(struct
> > rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)  }
> >
> >  static void
> > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > tbl8_group_start)
> > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> >  {
> > -	/* Set tbl8 group invalid*/
> >  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > +	struct __rte_lpm_rcu_dq_entry e;
> >
> > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > -			__ATOMIC_RELAXED);
> > +	if (lpm->dq != NULL) {
> > +		e.tbl8_group_index = tbl8_group_start;
> > +		e.pad = 0;
> > +		/* Push into QSBR defer queue. */
> > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > +	} else {
> > +		/* Set tbl8 group invalid*/
> > +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> > +				__ATOMIC_RELAXED);
> > +	}
> >  }
> >
> >  static __rte_noinline int32_t
> > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > uint32_t ip_masked, uint8_t depth,
> >
> >  	if (!lpm->tbl24[tbl24_index].valid) {
> >  		/* Search for a free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >  		/* Check tbl8 allocation was successful. */
> >  		if (tbl8_group_index < 0) {
> > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked, uint8_t depth,
> >  	} /* If valid entry but not extended calculate the index into Table8. */
> >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> >  		/* Search for free tbl8 group. */
> > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> >
> >  		if (tbl8_group_index < 0) {
> >  			return tbl8_group_index;
> > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
> uint32_t ip_masked,
> >  		 */
> >  		lpm->tbl24[tbl24_index].valid = 0;
> >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >  	} else if (tbl8_recycle_index > -1) {
> >  		/* Update tbl24 entry. */
> >  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> > @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> >  				__ATOMIC_RELAXED);
> >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > +		tbl8_free_v1604(lpm, tbl8_group_start);
> >  	}
> >  #undef group_idx
> >  	return 0;
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > 906ec4483..49c12a68d 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2019 Arm Limited
> >   */
> >
> >  #ifndef _RTE_LPM_H_
> > @@ -21,6 +22,7 @@
> >  #include <rte_common.h>
> >  #include <rte_vect.h>
> >  #include <rte_compat.h>
> > +#include <rte_rcu_qsbr.h>
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -186,6 +188,7 @@ struct rte_lpm {
> >  			__rte_cache_aligned; /**< LPM tbl24 table. */
> >  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> >  };
> >
> >  /**
> > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> void
> > rte_lpm_free_v1604(struct rte_lpm *lpm);
> >
> > +/**
> > + * Associate RCU QSBR variable with an LPM object.
> > + *
> > + * @param lpm
> > + *   the lpm object to add RCU QSBR
> > + * @param v
> > + *   RCU QSBR variable
> > + * @return
> > + *   On success - 0
> > + *   On error - 1 with error code set in rte_errno.
> > + *   Possible rte_errno codes are:
> > + *   - EINVAL - invalid pointer
> > + *   - EEXIST - already added QSBR
> > + *   - ENOMEM - memory allocation failure
> > + */
> > +__rte_experimental
> > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr
> > +*v);
> > +
> >  /**
> >   * Add a rule to the LPM table.
> >   *
> > diff --git a/lib/librte_lpm/rte_lpm_version.map
> > b/lib/librte_lpm/rte_lpm_version.map
> > index 90beac853..b353aabd2 100644
> > --- a/lib/librte_lpm/rte_lpm_version.map
> > +++ b/lib/librte_lpm/rte_lpm_version.map
> > @@ -44,3 +44,9 @@ DPDK_17.05 {
> >  	rte_lpm6_lookup_bulk_func;
> >
> >  } DPDK_16.04;
> > +
> > +EXPERIMENTAL {
> > +	global:
> > +
> > +	rte_lpm_rcu_qsbr_add;
> > +};
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-11 18:28                     ` Honnappa Nagarahalli
@ 2019-10-13 20:09                       ` Ananyev, Konstantin
  2019-10-14  4:11                         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-13 20:09 UTC (permalink / raw)
  To: Honnappa Nagarahalli, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd, nd



> > > > >
> > > > > >
> > > > > > >
> > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > >
> > > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > >
> > > > > > > > > The peek API allows fetching the next available object in
> > > > > > > > > the ring without dequeuing it. This helps in scenarios
> > > > > > > > > where dequeuing of objects depend on their value.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > > ---
> > > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18
> > > > > > > > > 100644
> > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > > rte_ring *r, void
> > > > > > > > **obj_table,
> > > > > > > > >  				r->cons.single, available);  }
> > > > > > > > >
> > > > > > > > > +/**
> > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > + *
> > > > > > > > > + * The peek API allows fetching the next available object
> > > > > > > > > +in the ring
> > > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > > +safe with respect
> > > > > > > > > + * to other consumer threads.
> > > > > > > > > + *
> > > > > > > > > + * @param r
> > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > + * @param obj_p
> > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > + * @return
> > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > + */
> > > > > > > > > +__rte_experimental
> > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > > >
> > > > > > > > As it is not MT safe, then I think we need _sc_ in the name,
> > > > > > > > to follow other rte_ring functions naming conventions
> > > > > > > > (rte_ring_sc_peek() or so).
> > > > > > > Agree
> > > > > > >
> > > > > > > >
> > > > > > > > As a better alternative what do you think about introducing
> > > > > > > > serialized versions of the DPDK rte_ring dequeue functions?
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > /* same as original ring dequeue, but:
> > > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > > >   * 2) don't update cons.tail
> > > > > > > >   */
> > > > > > > > unsigned int
> > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > > **obj_table, unsigned int n,
> > > > > > > >                 unsigned int *available);
> > > > > > > >
> > > > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > > > void rte_ring_serial_dequeue_finish(struct rte_ring *r,
> > > > > > > > uint32_t num);
> > > > > > > >
> > > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > >
> > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > >
> > > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t avl,
> > > > > > > > n; uintptr_t elt[nb_elt]; ...
> > > > > > > >
> > > > > > > > do {
> > > > > > > >
> > > > > > > >   /* read next elem from the queue */
> > > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > >   if (n == 0)
> > > > > > > >       break;
> > > > > > > >
> > > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > > elt[0]) != 1) {
> > > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > >      break;
> > > > > > > >   }
> > > > > > > >
> > > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > > >
> > > > > > > >    /*call reclaim function */
> > > > > > > >   dr->f(dr->p, elt);
> > > > > > > >
> > > > > > > > } while (avl >= nb_elt);
> > > > > > > >
> > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > > > As long as actual reclamation callback itself is MT safe of course.
> > > > > > >
> > > > > > > I think it is a great idea. The other writers would still be
> > > > > > > polling for the current writer to update the tail or update
> > > > > > > the head. This makes it a
> > > > > > blocking solution.
> > > > > >
> > > > > > Yep, it is a blocking one.
> > > > > >
> > > > > > > We can make the other threads not poll i.e. they will quit
> > > > > > > reclaiming if they
> > > > > > see that other writers are dequeuing from the queue.
> > > > > >
> > > > > > Actually didn't think about that possibility, but yes should be
> > > > > > possible to have _try_ semantics too.
> > > > > >
> > > > > > > The other way is to use per thread queues.
> > > > > > >
> > > > > > > The other requirement I see is to support unbounded-size data
> > > > > > > > structures wherein the data structures do not have a
> > > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > > queue size is equal to the total
> > > > > > number of entries in a given data structure. There are plans to
> > > > > > support dynamically resizable defer queue. This means, memory
> > > > > > allocation which will affect the lock-free-ness of the solution.
> > > > > > >
> > > > > > > So, IMO:
> > > > > > > 1) The API should provide the capability to support different
> > > > > > > algorithms -
> > > > > > maybe through some flags?
> > > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > > cons-tail is also the same, skip
> > > > > > polling). So, we should probably implement a ring within the RCU
> > library?
> > > > > >
> > > > > > Personally, I think such serialization ring API would be useful
> > > > > > for other cases too.
> > > > > > > There are a few cases when the user needs to read the contents
> > > > > > > of the queue without removing elements from it.
> > > > > > Let say we do use similar approach inside TLDK to implement TCP
> > > > > > transmit queue.
> > > > > > If such API would exist in DPDK we can just use it straightway,
> > > > > > without maintaining a separate one.
> > > > > ok
> > > > >
> > > > > >
> > > > > > >
> > > > > > > From the timeline perspective, adding all these capabilities
> > > > > > > > would be difficult to get done within the 19.11 timeline. What I
> > > > > > > have here satisfies my current needs. I suggest that we make
> > > > > > > provisions in APIs now to
> > > > > > support all these features, but do the implementation in the
> > > > > > coming
> > > > releases.
> > > > > > Does this sound ok for you?
> > > > > >
> > > > > > Not sure I understand your suggestion here...
> > > > > > Could you explain it a bit more - how new API will look like and
> > > > > > what would be left for the future.
> > > > > For this patch, I suggest we do not add any more complexity. If
> > > > > someone wants a lock-free/block-free mechanism, it is available by
> > > > > creating
> > > > per thread defer queues.
> > > > >
> > > > > We push the following to the future:
> > > > > 1) Dynamically size adjustable defer queue. IMO, with this, the
> > > > > lock-free/block-free reclamation will not be available (memory
> > > > > allocation
> > > > requires locking). The memory for the defer queue will be
> > > > allocated/freed in chunks of 'size' elements as the queue grows/shrinks.
> > > >
> > > > That one is fine by me.
> > > > In fact I don't know whether there would be a real use-case for a
> > > > dynamic defer queue for an rcu var...
> > > > But I suppose that's subject for another discussion.
> > > Currently, the defer queue size is equal to the number of resources in
> > > the data structure. This is unnecessary as the reclamation is done regularly.
> > > If a smaller queue size is used, the queue might get full (even after
> > reclamation), in which case, the queue size should be increased.
> >
> > I understand the intention.
> > Though I am not very happy with an approach where, to free one resource,
> > we first have to allocate another one.
> > Sounds like a source of deadlocks and, for that case, probably an
> > unnecessary complication.
> It depends on the use case. For some use cases lock-free reader-writer concurrency is enough (in which case there is no need to have a
> queue large enough to hold all the resources) and some would require lock-free reader-writer and writer-writer concurrency (where,
> theoretically, a queue large enough to hold all the resources would be required).
> 
> > But again, as it is not for 19.11 we don't have to discuss it now.
> >
> > > >
> > > > >
> > > > > 2) Constant size defer queue with lock-free and block-free
> > > > > reclamation (single option). The defer queue will be of fixed
> > > > > length 'size'. If the queue gets full an error is returned. The
> > > > > user could provide a 'size' equal
> > > > to the number of elements in a data structure to ensure queue never gets
> > full.
> > > >
> > > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > - MP/MC
> > > > - MP/SC
> > > > - SP/SC
> > > Just SP/SC
> >
> > Ok, just to confirm we are on the same page:
> > there would be a possibility for one thread to do dq_enqueue() and a
> > second one to do dq_reclaim() simultaneously (of course, if the actual
> > reclamation function is thread safe)?
> Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.

Ok, and that is probably due to the nature of ring_sc_peek(), right?
But the user can set the reclaim threshold higher than the number of elems in the defer queue,
and that should help to prevent dq_reclaim() from inside dq_enqueue(), correct?
If so, I have no objections in general to the proposed plan.
Konstantin
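
For completeness, the per-thread defer queue workaround discussed above could look roughly like this (sketch only; assumes one writer per lcore and the dq API from this series):

#include <rte_lcore.h>

static struct rte_rcu_qsbr_dq *dq_per_lcore[RTE_MAX_LCORE];

static inline int
deferred_free(void *e)
{
	/* Each writer lcore owns its own queue, so the SP/SC assumptions
	 * of dq_enqueue()/dq_reclaim() hold without any locking.
	 */
	return rte_rcu_qsbr_dq_enqueue(dq_per_lcore[rte_lcore_id()], e);
}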

> 
> >
> > > > - non MT at all (only same single thread can do enqueue and dequeue)
> > > If MT safe is required, one should use 1 defer queue per thread for now.
> > >
> > > >
> > > > And related question:
> > > > What additional rte_ring API you plan to introduce in that case?
> > > > - None
> > > > - rte_ring_sc_peek()
> > > rte_ring_peek will be changed to rte_ring_sc_peek
> > >
> > > > - rte_ring_serial_dequeue()
> > > >
> > > > >
> > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > > provide
> > > > > 2 #defines, one for dynamically variable size defer queue and the
> > > > > other for
> > > > constant size defer queue.
> > > > >
> > > > > However, IMO, using per thread defer queue is a much simpler way
> > > > > to
> > > > achieve 2. It does not add any significant burden to the user either.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > +{
> > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > +	if (count) {
> > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > > +		return 0;
> > > > > > > > > +	}
> > > > > > > > > +	return -ENOENT;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  #ifdef __cplusplus
> > > > > > > > >  }
> > > > > > > > >  #endif
> > > > > > > > > --
> > > > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
  2019-10-13 20:09                       ` Ananyev, Konstantin
@ 2019-10-14  4:11                         ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-14  4:11 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, paulmck
  Cc: Wang, Yipeng1, Medvedkin, Vladimir,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> > > > > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > > > > >
> > > > > > > > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > >
> > > > > > > > > > The peek API allows fetching the next available object
> > > > > > > > > > in the ring without dequeuing it. This helps in
> > > > > > > > > > scenarios where dequeuing of objects depend on their value.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Dharmik Thakkar
> > > > > > > > > > <dharmik.thakkar@arm.com>
> > > > > > > > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > > > > > > ---
> > > > > > > > > >  lib/librte_ring/rte_ring.h | 30
> > > > > > > > > > ++++++++++++++++++++++++++++++
> > > > > > > > > >  1 file changed, 30 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > > > > > b/lib/librte_ring/rte_ring.h index
> > > > > > > > > > 2a9f768a1..d3d0d5e18
> > > > > > > > > > 100644
> > > > > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct
> > > > > > > > > > rte_ring *r, void
> > > > > > > > > **obj_table,
> > > > > > > > > >  				r->cons.single, available);  }
> > > > > > > > > >
> > > > > > > > > > +/**
> > > > > > > > > > + * Peek one object from a ring.
> > > > > > > > > > + *
> > > > > > > > > > + * The peek API allows fetching the next available
> > > > > > > > > > +object in the ring
> > > > > > > > > > + * without dequeuing it. This API is not multi-thread
> > > > > > > > > > +safe with respect
> > > > > > > > > > + * to other consumer threads.
> > > > > > > > > > + *
> > > > > > > > > > + * @param r
> > > > > > > > > > + *   A pointer to the ring structure.
> > > > > > > > > > + * @param obj_p
> > > > > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > > > > + * @return
> > > > > > > > > > + *   - 0: Success, object available
> > > > > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > > > > + */
> > > > > > > > > > +__rte_experimental
> > > > > > > > > > +static __rte_always_inline int rte_ring_peek(struct
> > > > > > > > > > +rte_ring *r, void **obj_p)
> > > > > > > > >
> > > > > > > > > As it is not MT safe, then I think we need _sc_ in the
> > > > > > > > > name, to follow other rte_ring functions naming
> > > > > > > > > conventions
> > > > > > > > > (rte_ring_sc_peek() or so).
> > > > > > > > Agree
> > > > > > > >
> > > > > > > > >
> > > > > > > > > As a better alternative what do you think about
> > > > > > > > > introducing serialized versions of the DPDK rte_ring dequeue
> functions?
> > > > > > > > > Something like that:
> > > > > > > > >
> > > > > > > > > /* same as original ring dequeue, but:
> > > > > > > > >   * 1) move cons.head only if cons.head == const.tail
> > > > > > > > >   * 2) don't update cons.tail
> > > > > > > > >   */
> > > > > > > > > unsigned int
> > > > > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void
> > > > > > > > > **obj_table, unsigned int n,
> > > > > > > > >                 unsigned int *available);
> > > > > > > > >
> > > > > > > > > /* sets both cons.head and cons.tail to cons.head + num
> > > > > > > > > */ void rte_ring_serial_dequeue_finish(struct rte_ring
> > > > > > > > > *r, uint32_t num);
> > > > > > > > >
> > > > > > > > > /* resets cons.head to const.tail value */ void
> > > > > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > > > > >
> > > > > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > > > > >
> > > > > > > > > const uint32_t nb_elt =  dq->elt_size/8 + 1; uint32_t
> > > > > > > > > avl, n; uintptr_t elt[nb_elt]; ...
> > > > > > > > >
> > > > > > > > > do {
> > > > > > > > >
> > > > > > > > >   /* read next elem from the queue */
> > > > > > > > >   n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > > > >   if (n == 0)
> > > > > > > > >       break;
> > > > > > > > >
> > > > > > > > >  /* wrong period, keep elem in the queue */  if
> > > > > > > > > (rte_rcu_qsbr_check(dr->v,
> > > > > > > > > elt[0]) != 1) {
> > > > > > > > >      rte_ring_serial_dequeue_abort(dq->r);
> > > > > > > > >      break;
> > > > > > > > >   }
> > > > > > > > >
> > > > > > > > >   /* can reclaim, remove elem from the queue */
> > > > > > > > >   rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > > > > >
> > > > > > > > >    /*call reclaim function */
> > > > > > > > >   dr->f(dr->p, elt);
> > > > > > > > >
> > > > > > > > > } while (avl >= nb_elt);
> > > > > > > > >
> > > > > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT
> safe.
> > > > > > > > > As long as actual reclamation callback itself is MT safe of
> course.
> > > > > > > >
> > > > > > > > I think it is a great idea. The other writers would still
> > > > > > > > be polling for the current writer to update the tail or
> > > > > > > > update the head. This makes it a blocking solution.
> > > > > > >
> > > > > > > Yep, it is a blocking one.
> > > > > > >
> > > > > > > > We can make the other threads not poll, i.e. they will quit
> > > > > > > > reclaiming if they see that other writers are dequeuing from
> > > > > > > > the queue.
> > > > > > >
> > > > > > > Actually I didn't think about that possibility, but yes, it
> > > > > > > should be possible to have _try_ semantics too.
> > > > > > >
> > > > > > > > The other way is to use per-thread queues.
> > > > > > > >
> > > > > > > > The other requirement I see is to support unbounded-size
> > > > > > > > data structures, wherein the data structures do not have a
> > > > > > > > pre-determined number of entries. Also, currently the defer
> > > > > > > > queue size is equal to the total number of entries in a
> > > > > > > > given data structure. There are plans to support a
> > > > > > > > dynamically resizable defer queue. This means memory
> > > > > > > > allocation, which will affect the lock-free-ness of the solution.
> > > > > > > >
> > > > > > > > So, IMO:
> > > > > > > > 1) The API should provide the capability to support
> > > > > > > > different algorithms - maybe through some flags?
> > > > > > > > 2) The requirements for the ring are pretty unique to the
> > > > > > > > problem we have here (for ex: move the cons-head only if
> > > > > > > > cons-tail is also the same, skip polling). So, we should
> > > > > > > > probably implement a ring within the RCU library?
> > > > > > >
> > > > > > > Personally, I think such a serialization ring API would be
> > > > > > > useful for other cases too.
> > > > > > > There are a few cases when the user needs to read the contents
> > > > > > > of the queue without removing elements from it.
> > > > > > > Let's say we use a similar approach inside TLDK to implement
> > > > > > > the TCP transmit queue.
> > > > > > > If such an API existed in DPDK, we could just use it
> > > > > > > straight away, without maintaining a separate one.
> > > > > > ok
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > From the timeline perspective, adding all these
> > > > > > > > capabilities would be difficult to get done within the
> > > > > > > > 19.11 timeline. What I have here satisfies my current needs.
> > > > > > > > I suggest that we make provisions in the APIs now to
> > > > > > > > support all these features, but do the implementation in
> > > > > > > > the coming releases.
> > > > > > > > Does this sound ok to you?
> > > > > > >
> > > > > > > Not sure I understand your suggestion here...
> > > > > > > Could you explain it a bit more - what would the new API look
> > > > > > > like and what would be left for the future?
> > > > > > For this patch, I suggest we do not add any more complexity.
> > > > > > If someone wants a lock-free/block-free mechanism, it is
> > > > > > available by creating per-thread defer queues.
> > > > > >
> > > > > > We push the following to the future:
> > > > > > 1) Dynamically size-adjustable defer queue. IMO, with this,
> > > > > > the lock-free/block-free reclamation will not be available
> > > > > > (memory allocation requires locking). The memory for the defer
> > > > > > queue will be allocated/freed in chunks of 'size' elements as
> > > > > > the queue grows/shrinks.
> > > > >
> > > > > That one is fine by me.
> > > > > In fact I don't know whether there would be a real use-case for
> > > > > a dynamic defer queue for an rcu var...
> > > > > But I suppose that's a subject for another discussion.
> > > > Currently, the defer queue size is equal to the number of
> > > > resources in the data structure. This is unnecessary as the
> > > > reclamation is done regularly.
> > > > If a smaller queue size is used, the queue might get full (even
> > > > after reclamation), in which case the queue size should be
> > > > increased.
> > >
> > > I understand the intention.
> > > Though I am not very happy with an approach where, to free one
> > > resource, we first have to allocate another one.
> > > Sounds like a source of deadlocks and, for that case, probably an
> > > unnecessary complication.
> > It depends on the use case. For some use cases lock-free reader-writer
> > concurrency is enough (in which case there is no need to have a queue
> > large enough to hold all the resources) and some would require
> > lock-free reader-writer and writer-writer concurrency (where,
> > theoretically, a queue large enough to hold all the resources would
> > be required).
> >
> > > But again, as it is not for 19.11 we don't have to discuss it now.
> > >
> > > > >
> > > > > >
> > > > > > 2) Constant size defer queue with lock-free and block-free
> > > > > > reclamation (single option). The defer queue will be of fixed
> > > > > > length 'size'. If the queue gets full an error is returned.
> > > > > > The user could provide a 'size' equal
> > > > > to the number of elements in a data structure to ensure queue
> > > > > never gets
> > > full.
> > > > >
> > > > > Ok so for 19.11 what enqueue/dequeue model do you plan to support?
> > > > > - MP/MC
> > > > > - MP/SC
> > > > > - SP/SC
> > > > Just SP/SC
> > >
> > > Ok, just to confirm we are on the same page:
> > > there would be a possibility for one thread to do dq_enqueue() and a
> > > second one to do dq_reclaim() simultaneously (of course, only if the
> > > actual reclamation function is thread safe)?
> > Yes, that is allowed. Mutual exclusion is required only around dq_reclaim.
This is not completely correct (as you have pointed out below), as dq_enqueue will end up calling dq_reclaim.
> 
> Ok, and that is probably due to the nature of ring_sc_peek(), right?
> But the user can set the reclaim threshold higher than the number of elems in the defer
> queue, and that should help to prevent dq_reclaim() from inside
> dq_enqueue(), correct?
Yes, that is possible.
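
To illustrate (a sketch only; 'trigger_thresh' below is a hypothetical
user-settable value, while the v3 patch hard-codes the 1/8th limit):

	/* enqueue-side auto-reclaim gate, simplified from the patch */
	cur_size = rte_ring_count(dq->r) / (dq->esize / 8 + 1);
	if (cur_size > trigger_thresh)
		rte_rcu_qsbr_dq_reclaim(dq);

With trigger_thresh >= dq->size the branch above never fires, so
dq_enqueue() itself never calls dq_reclaim().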

> If so, I have no objections in general to the proposed plan.
> Konstantin
> 
> >
> > >
> > > > > - non MT at all (only the same single thread can do enqueue and
> > > > > dequeue)
> > > > If MT safety is required, one should use one defer queue per thread for now.
> > > >
> > > > >
> > > > > And related question:
> > > > > What additional rte_ring API you plan to introduce in that case?
> > > > > - None
> > > > > - rte_ring_sc_peek()
> > > > rte_ring_peek will be changed to rte_ring_sc_peek
> > > >
> > > > > - rte_ring_serial_dequeue()
> > > > >
> > > > > >
> > > > > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and
> > > > > > provide 2 #defines, one for a dynamically variable size defer
> > > > > > queue and the other for a constant size defer queue.
> > > > > >
> > > > > > However, IMO, using a per-thread defer queue is a much simpler
> > > > > > way to achieve 2. It does not add any significant burden to
> > > > > > the user either.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +{
> > > > > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > > > > +	unsigned int n = 1;
> > > > > > > > > > +	if (count) {
> > > > > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > > > > +		return 0;
> > > > > > > > > > +	}
> > > > > > > > > > +	return -ENOENT;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  #ifdef __cplusplus
> > > > > > > > > >  }
> > > > > > > > > >  #endif
> > > > > > > > > > --
> > > > > > > > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-13  4:36         ` Honnappa Nagarahalli
@ 2019-10-15 11:15           ` Ananyev, Konstantin
  2019-10-18  3:32             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Ananyev, Konstantin @ 2019-10-15 11:15 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	nd, Ruifeng Wang (Arm Technology China),
	nd


> <snip>
> 
> > Hi guys,
> I have tried to consolidate design-related questions here. If I have missed anything, please add.
> 
> >
> > >
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > >
> > > Currently, the tbl8 group is freed even though the readers might be
> > > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > > quickly. This results in incorrect lookup results.
> > >
> > > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > > Refer to RCU documentation to understand various aspects of
> > > integrating RCU library into other libraries.
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > ---
> > >  lib/librte_lpm/Makefile            |   3 +-
> > >  lib/librte_lpm/meson.build         |   2 +
> > >  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> > >  lib/librte_lpm/rte_lpm.h           |  21 ++++++
> > >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> > >  5 files changed, 122 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> > > index a7946a1c5..ca9e16312 100644
> > > --- a/lib/librte_lpm/Makefile
> > > +++ b/lib/librte_lpm/Makefile
> > > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
> > >  # library name
> > >  LIB = librte_lpm.a
> > >
> > > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> > >  CFLAGS += -O3
> > >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> > > -LDLIBS += -lrte_eal -lrte_hash
> > > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> > >
> > >  EXPORT_MAP := rte_lpm_version.map
> > >
> > > diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> > > index a5176d8ae..19a35107f 100644
> > > --- a/lib/librte_lpm/meson.build
> > > +++ b/lib/librte_lpm/meson.build
> > > @@ -2,9 +2,11 @@
> > >  # Copyright(c) 2017 Intel Corporation
> > >
> > >  version = 2
> > > +allow_experimental_apis = true
> > >  sources = files('rte_lpm.c', 'rte_lpm6.c')
> > >  headers = files('rte_lpm.h', 'rte_lpm6.h')
> > >  # since header files have different names, we can install all vector headers
> > >  # without worrying about which architecture we actually need
> > >  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> > >  deps += ['hash']
> > > +deps += ['rcu']
> > > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > > index 3a929a1b1..ca58d4b35 100644
> > > --- a/lib/librte_lpm/rte_lpm.c
> > > +++ b/lib/librte_lpm/rte_lpm.c
> > > @@ -1,5 +1,6 @@
> > >  /* SPDX-License-Identifier: BSD-3-Clause
> > >   * Copyright(c) 2010-2014 Intel Corporation
> > > + * Copyright(c) 2019 Arm Limited
> > >   */
> > >
> > >  #include <string.h>
> > > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> > >
> > >  	rte_mcfg_tailq_write_unlock();
> > >
> > > +	if (lpm->dq)
> > > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> > >  	rte_free(lpm->tbl8);
> > >  	rte_free(lpm->rules_tbl);
> > >  	rte_free(lpm);
> > > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604,
> > 16.04);
> > > MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> > >  		rte_lpm_free_v1604);
> > >
> > > +struct __rte_lpm_rcu_dq_entry {
> > > +	uint32_t tbl8_group_index;
> > > +	uint32_t pad;
> > > +};
> > > +
> > > +static void
> > > +__lpm_rcu_qsbr_free_resource(void *p, void *data)
> > > +{
> > > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > +	struct __rte_lpm_rcu_dq_entry *e =
> > > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > > +
> > > +	/* Set tbl8 group invalid */
> > > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > > +		__ATOMIC_RELAXED);
> > > +}
> > > +
> > > +/* Associate QSBR variable with an LPM object.
> > > + */
> > > +int
> > > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v)
> > > +{
> > > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > > +	struct rte_rcu_qsbr_dq_parameters params;
> > > +
> > > +	if ((lpm == NULL) || (v == NULL)) {
> > > +		rte_errno = EINVAL;
> > > +		return 1;
> > > +	}
> > > +
> > > +	if (lpm->dq) {
> > > +		rte_errno = EEXIST;
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Init QSBR defer queue. */
> > > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm->name);
> > > +	params.name = rcu_dq_name;
> > > +	params.size = lpm->number_tbl8s;
> > > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > > +	params.f = __lpm_rcu_qsbr_free_resource;
> > > +	params.p = lpm->tbl8;
> > > +	params.v = v;
> > > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > > +	if (lpm->dq == NULL) {
> > > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > > +		return 1;
> > > +	}
> >
> > Few thoughts about that function:
> A few things to keep in mind: the goal of the design is to make it easy for applications to adopt lock-free algorithms. The reclamation
> process in the writer is a major portion of the code one has to write when using lock-free algorithms. The current design is such that the writer
> does not have to change any code or write additional code other than calling 'rte_lpm_rcu_qsbr_add'.
> 
> > It is named rcu_qsbr_add() but in fact it allocates a defer queue for the given rcu var.
> > So first thought - is it always necessary?
> This is part of the design. If the application does not want to use this integrated logic then it does not have to call this API. It can use the
> RCU defer APIs to implement its own logic. But if I ask the question, does this integrated logic address most of the use cases of the LPM
> library, I think the answer is yes.
> 
> > For some use-cases I suppose the user might be ok to wait for the
> > quiescent state change inside tbl8_free()?
> Yes, that is a possibility (for ex: no frequent route changes). But, I think that is very trivial for the application to implement. Though, the LPM
> library has to separate the 'delete' and 'free' operations. 

Exactly.
That's why it is not trivial with the current LPM library.
In fact, to do that himself right now, the user would have to implement and support his own version of the LPM code.

Honestly, I don't understand why you consider it a drawback.
From my perspective only a few things need to be changed:

1. Add 2 parameters to rte_lpm_rcu_qsbr_add():
    number of elems in defer_queue
    reclaim() threshold value.
If the user doesn't want to provide any values, that's fine, we can use default ones here
(as you do right now).
2. Make rte_lpm_rcu_qsbr_add() return a pointer to the defer_queue.
Again, if the user doesn't want to call reclaim() himself, he can just ignore the return value.

These 2 changes will provide us with the necessary flexibility that would help to cover more use-cases
(a possible prototype is sketched below):
- user can decide how big the defer queue should be
- user can decide when/how he wants to do reclaim()
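
As a sketch only (the parameter names below are made up for
illustration, they are not in any posted patch), the prototype could
then become:

	/* Hypothetical extension: 0 for dq_size/reclaim_thresh selects
	 * the current built-in defaults; the returned defer queue lets
	 * the application call rte_rcu_qsbr_dq_reclaim() at a time of
	 * its choosing.
	 */
	struct rte_rcu_qsbr_dq *
	rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
			uint32_t dq_size, uint32_t reclaim_thresh);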

Konstantin

> Similar operations are provided in the rte_hash library. IMO, we should follow
> a consistent approach.
> 
> > Another thing: you do allocate a defer queue, but it is internal, so the user can't call
> > reclaim() manually, which looks strange.
> > Why not return the defer_queue pointer to the user, so he can call reclaim()
> > himself at an appropriate time?
> The intention of the design is to take the complexity away from the user of the LPM library. IMO, the current design will address most use
> cases of the LPM library. If we expose the 2 parameters (when to trigger reclamation and how much to reclaim) in the 'rte_lpm_rcu_qsbr_add'
> API, it should provide enough flexibility to the application.
> 
> > Third thing - you always allocate the defer queue with size equal to the number of
> > tbl8 groups.
> > Though I understand there could be up to 16M tbl8 groups inside the LPM.
> > Do we really need a defer queue that long?
> No, we do not need it to be this long. It is this long today to avoid returning a no-space error on the defer queue.
> 
> > Especially considering that the current rcu_defer_queue will start reclamation
> > when 1/8 of the defer_queue becomes full and wouldn't reclaim more than
> > 1/16 of it.
> > Probably better to let the user decide himself how long a defer_queue he needs
> > for that LPM?
> It makes sense to expose it to the user if the writer-writer concurrency is lock-free (no memory allocation allowed to expand the defer
> queue size when the queue is full). However, LPM is not lock-free on the writer side. If we think the writer could be lock-free in the future, it
> has to be exposed to the user.
> 
> >
> > Konstantin
> Pulling questions/comments from other threads:
> Can we leave reclamation to some other house-keeping thread (a sort of garbage collector)? Or is such a mode not supported/planned?
> 
> [Honnappa] If the reclamation cost is small, the current method provides advantages over having a separate thread to do reclamation. I did
> not plan to provide such an option. But maybe it makes sense to keep the options open (especially from an ABI perspective). Maybe we
> should add a flags field which will allow us to implement different methods in the future?
> 
> >
> >
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  /*
> > >   * Adds a rule to the rule table.
> > >   *
> > > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
> > >  }
> > >
> > >  static int32_t
> > > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > > number_tbl8s)
> > > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> > >  {
> > >  	uint32_t group_idx; /* tbl8 group index. */
> > >  	struct rte_lpm_tbl_entry *tbl8_entry;
> > >
> > >  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > > -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > > +		tbl8_entry = &lpm->tbl8[group_idx *
> > > +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > >  		/* If a free tbl8 group is found clean it and set as VALID. */
> > >  		if (!tbl8_entry->valid_group) {
> > >  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> > > @@ -712,6 +769,21 @@ tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> > >  	return -ENOSPC;
> > >  }
> > >
> > > +static int32_t
> > > +tbl8_alloc_v1604(struct rte_lpm *lpm)
> > > +{
> > > +	int32_t group_idx; /* tbl8 group index. */
> > > +
> > > +	group_idx = __tbl8_alloc_v1604(lpm);
> > > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > > +		/* If there are no tbl8 groups try to reclaim some. */
> > > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > > +			group_idx = __tbl8_alloc_v1604(lpm);
> > > +	}
> > > +
> > > +	return group_idx;
> > > +}
> > > +
> > >  static void
> > >  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t
> > > tbl8_group_start)  { @@ -728,13 +800,21 @@ tbl8_free_v20(struct
> > > rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)  }
> > >
> > >  static void
> > > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> > > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> > >  {
> > > -	/* Set tbl8 group invalid*/
> > >  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > +	struct __rte_lpm_rcu_dq_entry e;
> > >
> > > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > > -			__ATOMIC_RELAXED);
> > > +	if (lpm->dq != NULL) {
> > > +		e.tbl8_group_index = tbl8_group_start;
> > > +		e.pad = 0;
> > > +		/* Push into QSBR defer queue. */
> > > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > > +	} else {
> > > +		/* Set tbl8 group invalid*/
> > > +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> > > +				__ATOMIC_RELAXED);
> > > +	}
> > >  }
> > >
> > >  static __rte_noinline int32_t
> > > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> > >
> > >  	if (!lpm->tbl24[tbl24_index].valid) {
> > >  		/* Search for a free tbl8 group. */
> > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > >
> > >  		/* Check tbl8 allocation was successful. */
> > >  		if (tbl8_group_index < 0) {
> > > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> > >  	} /* If valid entry but not extended calculate the index into Table8. */
> > >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> > >  		/* Search for free tbl8 group. */
> > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm->number_tbl8s);
> > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > >
> > >  		if (tbl8_group_index < 0) {
> > >  			return tbl8_group_index;
> > > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> > >  		 */
> > >  		lpm->tbl24[tbl24_index].valid = 0;
> > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > >  	} else if (tbl8_recycle_index > -1) {
> > >  		/* Update tbl24 entry. */
> > >  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> > > @@ -1834,7 +1914,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> > >  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> > >  				__ATOMIC_RELAXED);
> > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > >  	}
> > >  #undef group_idx
> > >  	return 0;
> > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > > index 906ec4483..49c12a68d 100644
> > > --- a/lib/librte_lpm/rte_lpm.h
> > > +++ b/lib/librte_lpm/rte_lpm.h
> > > @@ -1,5 +1,6 @@
> > >  /* SPDX-License-Identifier: BSD-3-Clause
> > >   * Copyright(c) 2010-2014 Intel Corporation
> > > + * Copyright(c) 2019 Arm Limited
> > >   */
> > >
> > >  #ifndef _RTE_LPM_H_
> > > @@ -21,6 +22,7 @@
> > >  #include <rte_common.h>
> > >  #include <rte_vect.h>
> > >  #include <rte_compat.h>
> > > +#include <rte_rcu_qsbr.h>
> > >
> > >  #ifdef __cplusplus
> > >  extern "C" {
> > > @@ -186,6 +188,7 @@ struct rte_lpm {
> > >  			__rte_cache_aligned; /**< LPM tbl24 table. */
> > >  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> > >  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> > >  };
> > >
> > >  /**
> > > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> > >  void
> > >  rte_lpm_free_v1604(struct rte_lpm *lpm);
> > >
> > > +/**
> > > + * Associate RCU QSBR variable with an LPM object.
> > > + *
> > > + * @param lpm
> > > + *   the lpm object to add RCU QSBR
> > > + * @param v
> > > + *   RCU QSBR variable
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with error code set in rte_errno.
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - invalid pointer
> > > + *   - EEXIST - already added QSBR
> > > + *   - ENOMEM - memory allocation failure
> > > + */
> > > +__rte_experimental
> > > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v);
> > > +
> > >  /**
> > >   * Add a rule to the LPM table.
> > >   *
> > > diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> > > index 90beac853..b353aabd2 100644
> > > --- a/lib/librte_lpm/rte_lpm_version.map
> > > +++ b/lib/librte_lpm/rte_lpm_version.map
> > > @@ -44,3 +44,9 @@ DPDK_17.05 {
> > >  	rte_lpm6_lookup_bulk_func;
> > >
> > >  } DPDK_16.04;
> > > +
> > > +EXPERIMENTAL {
> > > +	global:
> > > +
> > > +	rte_lpm_rcu_qsbr_add;
> > > +};
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-13  3:02         ` Honnappa Nagarahalli
@ 2019-10-15 16:48           ` Medvedkin, Vladimir
  2019-10-18  3:47             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 137+ messages in thread
From: Medvedkin, Vladimir @ 2019-10-15 16:48 UTC (permalink / raw)
  To: Honnappa Nagarahalli, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, nd

Hi Honnappa,

On 13/10/2019 04:02, Honnappa Nagarahalli wrote:
> Hi Vladimir,
> 	Apologies for the delayed response, I had to run a few experiments.
>
> <snip>
>
>> Hi Honnappa,
>>
>> On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
>>> Add resource reclamation APIs to make it simple for applications and
>>> libraries to integrate rte_rcu library.
>>>
>>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> ---
>>>    app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
>>>    lib/librte_rcu/meson.build         |   2 +
>>>    lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
>>>    lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
>>>    lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
>>>    lib/librte_rcu/rte_rcu_version.map |   4 +
>>>    lib/meson.build                    |   6 +-
>>>    7 files changed, 700 insertions(+), 3 deletions(-)
>>>    create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>>
>>> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
>>> index d1b9e46a2..3a6815243 100644
>>> --- a/app/test/test_rcu_qsbr.c
>>> +++ b/app/test/test_rcu_qsbr.c
>>> @@ -1,8 +1,9 @@
>>>    /* SPDX-License-Identifier: BSD-3-Clause
>>> - * Copyright (c) 2018 Arm Limited
>>> + * Copyright (c) 2019 Arm Limited
>>>     */
>>>
>>>    #include <stdio.h>
>>> +#include <string.h>
>>>    #include <rte_pause.h>
>>>    #include <rte_rcu_qsbr.h>
>>>    #include <rte_hash.h>
>>> @@ -33,6 +34,7 @@ static uint32_t *keys;
>>>    #define COUNTER_VALUE 4096
>>>    static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
>>>    static uint8_t writer_done;
>>> +static uint8_t cb_failed;
>>>
>>>    static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>>>    struct rte_hash *h[RTE_MAX_LCORE];
>>> @@ -582,6 +584,269 @@ test_rcu_qsbr_thread_offline(void)
>>>    	return 0;
>>>    }
>>>
>>> +static void
>>> +rte_rcu_qsbr_test_free_resource(void *p, void *e)
>>> +{
>>> +	if (p != NULL && e != NULL) {
>>> +		printf("%s: Test failed\n", __func__);
>>> +		cb_failed = 1;
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_create: create a queue used to store the data structure
>>> + * elements that can be freed later. This queue is referred to as 'defer queue'.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_create(void)
>>> +{
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	dq = rte_rcu_qsbr_dq_create(NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +	params.v = t[0];
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	params.size = 1;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	params.esize = 3;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
>>> +params");
>>> +
>>> +	/* Pass all valid parameters */
>>> +	params.esize = 16;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>> params");
>>> +	rte_rcu_qsbr_dq_delete(dq);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
>>> + * to be freed later after at least one grace period is over.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_enqueue(void)
>>> +{
>>> +	int ret;
>>> +	uint64_t r;
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
>>> +
>>> +	/* Create a queue with simple parameters */
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +	params.v = t[0];
>>> +	params.size = 1;
>>> +	params.esize = 16;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>>> +params");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
>>> +params");
>>> +
>>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
>>> +params");
>>> +
>>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
>>> +params");
>>> +
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid
>> params");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_reclaim(void)
>>> +{
>>> +	int ret;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid
>>> +params");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_delete(void)
>>> +{
>>> +	int ret;
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
>>> +
>>> +	/* Pass invalid parameters */
>>> +	ret = rte_rcu_qsbr_dq_delete(NULL);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid
>>> +params");
>>> +
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +	params.v = t[0];
>>> +	params.size = 1;
>>> +	params.esize = 16;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>> params");
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
>> params");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/*
>>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer queue,
>>> + * to be freed later after at least one grace period is over.
>>> + */
>>> +static int
>>> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize)
>>> +{
>>> +	int i, j, ret;
>>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
>>> +	struct rte_rcu_qsbr_dq_parameters params;
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +	uint64_t *e;
>>> +	uint64_t sc = 200;
>>> +	int max_entries;
>>> +
>>> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
>>> +	printf("Size = %d, esize = %d\n", size, esize);
>>> +
>>> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
>>> +	if (e == NULL)
>>> +		return 0;
>>> +	cb_failed = 0;
>>> +
>>> +	/* Initialize the RCU variable. No threads are registered */
>>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
>>> +
>>> +	/* Create a queue with simple parameters */
>>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
>>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
>>> +	params.name = rcu_dq_name;
>>> +	params.f = rte_rcu_qsbr_test_free_resource;
>>> +	params.v = t[0];
>>> +	params.size = size;
>>> +	params.esize = esize;
>>> +	dq = rte_rcu_qsbr_dq_create(&params);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
>>> +params");
>>> +
>>> +	/* Given the size and esize, calculate the maximum number of entries
>>> +	 * that can be stored on the defer queue (look at the logic used
>>> +	 * in capacity calculation of rte_ring).
>>> +	 */
>>> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
>>> +	max_entries = (max_entries - 1)/(esize/8 + 1);
>>> +
>>> +	/* Enqueue a few counters starting with the value 'sc' */
>>> +	/* The queue size will be rounded up to 2. The enqueue API also
>>> +	 * reclaims if the queue size is above a certain limit. Since there
>>> +	 * are no threads registered, reclamation succeeds. Hence, it should
>>> +	 * be possible to enqueue more than the provided queue size.
>>> +	 */
>>> +	for (i = 0; i < 10; i++) {
>>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
>>> +			"dq enqueue functional");
>>> +		for (j = 0; j < esize/8; j++)
>>> +			e[j] = sc++;
>>> +	}
>>> +
>>> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
>>> +	 * succeed. It should not be possible to enqueue more than the size
>>> +	 * number of resources.
>>> +	 */
>>> +	rte_rcu_qsbr_thread_register(t[0], 1);
>>> +	rte_rcu_qsbr_thread_online(t[0], 1);
>>> +
>>> +	for (i = 0; i < max_entries; i++) {
>>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
>>> +			"dq enqueue functional");
>>> +		for (j = 0; j < esize/8; j++)
>>> +			e[j] = sc++;
>>> +	}
>>> +
>>> +	/* Enqueue fails as queue is full */
>>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
>> functional");
>>> +
>>> +	/* Delete should fail as there are elements in defer queue which
>>> +	 * cannot be reclaimed.
>>> +	 */
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid
>> params");
>>> +
>>> +	/* Report quiescent state, enqueue should succeed */
>>> +	rte_rcu_qsbr_quiescent(t[0], 1);
>>> +	for (i = 0; i < max_entries; i++) {
>>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
>>> +			"dq enqueue functional");
>>> +		for (j = 0; j < esize/8; j++)
>>> +			e[j] = sc++;
>>> +	}
>>> +
>>> +	/* Queue is full */
>>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
>> functional");
>>> +
>>> +	/* Report quiescent state, delete should succeed */
>>> +	rte_rcu_qsbr_quiescent(t[0], 1);
>>> +	ret = rte_rcu_qsbr_dq_delete(dq);
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
>> params");
>>> +
>>> +	/* Validate that call back function did not return any error */
>>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
>>> +
>>> +	rte_free(e);
>>> +	return 0;
>>> +}
>>> +
>>>    /*
>>>     * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
>>>     */
>>> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
>>>    	if (test_rcu_qsbr_thread_offline() < 0)
>>>    		goto test_fail;
>>>
>>> +	if (test_rcu_qsbr_dq_create() < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_reclaim() < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_delete() < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_enqueue() < 0)
>>> +		goto test_fail;
>>> +
>>>    	printf("\nFunctional tests\n");
>>>
>>>    	if (test_rcu_qsbr_sw_sv_3qs() < 0)
>>> @@ -1033,6 +1310,18 @@ test_rcu_qsbr_main(void)
>>>    	if (test_rcu_qsbr_mw_mv_mqs() < 0)
>>>    		goto test_fail;
>>>
>>> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
>>> +		goto test_fail;
>>> +
>>> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
>>> +		goto test_fail;
>>> +
>>>    	free_rcu();
>>>
>>>    	printf("\n");
>>> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
>>> index 62920ba02..e280b29c1 100644
>>> --- a/lib/librte_rcu/meson.build
>>> +++ b/lib/librte_rcu/meson.build
>>> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
>>>    if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
>>>    	ext_deps += cc.find_library('atomic')
>>>    endif
>>> +
>>> +deps += ['ring']
>>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c b/lib/librte_rcu/rte_rcu_qsbr.c
>>> index ce7f93dd3..76814f50b 100644
>>> --- a/lib/librte_rcu/rte_rcu_qsbr.c
>>> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
>>> @@ -21,6 +21,7 @@
>>>    #include <rte_errno.h>
>>>
>>>    #include "rte_rcu_qsbr.h"
>>> +#include "rte_rcu_qsbr_pvt.h"
>>>
>>>    /* Get the memory size of QSBR variable */
>>>    size_t
>>> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr
>> *v)
>>>    	return 0;
>>>    }
>>>
>>> +/* Create a queue used to store the data structure elements that can
>>> + * be freed later. This queue is referred to as 'defer queue'.
>>> + */
>>> +struct rte_rcu_qsbr_dq *
>>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters *params)
>>> +{
>>> +	struct rte_rcu_qsbr_dq *dq;
>>> +	uint32_t qs_fifo_size;
>>> +
>>> +	if (params == NULL || params->f == NULL ||
>>> +		params->v == NULL || params->name == NULL ||
>>> +		params->size == 0 || params->esize == 0 ||
>>> +		(params->esize % 8 != 0)) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return NULL;
>>> +	}
>>> +
>>> +	dq = rte_zmalloc(NULL,
>>> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
>>> +		RTE_CACHE_LINE_SIZE);
>>> +	if (dq == NULL) {
>>> +		rte_errno = ENOMEM;
>>> +
>>> +		return NULL;
>>> +	}
>>> +
>>> +	/* round up qs_fifo_size to next power of two that is not less than
>>> +	 * max_size.
>>> +	 */
>>> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
>>> +					* params->size) + 1);
>>> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
>>> +					SOCKET_ID_ANY, 0);
>>> +	if (dq->r == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): defer queue create failed\n", __func__);
>>> +		rte_free(dq);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	dq->v = params->v;
>>> +	dq->size = params->size;
>>> +	dq->esize = params->esize;
>>> +	dq->f = params->f;
>>> +	dq->p = params->p;
>>> +
>>> +	return dq;
>>> +}
>>> +
>>> +/* Enqueue one resource to the defer queue to free after the grace
>>> + * period is over.
>>> + */
>>> +int
>>> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e)
>>> +{
>>> +	uint64_t token;
>>> +	uint64_t *tmp;
>>> +	uint32_t i;
>>> +	uint32_t cur_size, free_size;
>>> +
>>> +	if (dq == NULL || e == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Start the grace period */
>>> +	token = rte_rcu_qsbr_start(dq->v);
>>> +
>>> +	/* Reclaim resources if the queue is 1/8th full. This keeps
>>> +	 * the queue from growing too large and allows time for reader
>>> +	 * threads to report their quiescent state.
>>> +	 */
>>> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
>>> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
>>> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
>>> +			"%s(): Triggering reclamation\n", __func__);
>>> +		rte_rcu_qsbr_dq_reclaim(dq);
>>> +	}
>> There are two problems I see:
>>
>> 1. rte_rcu_qsbr_dq_reclaim() reclaims only 1/16 of the defer queue while it
>> triggers at 1/8. This means that there will always be 1/16 of non-reclaimed
>> entries in the queue.
> There will be 'at least' 1/16 non-reclaimed entries.
Correct, that's what I meant :)
>   It could be more depending on the length of the grace period and the rate of deletion.

Right, the number of entries to reclaim depends on:

- grace period which is application specific

- cost of delete operation which is library (algorithm) specific

- rate of deletion which depends on runtime.

So it is very hard to predict how big the threshold to trigger
reclamation should be and how many entries it should reclaim.

> The trigger of 1/8 is used to give sufficient time for the readers to report their quiescent state. 1/16 is used to spread the load of reclamation across multiple calls and provide an upper bound on the cycles consumed.

1/16 of max entries to reclaim within a single call can cost a lot.
Moreover, it could have an impact on the readers through massive cache
evictions.

Consider a set of routes from test_lpm_perf.c. To install all routes you 
need to have at least 65k tbl8 entries (now it has 2k). So when 
reclaiming, besides the costs of rte_rcu_qsbr_check(), you'll need to 
rewrite 4k cache lines.

So 1/16 of max entries is relatively big and it's better to spread this 
load across multiple calls.
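
To put rough numbers on it (using the figures above, and assuming one
64 B cache line is touched per reclaimed group): with 65536 tbl8 groups,
a 1/16 batch is 65536/16 = 4096 groups reclaimed in one call, i.e.
roughly 4096 cache lines (~256 KB) written by a single rte_lpm_delete(),
on top of the rte_rcu_qsbr_check() cost.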

>
>> 2. The number of entries to reclaim depends on dq->size. So,
>> rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For the LPM library this
> That is true. It depends on dq->size (the number of tbl8 groups). However, note that there is a patch [1] which provides batch-reclamation-like behavior, which reduces the cycles consumed by reclamation significantly.
>
> [1] https://patches.dpdk.org/patch/58960/
>
>> means that rte_lpm_delete() sometimes takes a long time.
> Agree, it sometimes takes additional time. It is good to spread it over multiple calls.
Right, with batch reclamation we have the classic throughput vs latency
problem here. Either reclaim a big number of entries relatively
infrequently, spreading the cost of the readers' quiescent state check,
or reclaim a small amount of entries more often, spending more cycles on
average. I'd prefer latency here because, as I mentioned earlier, huge
batches could have an impact on readers and lead to a big difference in
the cost of delete().
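
For reference, the per-call bound in the current patch is just the loop
guard in rte_rcu_qsbr_dq_reclaim() (a simplified sketch of the quoted
code; making 'max_cnt' a creation-time parameter instead of the fixed
1/16th is exactly the change being discussed):

	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT; /* 1/16th */
	cnt = 0;
	while (cnt < max_cnt && rte_ring_peek(dq->r, &token) == 0 &&
		rte_rcu_qsbr_check(dq->v,
			(uint64_t)(uintptr_t)token, false) == 1) {
		/* dequeue the element, then dq->f(dq->p, dq->e) frees it */
		cnt++;
	}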
>
>> So, my suggestions here would be
>>
>> - trigger rte_rcu_qsbr_dq_reclaim() with every enqueue
> Given that the LPM APIs are mainly for the control plane, I would think that by the next time an LPM API is called, the readers would have completed the grace period. But if there are frequent updates, we might end up with empty reclaims which will waste cycles. IMO, this trigger should happen only after at least a few entries are in the queue.
>
>> - reclaim small amount of entries (could be configurable at creation time)
> Agree. I would keep it smaller than the trigger amount, knowing that the elements added right before the trigger might not have completed the grace period.
>
>> - provide API to trigger reclaim from the application manually.
> IMO, this will add additional complexity to the application. I agree that there will be special needs for some applications. I think those applications might have to implement their own methods using the base RCU APIs.
> Instead, as agreed in other threads, I suggest we expose the parameters (when to trigger and how much to reclaim) to the application as optional configurable parameters, i.e. if the application does not provide them, we can use default values. I think this should provide enough flexibility to the application.

Agree.

Regarding default values, one strategy could be (a sketch follows the list):

- if the reclaim threshold isn't set (i.e. equals 0) then call reclaim
with every enqueue (i.e. threshold == 1)

- if max_entries_to_reclaim isn't set then reclaim as much as we can
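
A minimal sketch of that defaulting (the field names 'reclaim_thresh'
and 'max_reclaim_size' are assumptions for illustration, they are not
from the posted patch):

	/* inside a hypothetical rte_rcu_qsbr_dq_create() */
	dq->reclaim_thresh = params->reclaim_thresh != 0 ?
		params->reclaim_thresh : 1; /* reclaim on every enqueue */
	dq->max_reclaim_size = params->max_reclaim_size != 0 ?
		params->max_reclaim_size : dq->size; /* as much as we can */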


>>> +
>>> +	/* Check if there is space for at least 1 resource */
>>> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
>>> +	if (!free_size) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Defer queue is full\n", __func__);
>>> +		rte_errno = ENOSPC;
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Enqueue the resource */
>>> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
>>> +
>>> +	/* The resource to enqueue needs to be a multiple of 64b
>>> +	 * due to the limitation of the rte_ring implementation.
>>> +	 */
>>> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
>>> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/* Reclaim resources from the defer queue. */
>>> +int
>>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq)
>>> +{
>>> +	uint32_t max_cnt;
>>> +	uint32_t cnt;
>>> +	void *token;
>>> +	uint64_t *tmp;
>>> +	uint32_t i;
>>> +
>>> +	if (dq == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Anything to reclaim? */
>>> +	if (rte_ring_count(dq->r) == 0)
>>> +		return 0;
>>> +
>>> +	/* Reclaim at most 1/16th of the total number of entries. */
>>> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
>>> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
>>> +	cnt = 0;
>>> +
>>> +	/* Check reader threads quiescent state and reclaim resources */
>>> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
>>> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
>>> +			== 1)) {
>>> +		(void)rte_ring_sc_dequeue(dq->r, &token);
>>> +		/* The resource to dequeue needs to be a multiple of 64b
>>> +		 * due to the limitation of the rte_ring implementation.
>>> +		 */
>>> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
>>> +			i++, tmp++)
>>> +			(void)rte_ring_sc_dequeue(dq->r,
>>> +					(void *)(uintptr_t)tmp);
>>> +		dq->f(dq->p, dq->e);
>>> +
>>> +		cnt++;
>>> +	}
>>> +
>>> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
>>> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
>>> +
>>> +	if (cnt == 0) {
>>> +		/* No resources were reclaimed */
>>> +		rte_errno = EAGAIN;
>>> +		return 1;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/* Delete a defer queue. */
>>> +int
>>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq)
>>> +{
>>> +	if (dq == NULL) {
>>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
>>> +			"%s(): Invalid input parameter\n", __func__);
>>> +		rte_errno = EINVAL;
>>> +
>>> +		return 1;
>>> +	}
>>> +
>>> +	/* Reclaim all the resources */
>>> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
>>> +		/* Error number is already set by the reclaim API */
>>> +		return 1;
>>> +
>>> +	rte_ring_free(dq->r);
>>> +	rte_free(dq);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>    int rte_rcu_log_type;
>>>
>>>    RTE_INIT(rte_rcu_register)
>>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
>>> index c80f15c00..185d4b50a 100644
>>> --- a/lib/librte_rcu/rte_rcu_qsbr.h
>>> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
>>> @@ -34,6 +34,7 @@ extern "C" {
>>>    #include <rte_lcore.h>
>>>    #include <rte_debug.h>
>>>    #include <rte_atomic.h>
>>> +#include <rte_ring.h>
>>>
>>>    extern int rte_rcu_log_type;
>>>
>>> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
>>>    	 */
>>>    } __rte_cache_aligned;
>>>
>>> +/**
>>> + * Call back function called to free the resources.
>>> + *
>>> + * @param p
>>> + *   Pointer provided while creating the defer queue
>>> + * @param e
>>> + *   Pointer to the resource data stored on the defer queue
>>> + *
>>> + * @return
>>> + *   None
>>> + */
>>> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
>>> +
>>> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
>>> +
>>> +/**
>>> + *  Trigger automatic reclamation when the defer queue is 1/8th full.
>>> + */
>>> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
>>> +
>>> +/**
>>> + *  Reclaim at most 1/16th of the total number of resources.
>>> + */
>>> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
>>> +
>>> +/**
>>> + * Parameters used when creating the defer queue.
>>> + */
>>> +struct rte_rcu_qsbr_dq_parameters {
>>> +	const char *name;
>>> +	/**< Name of the queue. */
>>> +	uint32_t size;
>>> +	/**< Number of entries in queue. Typically, this will be
>>> +	 *   the same as the maximum number of entries supported in the
>>> +	 *   lock free data structure.
>>> +	 *   Data structures with an unbounded number of entries are not
>>> +	 *   supported currently.
>>> +	 */
>>> +	uint32_t esize;
>>> +	/**< Size (in bytes) of each element in the defer queue.
>>> +	 *   This has to be multiple of 8B as the rte_ring APIs
>>> +	 *   support 8B element sizes only.
>>> +	 */
>>> +	rte_rcu_qsbr_free_resource f;
>>> +	/**< Function to call to free the resource. */
>>> +	void *p;
>>> +	/**< Pointer passed to the free function. Typically, this is the
>>> +	 *   pointer to the data structure to which the resource to free
>>> +	 *   belongs. This can be NULL.
>>> +	 */
>>> +	struct rte_rcu_qsbr *v;
>>> +	/**< RCU QSBR variable to use for this defer queue */
>>> +};
>>> +
>>> +/* RTE defer queue structure.
>>> + * This structure holds the defer queue. The defer queue is used to
>>> + * hold the deleted entries from the data structure that are not
>>> + * yet freed.
>>> + */
>>> +struct rte_rcu_qsbr_dq;
>>> +
>>>    /**
>>>     * @warning
>>>     * @b EXPERIMENTAL: this API may change without prior notice @@
>>> -648,6 +710,113 @@ __rte_experimental
>>>    int
>>>    rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Create a queue used to store the data structure elements that can
>>> + * be freed later. This queue is referred to as 'defer queue'.
>>> + *
>>> + * @param params
>>> + *   Parameters to create a defer queue.
>>> + * @return
>>> + *   On success - Valid pointer to defer queue
>>> + *   On error - NULL
>>> + *   Possible rte_errno codes are:
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - ENOMEM - Not enough memory
>>> + */
>>> +__rte_experimental
>>> +struct rte_rcu_qsbr_dq *
>>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
>>> +*params);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Enqueue one resource to the defer queue and start the grace period.
>>> + * The resource will be freed later after at least one grace period
>>> + * is over.
>>> + *
>>> + * If the defer queue is full, it will attempt to reclaim resources.
>>> + * It will also reclaim resources at regular intervals to keep
>>> + * the defer queue from growing too big.
>>> + *
>>> + * This API is not multi-thread safe. It is expected that the caller
>>> + * provides multi-thread safety by locking a mutex or some other means.
>>> + *
>>> + * A lock free multi-thread writer algorithm could achieve multi-thread
>>> + * safety by creating and using one defer queue per thread.
>>> + *
>>> + * @param dq
>>> + *   Defer queue to allocate an entry from.
>>> + * @param e
>>> + *   Pointer to resource data to copy to the defer queue. The size of
>>> + *   the data to copy is equal to the element size provided when the
>>> + *   defer queue was created.
>>> + * @return
>>> + *   On success - 0
>>> + *   On error - 1 with rte_errno set to
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - ENOSPC - Defer queue is full. This condition cannot happen
>>> + *		if the defer queue size is equal to (or larger than) the
>>> + *		number of elements in the data structure.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Reclaim resources from the defer queue.
>>> + *
>>> + * This API is not multi-thread safe. It is expected that the caller
>>> + * provides multi-thread safety by locking a mutex or some other means.
>>> + *
>>> + * A lock free multi-thread writer algorithm could achieve multi-thread
>>> + * safety by creating and using one defer queue per thread.
>>> + *
>>> + * @param dq
>>> + *   Defer queue to reclaim an entry from.
>>> + * @return
>>> + *   On successful reclamation of at least 1 resource - 0
>>> + *   On error - 1 with rte_errno set to
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - EAGAIN - None of the resources have completed at least 1 grace
>>> + *		period, try again.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>> + *
>>> + * Delete a defer queue.
>>> + *
>>> + * It tries to reclaim all the resources on the defer queue.
>>> + * If any of the resources have not completed the grace period
>>> + * the reclamation stops and returns immediately. The rest of
>>> + * the resources are not reclaimed and the defer queue is not
>>> + * freed.
>>> + *
>>> + * @param dq
>>> + *   Defer queue to delete.
>>> + * @return
>>> + *   On success - 0
>>> + *   On error - 1
>>> + *   Possible rte_errno codes are:
>>> + *   - EINVAL - NULL parameters are passed
>>> + *   - EAGAIN - Some of the resources have not completed at least 1 grace
>>> + *		period, try again.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
>>> +
>>>    #ifdef __cplusplus
>>>    }
>>>    #endif
>>> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>> b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>> new file mode 100644
>>> index 000000000..2122bc36a
>>> --- /dev/null
>>> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
>>> @@ -0,0 +1,46 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright (c) 2019 Arm Limited
>>> + */
>>> +
>>> +#ifndef _RTE_RCU_QSBR_PVT_H_
>>> +#define _RTE_RCU_QSBR_PVT_H_
>>> +
>>> +/**
>>> + * This file is private to the RCU library. It should not be included
>>> + * by the user of this library.
>>> + */
>>> +
>>> +#ifdef __cplusplus
>>> +extern "C" {
>>> +#endif
>>> +
>>> +#include "rte_rcu_qsbr.h"
>>> +
>>> +/* RTE defer queue structure.
>>> + * This structure holds the defer queue. The defer queue is used to
>>> + * hold the deleted entries from the data structure that are not
>>> + * yet freed.
>>> + */
>>> +struct rte_rcu_qsbr_dq {
>>> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
>>> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
>>> +	uint32_t size;
>>> +	/**< Number of elements in the defer queue */
>>> +	uint32_t esize;
>>> +	/**< Size (in bytes) of data stored on the defer queue */
>>> +	rte_rcu_qsbr_free_resource f;
>>> +	/**< Function to call to free the resource. */
>>> +	void *p;
>>> +	/**< Pointer passed to the free function. Typically, this is the
>>> +	 *   pointer to the data structure to which the resource to free
>>> +	 *   belongs.
>>> +	 */
>>> +	char e[0];
>>> +	/**< Temporary storage to copy the defer queue element. */
>>> +};
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>> +
>>> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
>>> diff --git a/lib/librte_rcu/rte_rcu_version.map b/lib/librte_rcu/rte_rcu_version.map
>>> index f8b9ef2ab..dfac88a37 100644
>>> --- a/lib/librte_rcu/rte_rcu_version.map
>>> +++ b/lib/librte_rcu/rte_rcu_version.map
>>> @@ -8,6 +8,10 @@ EXPERIMENTAL {
>>>    	rte_rcu_qsbr_synchronize;
>>>    	rte_rcu_qsbr_thread_register;
>>>    	rte_rcu_qsbr_thread_unregister;
>>> +	rte_rcu_qsbr_dq_create;
>>> +	rte_rcu_qsbr_dq_enqueue;
>>> +	rte_rcu_qsbr_dq_reclaim;
>>> +	rte_rcu_qsbr_dq_delete;
>>>
>>>    	local: *;
>>>    };
>>> diff --git a/lib/meson.build b/lib/meson.build
>>> index e5ff83893..0e1be8407 100644
>>> --- a/lib/meson.build
>>> +++ b/lib/meson.build
>>> @@ -11,7 +11,9 @@
>>>    libraries = [
>>>    	'kvargs', # eal depends on kvargs
>>>    	'eal', # everything depends on eal
>>> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>>> +	'ring',
>>> +	'rcu', # rcu depends on ring
>>> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
>>>    	'cmdline',
>>>    	'metrics', # bitrate/latency stats depends on this
>>>    	'hash',    # efd depends on this
>>> @@ -22,7 +24,7 @@ libraries = [
>>>    	'gro', 'gso', 'ip_frag', 'jobstats',
>>>    	'kni', 'latencystats', 'lpm', 'member',
>>>    	'power', 'pdump', 'rawdev',
>>> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
>>> +	'reorder', 'sched', 'security', 'stack', 'vhost',
>>>    	# ipsec lib depends on net, crypto and security
>>>    	'ipsec',
>>>    	# add pkt framework libs which use other libs from above
>> --
>> Regards,
>> Vladimir

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: integrate RCU QSBR
  2019-10-15 11:15           ` Ananyev, Konstantin
@ 2019-10-18  3:32             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18  3:32 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Medvedkin, Vladimir,
	olivier.matz
  Cc: dev, stephen, paulmck, Gavin Hu (Arm Technology China),
	Dharmik Thakkar, Ruifeng Wang (Arm Technology China),
	Honnappa Nagarahalli, nd, nd

<snip>

> >
> > > Hi guys,
> > I have tried to consolidate design-related questions here. If I have
> > missed anything, please add.
> >
> > >
> > > >
> > > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > >
> > > > Currently, the tbl8 group is freed even though the readers might
> > > > be using the tbl8 group entries. The freed tbl8 group can be
> > > > reallocated quickly. This results in incorrect lookup results.
> > > >
> > > > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > > > Refer to RCU documentation to understand various aspects of
> > > > integrating RCU library into other libraries.
> > > >
> > > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > ---
> > > >  lib/librte_lpm/Makefile            |   3 +-
> > > >  lib/librte_lpm/meson.build         |   2 +
> > > >  lib/librte_lpm/rte_lpm.c           | 102 +++++++++++++++++++++++++----
> > > >  lib/librte_lpm/rte_lpm.h           |  21 ++++++
> > > >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> > > >  5 files changed, 122 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> > > > index
> > > > a7946a1c5..ca9e16312 100644
> > > > --- a/lib/librte_lpm/Makefile
> > > > +++ b/lib/librte_lpm/Makefile
> > > > @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk  # library name
> > > > LIB = librte_lpm.a
> > > >
> > > > +CFLAGS += -DALLOW_EXPERIMENTAL_API
> > > >  CFLAGS += -O3
> > > >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -LDLIBS += -lrte_eal
> > > > -lrte_hash
> > > > +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
> > > >
> > > >  EXPORT_MAP := rte_lpm_version.map
> > > >
> > > > diff --git a/lib/librte_lpm/meson.build
> > > > b/lib/librte_lpm/meson.build index a5176d8ae..19a35107f 100644
> > > > --- a/lib/librte_lpm/meson.build
> > > > +++ b/lib/librte_lpm/meson.build
> > > > @@ -2,9 +2,11 @@
> > > >  # Copyright(c) 2017 Intel Corporation
> > > >
> > > >  version = 2
> > > > +allow_experimental_apis = true
> > > >  sources = files('rte_lpm.c', 'rte_lpm6.c')  headers =
> > > > files('rte_lpm.h', 'rte_lpm6.h')  # since header files have
> > > > different names, we can install all vector headers  # without
> > > > worrying about which architecture we actually need  headers +=
> > > > files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
> > > > deps += ['hash']
> > > > +deps += ['rcu']
> > > > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > > > index
> > > > 3a929a1b1..ca58d4b35 100644
> > > > --- a/lib/librte_lpm/rte_lpm.c
> > > > +++ b/lib/librte_lpm/rte_lpm.c
> > > > @@ -1,5 +1,6 @@
> > > >  /* SPDX-License-Identifier: BSD-3-Clause
> > > >   * Copyright(c) 2010-2014 Intel Corporation
> > > > + * Copyright(c) 2019 Arm Limited
> > > >   */
> > > >
> > > >  #include <string.h>
> > > > @@ -381,6 +382,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
> > > >
> > > >  	rte_mcfg_tailq_write_unlock();
> > > >
> > > > +	if (lpm->dq)
> > > > +		rte_rcu_qsbr_dq_delete(lpm->dq);
> > > >  	rte_free(lpm->tbl8);
> > > >  	rte_free(lpm->rules_tbl);
> > > >  	rte_free(lpm);
> > > > @@ -390,6 +393,59 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604,
> > > 16.04);
> > > > MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
> > > >  		rte_lpm_free_v1604);
> > > >
> > > > +struct __rte_lpm_rcu_dq_entry {
> > > > +	uint32_t tbl8_group_index;
> > > > +	uint32_t pad;
> > > > +};
> > > > +
> > > > +static void
> > > > +__lpm_rcu_qsbr_free_resource(void *p, void *data) {
> > > > +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > > +	struct __rte_lpm_rcu_dq_entry *e =
> > > > +			(struct __rte_lpm_rcu_dq_entry *)data;
> > > > +	struct rte_lpm_tbl_entry *tbl8 = (struct rte_lpm_tbl_entry *)p;
> > > > +
> > > > +	/* Set tbl8 group invalid */
> > > > +	__atomic_store(&tbl8[e->tbl8_group_index], &zero_tbl8_entry,
> > > > +		__ATOMIC_RELAXED);
> > > > +}
> > > > +
> > > > +/* Associate QSBR variable with an LPM object.
> > > > + */
> > > > +int
> > > > +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v) {
> > > > +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> > > > +	struct rte_rcu_qsbr_dq_parameters params;
> > > > +
> > > > +	if ((lpm == NULL) || (v == NULL)) {
> > > > +		rte_errno = EINVAL;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	if (lpm->dq) {
> > > > +		rte_errno = EEXIST;
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	/* Init QSBR defer queue. */
> > > > +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "LPM_RCU_%s", lpm-
> > > >name);
> > > > +	params.name = rcu_dq_name;
> > > > +	params.size = lpm->number_tbl8s;
> > > > +	params.esize = sizeof(struct __rte_lpm_rcu_dq_entry);
> > > > +	params.f = __lpm_rcu_qsbr_free_resource;
> > > > +	params.p = lpm->tbl8;
> > > > +	params.v = v;
> > > > +	lpm->dq = rte_rcu_qsbr_dq_create(&params);
> > > > +	if (lpm->dq == NULL) {
> > > > +		RTE_LOG(ERR, LPM, "LPM QS defer queue creation failed\n");
> > > > +		return 1;
> > > > +	}
> > >
> > > Few thoughts about that function:
> > Few things to keep in mind: the goal of the design is to make it easy
> > for applications to adopt lock-free algorithms. The reclamation
> > process in the writer is a major portion of the code one has to write
> > for using lock-free algorithms. The current design is such that the
> > writer does not have to change or write any additional code other than
> > calling 'rte_lpm_rcu_qsbr_add'.
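> >
> > For reference, a minimal sketch of the writer-side flow using the API
> > from this patch (QSBR variable setup and reader registration elided;
> > handle_error() is a placeholder):
> >
> > 	/* One-time registration of the RCU variable with the LPM object. */
> > 	if (rte_lpm_rcu_qsbr_add(lpm, v) != 0)
> > 		handle_error(); /* rte_errno: EINVAL, EEXIST or ENOMEM */
> >
> > 	/* Subsequent rte_lpm_delete() calls now push retired tbl8 groups
> > 	 * onto the defer queue instead of reusing them immediately.
> > 	 */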
> >
> > > It is named rcu_qsbr_add() but in fact it allocates a defer queue for a given rcu var.
> > > So first thought - is it always necessary?
> > This is part of the design. If the application does not want to use
> > this integrated logic, it does not have to call this API. It can use
> > the RCU defer APIs to implement its own logic. But if I ask the
> > question, does this integrated logic address most of the use cases of
> > the LPM library, I think the answer is yes.
> >
> > > For some use-cases I suppose the user might be OK waiting for the
> > > quiescent state change inside tbl8_free()?
> > Yes, that is a possibility (for example, no frequent route changes). But
> > I think that is trivial for the application to implement. Though, the
> > LPM library has to separate the 'delete' and 'free' operations.
> 
> Exactly.
> That's why it is not trivial with the current LPM library.
> In fact, to do that himself right now, the user would have to implement
> and support his own version of the LPM code.
😊, well, we definitely don't want them to write their own library (if the DPDK LPM is enough).
IMO, we need to be consistent with other libraries in terms of APIs, but that's another topic.
I do not see any problem with implementing this now, or with shaping the APIs now so that it can be implemented in the future. We can add a 'flags' field which will allow for other methods of reclamation, as sketched below.
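
For illustration only, a minimal sketch of such an extension (the 'flags'
field and the flag names are assumptions, not the current API):

	/* Hypothetical reclamation-mode flags for the defer queue. */
	#define RTE_RCU_QSBR_DQ_RECLAIM_AUTO   0x0 /* from enqueue (default) */
	#define RTE_RCU_QSBR_DQ_RECLAIM_MANUAL 0x1 /* application triggers it */

	struct rte_rcu_qsbr_dq_parameters {
		const char *name;	/**< Name of the queue. */
		uint32_t flags;		/**< Reclamation method (hypothetical). */
		uint32_t size;		/**< Number of entries in the queue. */
		uint32_t esize;		/**< Size (in bytes) of each element. */
		rte_rcu_qsbr_free_resource f;	/**< Free function to call. */
		void *p;	/**< Pointer passed to the free function. */
		struct rte_rcu_qsbr *v;	/**< RCU QSBR variable for this queue. */
	};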

> 
> Honestly, I don't understand why you consider it a drawback.
> From my perspective only a few things need to be changed:
> 
> 1. Add 2 parameters to rte_lpm_rcu_qsbr_add():
>     number of elems in the defer_queue
>     reclaim() threshold value.
> If the user doesn't want to provide any values, that's fine; we can use
> default ones here (as you do right now).
I think we have agreed on this; I see the value in doing it. A possible shape is sketched below.
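
A rough sketch of what the extended prototype could look like (the new
parameter names are assumptions; 0 would select the built-in defaults):

	/* Hypothetical extension of rte_lpm_rcu_qsbr_add(), not the
	 * version in this patch: the application can size the defer
	 * queue and set the reclaim trigger threshold.
	 */
	__rte_experimental
	int
	rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr *v,
			uint32_t dq_size, uint32_t reclaim_thd);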

> 2. Make rte_lpm_rcu_qsbr_add() return a pointer to the defer_queue.
> Again, if the user doesn't want to call reclaim() himself, he can just
> ignore the return value.
Given the goal of reducing the burden on the user, this goes in the other direction. But if you see a use case for it, I don't have any issues; Vladimir asked for it as well in the other thread. A usage sketch is below.
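
A usage sketch under that assumption (the changed return type is
hypothetical; rte_rcu_qsbr_dq_reclaim() is the API from this series):

	/* Hypothetical variant: registration hands back the defer queue. */
	struct rte_rcu_qsbr_dq *dq = rte_lpm_rcu_qsbr_add(lpm, v);

	/* The application may ignore 'dq', or reclaim at a time of its
	 * choosing, e.g. from a control-plane housekeeping loop.
	 */
	if (dq != NULL)
		rte_rcu_qsbr_dq_reclaim(dq);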

> 
> These 2 changes will provide us with the necessary flexibility that would
> help to cover more use-cases:
> - the user can decide how big the defer queue should be
> - the user can decide when/how he wants to do reclaim()
> 
> Konstantin
> 
> > Similar operations are provided in the rte_hash library. IMO, we should
> > follow a consistent approach.
> >
> > > Another thing: you do allocate the defer queue, but it is internal, so
> > > the user can't call reclaim() manually, which looks strange.
> > > Why not return the defer_queue pointer to the user, so he can call
> > > reclaim() himself at an appropriate time?
> > The intention of the design is to take the complexity away from the
> > user of the LPM library. IMO, the current design will address most use
> > cases of the LPM library. If we expose the 2 parameters (when to
> > trigger reclamation and how much to reclaim) in the
> > 'rte_lpm_rcu_qsbr_add' API, it should provide enough flexibility to
> > the application.
> >
> > > Third thing - you always allocate the defer queue with a size equal
> > > to the number of tbl8 groups.
> > > Though I understand it could be up to 16M tbl8 groups inside the LPM.
> > > Do we really need a defer queue that long?
> > No, we do not need it to be this long. It is this long today to avoid
> > returning a no-space error from the defer queue.
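> > (For scale, from the sizing logic in this patch: the ring stores
> > (esize/8 + 1) 64b slots per element - one token plus the data - so
> > with esize == 8 a defer queue of 'size' elements is created with
> > rte_align32pow2(2 * size + 1) ring slots.)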
> >
> > > Especially considering that the current rcu_defer_queue will start
> > > reclamation when 1/8 of the defer_queue becomes full and wouldn't
> > > reclaim more than 1/16 of it.
> > > Probably better to let the user decide himself how long a defer_queue
> > > he needs for that LPM?
> > It makes sense to expose it to the user if the writer-writer
> > concurrency is lock-free (no memory allocation allowed to expand the
> > defer queue size when the queue is full). However, LPM is not
> > lock-free on the writer side. If we think the writer could be
> > lock-free in the future, it has to be exposed to the user.
> >
> > >
> > > Konstantin
> > Pulling questions/comments from other threads:
> > Can we leave reclamation to some other house-keeping thread (a sort of
> > garbage collector)? Or is such a mode not supported/planned?
> >
> > [Honnappa] If the reclamation cost is small, the current method
> > provides advantages over having a separate thread to do reclamation. I
> > did not plan to provide such an option. But maybe it makes sense to
> > keep the options open (especially from an ABI perspective). Maybe we
> > should add a flags field which will allow us to implement different
> > methods in the future?
> >
> > >
> > >
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Adds a rule to the rule table.
> > > >   *
> > > > @@ -679,14 +735,15 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20
> > > > *tbl8)  }
> > > >
> > > >  static int32_t
> > > > -tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > > > number_tbl8s)
> > > > +__tbl8_alloc_v1604(struct rte_lpm *lpm)
> > > >  {
> > > >  	uint32_t group_idx; /* tbl8 group index. */
> > > >  	struct rte_lpm_tbl_entry *tbl8_entry;
> > > >
> > > >  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> > > > -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> > > > -		tbl8_entry = &tbl8[group_idx *
> > > RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > > > +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> > > > +		tbl8_entry = &lpm->tbl8[group_idx *
> > > > +
> > > 	RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> > > >  		/* If a free tbl8 group is found clean it and set as VALID. */
> > > >  		if (!tbl8_entry->valid_group) {
> > > >  			struct rte_lpm_tbl_entry new_tbl8_entry = { @@ -
> > > 712,6 +769,21 @@
> > > > tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> > > >  	return -ENOSPC;
> > > >  }
> > > >
> > > > +static int32_t
> > > > +tbl8_alloc_v1604(struct rte_lpm *lpm) {
> > > > +	int32_t group_idx; /* tbl8 group index. */
> > > > +
> > > > +	group_idx = __tbl8_alloc_v1604(lpm);
> > > > +	if ((group_idx < 0) && (lpm->dq != NULL)) {
> > > > +		/* If there are no tbl8 groups try to reclaim some. */
> > > > +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq) == 0)
> > > > +			group_idx = __tbl8_alloc_v1604(lpm);
> > > > +	}
> > > > +
> > > > +	return group_idx;
> > > > +}
> > > > +
> > > >  static void
> > > >  tbl8_free_v20(struct rte_lpm_tbl_entry_v20 *tbl8, uint32_t
> > > > tbl8_group_start)  { @@ -728,13 +800,21 @@ tbl8_free_v20(struct
> > > > rte_lpm_tbl_entry_v20 *tbl8, uint32_t tbl8_group_start)  }
> > > >
> > > >  static void
> > > > -tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t
> > > > tbl8_group_start)
> > > > +tbl8_free_v1604(struct rte_lpm *lpm, uint32_t tbl8_group_start)
> > > >  {
> > > > -	/* Set tbl8 group invalid*/
> > > >  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> > > > +	struct __rte_lpm_rcu_dq_entry e;
> > > >
> > > > -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> > > > -			__ATOMIC_RELAXED);
> > > > +	if (lpm->dq != NULL) {
> > > > +		e.tbl8_group_index = tbl8_group_start;
> > > > +		e.pad = 0;
> > > > +		/* Push into QSBR defer queue. */
> > > > +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&e);
> > > > +	} else {
> > > > +		/* Set tbl8 group invalid*/
> > > > +		__atomic_store(&lpm->tbl8[tbl8_group_start],
> > > &zero_tbl8_entry,
> > > > +				__ATOMIC_RELAXED);
> > > > +	}
> > > >  }
> > > >
> > > >  static __rte_noinline int32_t
> > > > @@ -1037,7 +1117,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > > > uint32_t ip_masked, uint8_t depth,
> > > >
> > > >  	if (!lpm->tbl24[tbl24_index].valid) {
> > > >  		/* Search for a free tbl8 group. */
> > > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm-
> > > >number_tbl8s);
> > > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > > >
> > > >  		/* Check tbl8 allocation was successful. */
> > > >  		if (tbl8_group_index < 0) {
> > > > @@ -1083,7 +1163,7 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > > uint32_t ip_masked, uint8_t depth,
> > > >  	} /* If valid entry but not extended calculate the index into Table8. */
> > > >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> > > >  		/* Search for free tbl8 group. */
> > > > -		tbl8_group_index = tbl8_alloc_v1604(lpm->tbl8, lpm-
> > > >number_tbl8s);
> > > > +		tbl8_group_index = tbl8_alloc_v1604(lpm);
> > > >
> > > >  		if (tbl8_group_index < 0) {
> > > >  			return tbl8_group_index;
> > > > @@ -1818,7 +1898,7 @@ delete_depth_big_v1604(struct rte_lpm *lpm,
> > > uint32_t ip_masked,
> > > >  		 */
> > > >  		lpm->tbl24[tbl24_index].valid = 0;
> > > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > > >  	} else if (tbl8_recycle_index > -1) {
> > > >  		/* Update tbl24 entry. */
> > > >  		struct rte_lpm_tbl_entry new_tbl24_entry = { @@ -1834,7
> > > +1914,7 @@
> > > > delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> > > >  		__atomic_store(&lpm->tbl24[tbl24_index],
> > > &new_tbl24_entry,
> > > >  				__ATOMIC_RELAXED);
> > > >  		__atomic_thread_fence(__ATOMIC_RELEASE);
> > > > -		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> > > > +		tbl8_free_v1604(lpm, tbl8_group_start);
> > > >  	}
> > > >  #undef group_idx
> > > >  	return 0;
> > > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > > > index 906ec4483..49c12a68d 100644
> > > > --- a/lib/librte_lpm/rte_lpm.h
> > > > +++ b/lib/librte_lpm/rte_lpm.h
> > > > @@ -1,5 +1,6 @@
> > > >  /* SPDX-License-Identifier: BSD-3-Clause
> > > >   * Copyright(c) 2010-2014 Intel Corporation
> > > > + * Copyright(c) 2019 Arm Limited
> > > >   */
> > > >
> > > >  #ifndef _RTE_LPM_H_
> > > > @@ -21,6 +22,7 @@
> > > >  #include <rte_common.h>
> > > >  #include <rte_vect.h>
> > > >  #include <rte_compat.h>
> > > > +#include <rte_rcu_qsbr.h>
> > > >
> > > >  #ifdef __cplusplus
> > > >  extern "C" {
> > > > @@ -186,6 +188,7 @@ struct rte_lpm {
> > > >  			__rte_cache_aligned; /**< LPM tbl24 table. */
> > > >  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> > > >  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > > > +	struct rte_rcu_qsbr_dq *dq;	/**< RCU QSBR defer queue.*/
> > > >  };
> > > >
> > > >  /**
> > > > @@ -248,6 +251,24 @@ rte_lpm_free_v20(struct rte_lpm_v20 *lpm);
> > > void
> > > > rte_lpm_free_v1604(struct rte_lpm *lpm);
> > > >
> > > > +/**
> > > > + * Associate RCU QSBR variable with an LPM object.
> > > > + *
> > > > + * @param lpm
> > > > + *   the lpm object to add RCU QSBR
> > > > + * @param v
> > > > + *   RCU QSBR variable
> > > > + * @return
> > > > + *   On success - 0
> > > > + *   On error - 1 with error code set in rte_errno.
> > > > + *   Possible rte_errno codes are:
> > > > + *   - EINVAL - invalid pointer
> > > > + *   - EEXIST - already added QSBR
> > > > + *   - ENOMEM - memory allocation failure
> > > > + */
> > > > +__rte_experimental
> > > > +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_rcu_qsbr
> > > > +*v);
> > > > +
> > > >  /**
> > > >   * Add a rule to the LPM table.
> > > >   *
> > > > diff --git a/lib/librte_lpm/rte_lpm_version.map
> > > > b/lib/librte_lpm/rte_lpm_version.map
> > > > index 90beac853..b353aabd2 100644
> > > > --- a/lib/librte_lpm/rte_lpm_version.map
> > > > +++ b/lib/librte_lpm/rte_lpm_version.map
> > > > @@ -44,3 +44,9 @@ DPDK_17.05 {
> > > >  	rte_lpm6_lookup_bulk_func;
> > > >
> > > >  } DPDK_16.04;
> > > > +
> > > > +EXPERIMENTAL {
> > > > +	global:
> > > > +
> > > > +	rte_lpm_rcu_qsbr_add;
> > > > +};
> > > > --
> > > > 2.17.1


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
  2019-10-15 16:48           ` Medvedkin, Vladimir
@ 2019-10-18  3:47             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2019-10-18  3:47 UTC (permalink / raw)
  To: Medvedkin, Vladimir, konstantin.ananyev, stephen, paulmck
  Cc: yipeng1.wang, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, dev, Honnappa Nagarahalli, nd, nd

<snip>

> 
> Hi Honnappa,
> 
> On 13/10/2019 04:02, Honnappa Nagarahalli wrote:
> > Hi Vladimir,
> > 	Apologies for the delayed response, I had to run a few experiments.
> >
> > <snip>
> >
> >> Hi Honnappa,
> >>
> >> On 01/10/2019 07:29, Honnappa Nagarahalli wrote:
> >>> Add resource reclamation APIs to make it simple for applications and
> >>> libraries to integrate rte_rcu library.
> >>>
> >>> Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >>> Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> ---
> >>>    app/test/test_rcu_qsbr.c           | 291 ++++++++++++++++++++++++++++-
> >>>    lib/librte_rcu/meson.build         |   2 +
> >>>    lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> >>>    lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> >>>    lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> >>>    lib/librte_rcu/rte_rcu_version.map |   4 +
> >>>    lib/meson.build                    |   6 +-
> >>>    7 files changed, 700 insertions(+), 3 deletions(-)
> >>>    create mode 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>>
> >>> diff --git a/app/test/test_rcu_qsbr.c b/app/test/test_rcu_qsbr.c
> >>> index
> >>> d1b9e46a2..3a6815243 100644
> >>> --- a/app/test/test_rcu_qsbr.c
> >>> +++ b/app/test/test_rcu_qsbr.c
> >>> @@ -1,8 +1,9 @@
> >>>    /* SPDX-License-Identifier: BSD-3-Clause
> >>> - * Copyright (c) 2018 Arm Limited
> >>> + * Copyright (c) 2019 Arm Limited
> >>>     */
> >>>
> >>>    #include <stdio.h>
> >>> +#include <string.h>
> >>>    #include <rte_pause.h>
> >>>    #include <rte_rcu_qsbr.h>
> >>>    #include <rte_hash.h>
> >>> @@ -33,6 +34,7 @@ static uint32_t *keys;
> >>>    #define COUNTER_VALUE 4096
> >>>    static uint32_t *hash_data[RTE_MAX_LCORE][TOTAL_ENTRY];
> >>>    static uint8_t writer_done;
> >>> +static uint8_t cb_failed;
> >>>
> >>>    static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> >>>    struct rte_hash *h[RTE_MAX_LCORE]; @@ -582,6 +584,269 @@
> >>> test_rcu_qsbr_thread_offline(void)
> >>>    	return 0;
> >>>    }
> >>>
> >>> +static void
> >>> +rte_rcu_qsbr_test_free_resource(void *p, void *e) {
> >>> +	if (p != NULL && e != NULL) {
> >>> +		printf("%s: Test failed\n", __func__);
> >>> +		cb_failed = 1;
> >>> +	}
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_create: create a queue used to store the data
> >>> +structure
> >>> + * elements that can be freed later. This queue is referred to as
> >>> +'defer
> >> queue'.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_create(void)
> >>> +{
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_create()\n");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	dq = rte_rcu_qsbr_dq_create(NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +	params.v = t[0];
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	params.size = 1;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	params.esize = 3;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq != NULL), "dq create invalid
> >>> +params");
> >>> +
> >>> +	/* Pass all valid parameters */
> >>> +	params.esize = 16;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >> params");
> >>> +	rte_rcu_qsbr_dq_delete(dq);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer
> >>> +queue,
> >>> + * to be freed later after atleast one grace period is over.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_enqueue(void)
> >>> +{
> >>> +	int ret;
> >>> +	uint64_t r;
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_enqueue()\n");
> >>> +
> >>> +	/* Create a queue with simple parameters */
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +	params.v = t[0];
> >>> +	params.size = 1;
> >>> +	params.esize = 16;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >>> +params");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> >>> +params");
> >>> +
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> >>> +params");
> >>> +
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(NULL, &r);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue invalid
> >>> +params");
> >>> +
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 1), "dq delete valid
> >> params");
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_reclaim: Reclaim resources from the defer queue.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_reclaim(void)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_reclaim()\n");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	ret = rte_rcu_qsbr_dq_reclaim(NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq reclaim invalid
> >>> +params");
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_delete: Delete a defer queue.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_delete(void)
> >>> +{
> >>> +	int ret;
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_delete()\n");
> >>> +
> >>> +	/* Pass invalid parameters */
> >>> +	ret = rte_rcu_qsbr_dq_delete(NULL);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 1), "dq delete invalid
> >>> +params");
> >>> +
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +	params.v = t[0];
> >>> +	params.size = 1;
> >>> +	params.esize = 16;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >> params");
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> >> params");
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/*
> >>> + * rte_rcu_qsbr_dq_enqueue: enqueue one resource to the defer
> >>> +queue,
> >>> + * to be freed later after atleast one grace period is over.
> >>> + */
> >>> +static int
> >>> +test_rcu_qsbr_dq_functional(int32_t size, int32_t esize) {
> >>> +	int i, j, ret;
> >>> +	char rcu_dq_name[RTE_RING_NAMESIZE];
> >>> +	struct rte_rcu_qsbr_dq_parameters params;
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +	uint64_t *e;
> >>> +	uint64_t sc = 200;
> >>> +	int max_entries;
> >>> +
> >>> +	printf("\nTest rte_rcu_qsbr_dq_xxx functional tests()\n");
> >>> +	printf("Size = %d, esize = %d\n", size, esize);
> >>> +
> >>> +	e = (uint64_t *)rte_zmalloc(NULL, esize, RTE_CACHE_LINE_SIZE);
> >>> +	if (e == NULL)
> >>> +		return 0;
> >>> +	cb_failed = 0;
> >>> +
> >>> +	/* Initialize the RCU variable. No threads are registered */
> >>> +	rte_rcu_qsbr_init(t[0], RTE_MAX_LCORE);
> >>> +
> >>> +	/* Create a queue with simple parameters */
> >>> +	memset(&params, 0, sizeof(struct rte_rcu_qsbr_dq_parameters));
> >>> +	snprintf(rcu_dq_name, sizeof(rcu_dq_name), "TEST_RCU");
> >>> +	params.name = rcu_dq_name;
> >>> +	params.f = rte_rcu_qsbr_test_free_resource;
> >>> +	params.v = t[0];
> >>> +	params.size = size;
> >>> +	params.esize = esize;
> >>> +	dq = rte_rcu_qsbr_dq_create(&params);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((dq == NULL), "dq create valid
> >>> +params");
> >>> +
> >>> +	/* Given the size and esize, calculate the maximum number of entries
> >>> +	 * that can be stored on the defer queue (look at the logic used
> >>> +	 * in capacity calculation of rte_ring).
> >>> +	 */
> >>> +	max_entries = rte_align32pow2(((esize/8 + 1) * size) + 1);
> >>> +	max_entries = (max_entries - 1)/(esize/8 + 1);
> >>> +
> >>> +	/* Enqueue few counters starting with the value 'sc' */
> >>> +	/* The queue size will be rounded up to 2. The enqueue API also
> >>> +	 * reclaims if the queue size is above certain limit. Since, there
> >>> +	 * are no threads registered, reclamation succedes. Hence, it should
> >>> +	 * be possible to enqueue more than the provided queue size.
> >>> +	 */
> >>> +	for (i = 0; i < 10; i++) {
> >>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> >>> +			"dq enqueue functional");
> >>> +		for (j = 0; j < esize/8; j++)
> >>> +			e[j] = sc++;
> >>> +	}
> >>> +
> >>> +	/* Register a thread on the RCU QSBR variable. Reclamation will not
> >>> +	 * succeed. It should not be possible to enqueue more than the size
> >>> +	 * number of resources.
> >>> +	 */
> >>> +	rte_rcu_qsbr_thread_register(t[0], 1);
> >>> +	rte_rcu_qsbr_thread_online(t[0], 1);
> >>> +
> >>> +	for (i = 0; i < max_entries; i++) {
> >>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> >>> +			"dq enqueue functional");
> >>> +		for (j = 0; j < esize/8; j++)
> >>> +			e[j] = sc++;
> >>> +	}
> >>> +
> >>> +	/* Enqueue fails as queue is full */
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> >> functional");
> >>> +
> >>> +	/* Delete should fail as there are elements in defer queue which
> >>> +	 * cannot be reclaimed.
> >>> +	 */
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq delete valid
> >> params");
> >>> +
> >>> +	/* Report quiescent state, enqueue should succeed */
> >>> +	rte_rcu_qsbr_quiescent(t[0], 1);
> >>> +	for (i = 0; i < max_entries; i++) {
> >>> +		ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +		TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0),
> >>> +			"dq enqueue functional");
> >>> +		for (j = 0; j < esize/8; j++)
> >>> +			e[j] = sc++;
> >>> +	}
> >>> +
> >>> +	/* Queue is full */
> >>> +	ret = rte_rcu_qsbr_dq_enqueue(dq, e);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret == 0), "dq enqueue
> >> functional");
> >>> +
> >>> +	/* Report quiescent state, delete should succeed */
> >>> +	rte_rcu_qsbr_quiescent(t[0], 1);
> >>> +	ret = rte_rcu_qsbr_dq_delete(dq);
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((ret != 0), "dq delete valid
> >> params");
> >>> +
> >>> +	/* Validate that call back function did not return any error */
> >>> +	TEST_RCU_QSBR_RETURN_IF_ERROR((cb_failed == 1), "CB failed");
> >>> +
> >>> +	rte_free(e);
> >>> +	return 0;
> >>> +}
> >>> +
> >>>    /*
> >>>     * rte_rcu_qsbr_dump: Dump status of a single QS variable to a file
> >>>     */
> >>> @@ -1025,6 +1290,18 @@ test_rcu_qsbr_main(void)
> >>>    	if (test_rcu_qsbr_thread_offline() < 0)
> >>>    		goto test_fail;
> >>>
> >>> +	if (test_rcu_qsbr_dq_create() < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_reclaim() < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_delete() < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_enqueue() < 0)
> >>> +		goto test_fail;
> >>> +
> >>>    	printf("\nFunctional tests\n");
> >>>
> >>>    	if (test_rcu_qsbr_sw_sv_3qs() < 0) @@ -1033,6 +1310,18 @@
> >>> test_rcu_qsbr_main(void)
> >>>    	if (test_rcu_qsbr_mw_mv_mqs() < 0)
> >>>    		goto test_fail;
> >>>
> >>> +	if (test_rcu_qsbr_dq_functional(1, 8) < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_functional(2, 8) < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_functional(303, 16) < 0)
> >>> +		goto test_fail;
> >>> +
> >>> +	if (test_rcu_qsbr_dq_functional(7, 128) < 0)
> >>> +		goto test_fail;
> >>> +
> >>>    	free_rcu();
> >>>
> >>>    	printf("\n");
> >>> diff --git a/lib/librte_rcu/meson.build b/lib/librte_rcu/meson.build
> >>> index 62920ba02..e280b29c1 100644
> >>> --- a/lib/librte_rcu/meson.build
> >>> +++ b/lib/librte_rcu/meson.build
> >>> @@ -10,3 +10,5 @@ headers = files('rte_rcu_qsbr.h')
> >>>    if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
> >>>    	ext_deps += cc.find_library('atomic')
> >>>    endif
> >>> +
> >>> +deps += ['ring']
> >>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> >>> b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> >>> --- a/lib/librte_rcu/rte_rcu_qsbr.c
> >>> +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> >>> @@ -21,6 +21,7 @@
> >>>    #include <rte_errno.h>
> >>>
> >>>    #include "rte_rcu_qsbr.h"
> >>> +#include "rte_rcu_qsbr_pvt.h"
> >>>
> >>>    /* Get the memory size of QSBR variable */
> >>>    size_t
> >>> @@ -267,6 +268,190 @@ rte_rcu_qsbr_dump(FILE *f, struct
> rte_rcu_qsbr
> >> *v)
> >>>    	return 0;
> >>>    }
> >>>
> >>> +/* Create a queue used to store the data structure elements that
> >>> +can
> >>> + * be freed later. This queue is referred to as 'defer queue'.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq *
> >>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> >>> +*params) {
> >>> +	struct rte_rcu_qsbr_dq *dq;
> >>> +	uint32_t qs_fifo_size;
> >>> +
> >>> +	if (params == NULL || params->f == NULL ||
> >>> +		params->v == NULL || params->name == NULL ||
> >>> +		params->size == 0 || params->esize == 0 ||
> >>> +		(params->esize % 8 != 0)) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	dq = rte_zmalloc(NULL,
> >>> +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> >>> +		RTE_CACHE_LINE_SIZE);
> >>> +	if (dq == NULL) {
> >>> +		rte_errno = ENOMEM;
> >>> +
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	/* round up qs_fifo_size to next power of two that is not less than
> >>> +	 * max_size.
> >>> +	 */
> >>> +	qs_fifo_size = rte_align32pow2((((params->esize/8) + 1)
> >>> +					* params->size) + 1);
> >>> +	dq->r = rte_ring_create(params->name, qs_fifo_size,
> >>> +					SOCKET_ID_ANY, 0);
> >>> +	if (dq->r == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): defer queue create failed\n", __func__);
> >>> +		rte_free(dq);
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	dq->v = params->v;
> >>> +	dq->size = params->size;
> >>> +	dq->esize = params->esize;
> >>> +	dq->f = params->f;
> >>> +	dq->p = params->p;
> >>> +
> >>> +	return dq;
> >>> +}
> >>> +
> >>> +/* Enqueue one resource to the defer queue to free after the grace
> >>> + * period is over.
> >>> + */
> >>> +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> >>> +	uint64_t token;
> >>> +	uint64_t *tmp;
> >>> +	uint32_t i;
> >>> +	uint32_t cur_size, free_size;
> >>> +
> >>> +	if (dq == NULL || e == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Start the grace period */
> >>> +	token = rte_rcu_qsbr_start(dq->v);
> >>> +
> >>> +	/* Reclaim resources if the queue is 1/8th full. This helps
> >>> +	 * the queue from growing too large and allows time for reader
> >>> +	 * threads to report their quiescent state.
> >>> +	 */
> >>> +	cur_size = rte_ring_count(dq->r) / (dq->esize/8 + 1);
> >>> +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> >>> +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> >>> +			"%s(): Triggering reclamation\n", __func__);
> >>> +		rte_rcu_qsbr_dq_reclaim(dq);
> >>> +	}
> >> There are two problems I see:
> >>
> >> 1. rte_rcu_qsbr_dq_reclaim() reclaims only 1/16 of the defer queue
> >> while it triggers on 1/8. This means that there will always be 1/16
> >> of non-reclaimed entries in the queue.
> > There will be 'at least' 1/16 non-reclaimed entries.
> Correct, that's what I meant :)
> > It could be more depending on the length of the grace period and the
> > rate of deletion.
> 
> Right, the number of entries to reclaim depends on:
> 
> - the grace period, which is application specific
> 
> - the cost of the delete operation, which is library (algorithm) specific
> 
> - the rate of deletion, which depends on the runtime.
> 
> So it is very hard to predict how big the threshold to trigger
> reclamation should be and how many entries it should reclaim.
> 
> > The trigger of 1/8 is used to give sufficient time for the readers to
> > report their quiescent state. 1/16 is used to spread the load of
> > reclamation across multiple calls and provide an upper bound on the
> > cycles consumed.
> 
> 1/16 of max entries to reclaim within a single call can cost a lot.
> Moreover, it could have an impact on the readers through massive cache
> evictions.
> 
> Consider a set of routes from test_lpm_perf.c. To install all routes you need
> to have at least 65k tbl8 entries (now it has 2k). So when reclaiming, besides
> the costs of rte_rcu_qsbr_check(), you'll need to rewrite 4k cache lines.
> 
> So 1/16 of max entries is relatively big and it's better to spread this load
> across multiple calls.
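> 
> To put rough numbers on it (size assumed for illustration; the shifts
> come from the constants in this patch): with dq->size == 65536 tbl8
> groups, enqueue triggers reclamation once the ring holds more than
> 65536 >> 3 == 8192 deferred groups, and a single reclaim pass may free
> up to 65536 >> 4 == 4096 of them. Freeing a group writes one tbl8 entry
> to invalidate it, and those entries generally sit on distinct cache
> lines, which is where the ~4k cache lines above come from.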
> 
> >
> >> 2. The number of entries to reclaim depends on dq->size. So,
> >> rte_rcu_qsbr_dq_reclaim() could take a lot of cycles. For the LPM
> >> library this
> > That is true. It depends on dq->size (number of tbl8 groups). However,
> > note that there is patch [1] which provides a kind of batch-reclamation
> > behavior that reduces the cycles consumed by reclamation significantly.
> >
> > [1] https://patches.dpdk.org/patch/58960/
> >
> >> means that rte_lpm_delete() sometimes takes a long time.
> > Agree, it sometimes takes additional time. It is good to spread it over
> > multiple calls.
> Right, with batch reclamation we have the classic throughput vs. latency
> problem here: either reclaim a big number of entries relatively
> infrequently, amortizing the cost of the readers' quiescent state check,
> or reclaim a small number of entries more often, spending more cycles on
> average. I'd prefer latency here because, as I mentioned earlier, huge
> batches could have an impact on readers and lead to a big difference in
> the cost of delete().
> >
> >> So, my suggestions here would be:
> >>
> >> - trigger rte_rcu_qsbr_dq_reclaim() with every enqueue
> > Given that the LPM APIs are mainly for the control plane, I would
> > think that, by the next time an LPM API is called, the readers have
> > completed the grace period. But if there are frequent updates, we
> > might end up with empty reclaims which will waste cycles. IMO, this
> > trigger should happen only after at least a few entries are in the
> > queue.
> >
> >> - reclaim a small amount of entries (could be configurable at creation
> >> time)
> > Agree. I would keep it smaller than the trigger amount, knowing that
> > the elements added right before the trigger might not have completed
> > the grace period.
> >
> >> - provide an API to trigger reclaim from the application manually.
> > IMO, this will add additional complexity to the application. I agree
> > that there will be special needs for some applications. I think those
> > applications might have to implement their own methods using the base
> > RCU APIs.
> > Instead, as agreed in other threads, I suggest we expose the
> > parameters (when to trigger and how much to reclaim) to the
> > application as optional configurable parameters, i.e. if the
> > application does not provide them, we can use default values. I think
> > this should provide enough flexibility to the application.
> 
> Agree.
> 
> Regarding default values, one strategy could be:
> 
> - if the reclaim-trigger threshold isn't set (i.e. is equal to 0), then call
> reclaim with every enqueue (i.e. threshold == 1)
> 
> - if max_entries_to_reclaim isn't set, then reclaim as much as we can
> 
Ok, sounds good. That defaulting logic is sketched below.
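
A minimal sketch of that defaulting logic, assuming two new fields
'trigger_reclaim_limit' and 'max_reclaim_size' in
struct rte_rcu_qsbr_dq_parameters (both names are assumptions):

	/* Hypothetical: inside rte_rcu_qsbr_dq_create(), 0 selects the default. */
	dq->trigger_reclaim_limit = params->trigger_reclaim_limit;
	if (dq->trigger_reclaim_limit == 0)
		dq->trigger_reclaim_limit = 1;  /* reclaim on every enqueue */

	dq->max_reclaim_size = params->max_reclaim_size;
	if (dq->max_reclaim_size == 0)
		dq->max_reclaim_size = dq->size;  /* reclaim as much as we can */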

> 
> >>> +
> >>> +	/* Check if there is space for atleast for 1 resource */
> >>> +	free_size = rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> >>> +	if (!free_size) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Defer queue is full\n", __func__);
> >>> +		rte_errno = ENOSPC;
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Enqueue the resource */
> >>> +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> >>> +
> >>> +	/* The resource to enqueue needs to be a multiple of 64b
> >>> +	 * due to the limitation of the rte_ring implementation.
> >>> +	 */
> >>> +	for (i = 0, tmp = (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> >>> +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/* Reclaim resources from the defer queue. */ int
> >>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> >>> +	uint32_t max_cnt;
> >>> +	uint32_t cnt;
> >>> +	void *token;
> >>> +	uint64_t *tmp;
> >>> +	uint32_t i;
> >>> +
> >>> +	if (dq == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Anything to reclaim? */
> >>> +	if (rte_ring_count(dq->r) == 0)
> >>> +		return 0;
> >>> +
> >>> +	/* Reclaim at the max 1/16th the total number of entries. */
> >>> +	max_cnt = dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> >>> +	max_cnt = (max_cnt == 0) ? dq->size : max_cnt;
> >>> +	cnt = 0;
> >>> +
> >>> +	/* Check reader threads quiescent state and reclaim resources */
> >>> +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) == 0) &&
> >>> +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> >>> +			== 1)) {
> >>> +		(void)rte_ring_sc_dequeue(dq->r, &token);
> >>> +		/* The resource to dequeue needs to be a multiple of 64b
> >>> +		 * due to the limitation of the rte_ring implementation.
> >>> +		 */
> >>> +		for (i = 0, tmp = (uint64_t *)dq->e; i < dq->esize/8;
> >>> +			i++, tmp++)
> >>> +			(void)rte_ring_sc_dequeue(dq->r,
> >>> +					(void *)(uintptr_t)tmp);
> >>> +		dq->f(dq->p, dq->e);
> >>> +
> >>> +		cnt++;
> >>> +	}
> >>> +
> >>> +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> >>> +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> >>> +
> >>> +	if (cnt == 0) {
> >>> +		/* No resources were reclaimed */
> >>> +		rte_errno = EAGAIN;
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/* Delete a defer queue. */
> >>> +int
> >>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> >>> +	if (dq == NULL) {
> >>> +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> >>> +			"%s(): Invalid input parameter\n", __func__);
> >>> +		rte_errno = EINVAL;
> >>> +
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	/* Reclaim all the resources */
> >>> +	if (rte_rcu_qsbr_dq_reclaim(dq) != 0)
> >>> +		/* Error number is already set by the reclaim API */
> >>> +		return 1;
> >>> +
> >>> +	rte_ring_free(dq->r);
> >>> +	rte_free(dq);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>    int rte_rcu_log_type;
> >>>
> >>>    RTE_INIT(rte_rcu_register)
> >>> diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> >>> b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> >>> --- a/lib/librte_rcu/rte_rcu_qsbr.h
> >>> +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> >>> @@ -34,6 +34,7 @@ extern "C" {
> >>>    #include <rte_lcore.h>
> >>>    #include <rte_debug.h>
> >>>    #include <rte_atomic.h>
> >>> +#include <rte_ring.h>
> >>>
> >>>    extern int rte_rcu_log_type;
> >>>
> >>> @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> >>>    	 */
> >>>    } __rte_cache_aligned;
> >>>
> >>> +/**
> >>> + * Call back function called to free the resources.
> >>> + *
> >>> + * @param p
> >>> + *   Pointer provided while creating the defer queue
> >>> + * @param e
> >>> + *   Pointer to the resource data stored on the defer queue
> >>> + *
> >>> + * @return
> >>> + *   None
> >>> + */
> >>> +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> >>> +
> >>> +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> >>> +
> >>> +/**
> >>> + *  Trigger automatic reclamation after 1/8th the defer queue is full.
> >>> + */
> >>> +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> >>> +
> >>> +/**
> >>> + *  Reclaim at the max 1/16th the total number of resources.
> >>> + */
> >>> +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> >>> +
> >>> +/**
> >>> + * Parameters used when creating the defer queue.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq_parameters {
> >>> +	const char *name;
> >>> +	/**< Name of the queue. */
> >>> +	uint32_t size;
> >>> +	/**< Number of entries in queue. Typically, this will be
> >>> +	 *   the same as the maximum number of entries supported in the
> >>> +	 *   lock free data structure.
> >>> +	 *   Data structures with unbounded number of entries is not
> >>> +	 *   supported currently.
> >>> +	 */
> >>> +	uint32_t esize;
> >>> +	/**< Size (in bytes) of each element in the defer queue.
> >>> +	 *   This has to be multiple of 8B as the rte_ring APIs
> >>> +	 *   support 8B element sizes only.
> >>> +	 */
> >>> +	rte_rcu_qsbr_free_resource f;
> >>> +	/**< Function to call to free the resource. */
> >>> +	void *p;
> >>> +	/**< Pointer passed to the free function. Typically, this is the
> >>> +	 *   pointer to the data structure to which the resource to free
> >>> +	 *   belongs. This can be NULL.
> >>> +	 */
> >>> +	struct rte_rcu_qsbr *v;
> >>> +	/**< RCU QSBR variable to use for this defer queue */ };
> >>> +
> >>> +/* RTE defer queue structure.
> >>> + * This structure holds the defer queue. The defer queue is used to
> >>> + * hold the deleted entries from the data structure that are not
> >>> + * yet freed.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq;
> >>> +
> >>>    /**
> >>>     * @warning
> >>>     * @b EXPERIMENTAL: this API may change without prior notice @@
> >>> -648,6 +710,113 @@ __rte_experimental
> >>>    int
> >>>    rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v);
> >>>
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Create a queue used to store the data structure elements that
> >>> +can
> >>> + * be freed later. This queue is referred to as 'defer queue'.
> >>> + *
> >>> + * @param params
> >>> + *   Parameters to create a defer queue.
> >>> + * @return
> >>> + *   On success - Valid pointer to defer queue
> >>> + *   On error - NULL
> >>> + *   Possible rte_errno codes are:
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - ENOMEM - Not enough memory
> >>> + */
> >>> +__rte_experimental
> >>> +struct rte_rcu_qsbr_dq *
> >>> +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> >>> +*params);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Enqueue one resource to the defer queue and start the grace period.
> >>> + * The resource will be freed later after at least one grace period
> >>> + * is over.
> >>> + *
> >>> + * If the defer queue is full, it will attempt to reclaim resources.
> >>> + * It will also reclaim resources at regular intervals to avoid
> >>> + * the defer queue from growing too big.
> >>> + *
> >>> + * This API is not multi-thread safe. It is expected that the
> >>> +caller
> >>> + * provides multi-thread safety by locking a mutex or some other means.
> >>> + *
> >>> + * A lock free multi-thread writer algorithm could achieve
> >>> +multi-thread
> >>> + * safety by creating and using one defer queue per thread.
> >>> + *
> >>> + * @param dq
> >>> + *   Defer queue to allocate an entry from.
> >>> + * @param e
> >>> + *   Pointer to resource data to copy to the defer queue. The size of
> >>> + *   the data to copy is equal to the element size provided when the
> >>> + *   defer queue was created.
> >>> + * @return
> >>> + *   On success - 0
> >>> + *   On error - 1 with rte_errno set to
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - ENOSPC - Defer queue is full. This condition can not happen
> >>> + *		if the defer queue size is equal (or larger) than the
> >>> + *		number of elements in the data structure.
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Reclaim resources from the defer queue.
> >>> + *
> >>> + * This API is not multi-thread safe. It is expected that the
> >>> +caller
> >>> + * provides multi-thread safety by locking a mutex or some other means.
> >>> + *
> >>> + * A lock free multi-thread writer algorithm could achieve
> >>> +multi-thread
> >>> + * safety by creating and using one defer queue per thread.
> >>> + *
> >>> + * @param dq
> >>> + *   Defer queue to reclaim an entry from.
> >>> + * @return
> >>> + *   On successful reclamation of at least 1 resource - 0
> >>> + *   On error - 1 with rte_errno set to
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - EAGAIN - None of the resources have completed at least 1 grace
> >> period,
> >>> + *		try again.
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Delete a defer queue.
> >>> + *
> >>> + * It tries to reclaim all the resources on the defer queue.
> >>> + * If any of the resources have not completed the grace period
> >>> + * the reclamation stops and returns immediately. The rest of
> >>> + * the resources are not reclaimed and the defer queue is not
> >>> + * freed.
> >>> + *
> >>> + * @param dq
> >>> + *   Defer queue to delete.
> >>> + * @return
> >>> + *   On success - 0
> >>> + *   On error - 1
> >>> + *   Possible rte_errno codes are:
> >>> + *   - EINVAL - NULL parameters are passed
> >>> + *   - EAGAIN - Some of the resources have not completed at least 1
> grace
> >>> + *		period, try again.
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> >>> +
> >>>    #ifdef __cplusplus
> >>>    }
> >>>    #endif
> >>> diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>> b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>> new file mode 100644
> >>> index 000000000..2122bc36a
> >>> --- /dev/null
> >>> +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >>> @@ -0,0 +1,46 @@
> >>> +/* SPDX-License-Identifier: BSD-3-Clause
> >>> + * Copyright (c) 2019 Arm Limited
> >>> + */
> >>> +
> >>> +#ifndef _RTE_RCU_QSBR_PVT_H_
> >>> +#define _RTE_RCU_QSBR_PVT_H_
> >>> +
> >>> +/**
> >>> + * This file is private to the RCU library. It should not be
> >>> +included
> >>> + * by the user of this library.
> >>> + */
> >>> +
> >>> +#ifdef __cplusplus
> >>> +extern "C" {
> >>> +#endif
> >>> +
> >>> +#include "rte_rcu_qsbr.h"
> >>> +
> >>> +/* RTE defer queue structure.
> >>> + * This structure holds the defer queue. The defer queue is used to
> >>> + * hold the deleted entries from the data structure that are not
> >>> + * yet freed.
> >>> + */
> >>> +struct rte_rcu_qsbr_dq {
> >>> +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*/
> >>> +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> >>> +	uint32_t size;
> >>> +	/**< Number of elements in the defer queue */
> >>> +	uint32_t esize;
> >>> +	/**< Size (in bytes) of data stored on the defer queue */
> >>> +	rte_rcu_qsbr_free_resource f;
> >>> +	/**< Function to call to free the resource. */
> >>> +	void *p;
> >>> +	/**< Pointer passed to the free function. Typically, this is the
> >>> +	 *   pointer to the data structure to which the resource to free
> >>> +	 *   belongs.
> >>> +	 */
> >>> +	char e[0];
> >>> +	/**< Temporary storage to copy the defer queue element. */ };
> >>> +
> >>> +#ifdef __cplusplus
> >>> +}
> >>> +#endif
> >>> +
> >>> +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> >>> diff --git a/lib/librte_rcu/rte_rcu_version.map
> >>> b/lib/librte_rcu/rte_rcu_version.map
> >>> index f8b9ef2ab..dfac88a37 100644
> >>> --- a/lib/librte_rcu/rte_rcu_version.map
> >>> +++ b/lib/librte_rcu/rte_rcu_version.map
> >>> @@ -8,6 +8,10 @@ EXPERIMENTAL {
> >>>    	rte_rcu_qsbr_synchronize;
> >>>    	rte_rcu_qsbr_thread_register;
> >>>    	rte_rcu_qsbr_thread_unregister;
> >>> +	rte_rcu_qsbr_dq_create;
> >>> +	rte_rcu_qsbr_dq_enqueue;
> >>> +	rte_rcu_qsbr_dq_reclaim;
> >>> +	rte_rcu_qsbr_dq_delete;
> >>>
> >>>    	local: *;
> >>>    };
> >>> diff --git a/lib/meson.build b/lib/meson.build index
> >>> e5ff83893..0e1be8407 100644
> >>> --- a/lib/meson.build
> >>> +++ b/lib/meson.build
> >>> @@ -11,7 +11,9 @@
> >>>    libraries = [
> >>>    	'kvargs', # eal depends on kvargs
> >>>    	'eal', # everything depends on eal
> >>> -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >>> +	'ring',
> >>> +	'rcu', # rcu depends on ring
> >>> +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> >>>    	'cmdline',
> >>>    	'metrics', # bitrate/latency stats depends on this
> >>>    	'hash',    # efd depends on this
> >>> @@ -22,7 +24,7 @@ libraries = [
> >>>    	'gro', 'gso', 'ip_frag', 'jobstats',
> >>>    	'kni', 'latencystats', 'lpm', 'member',
> >>>    	'power', 'pdump', 'rawdev',
> >>> -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> >>> +	'reorder', 'sched', 'security', 'stack', 'vhost',
> >>>    	# ipsec lib depends on net, crypto and security
> >>>    	'ipsec',
> >>>    	# add pkt framework libs which use other libs from above
> >> --
> >> Regards,
> >> Vladimir
> 
> --
> Regards,
> Vladimir


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs
  2019-10-01  6:29   ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Honnappa Nagarahalli
                       ` (2 preceding siblings ...)
  2019-10-01  6:29     ` [dpdk-dev] [PATCH v3 3/3] doc/rcu: add RCU integration design details Honnappa Nagarahalli
@ 2020-03-29 20:57     ` Thomas Monjalon
  2020-03-30 17:37       ` Honnappa Nagarahalli
  2020-04-03 18:41     ` [dpdk-dev] [PATCH v4 0/4] " Honnappa Nagarahalli
  2020-04-22  3:30     ` [dpdk-dev] [PATCH v5 0/4] Add RCU reclamation APIs Honnappa Nagarahalli
  5 siblings, 1 reply; 137+ messages in thread
From: Thomas Monjalon @ 2020-03-29 20:57 UTC (permalink / raw)
  To: honnappa.nagarahalli
  Cc: konstantin.ananyev, stephen, paulmck, dev, yipeng1.wang,
	vladimir.medvedkin, ruifeng.wang, dharmik.thakkar, nd

01/10/2019 08:29, Honnappa Nagarahalli:
> This is not a new patch. This patch set is separated from the LPM
> changes as the size of the changes in RCU library has grown due
> to comments from community. These APIs will help reduce the changes
> in LPM and hash libraries that are getting integrated with RCU
> library.
> 
> This adds 4 new APIs to RCU library to create a defer queue, enqueue
> deleted resources, reclaim resources and delete the defer queue.

It is on the roadmap for 20.05.
What is the status of this patchset?

> The patches to LPM and HASH libraries to integrate RCU will depend on
> this patch.

I guess lpm and hash integrations are planned for 20.08?



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs
  2020-03-29 20:57     ` [dpdk-dev] [PATCH v3 0/3] Add RCU reclamation APIs Thomas Monjalon
@ 2020-03-30 17:37       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 137+ messages in thread
From: Honnappa Nagarahalli @ 2020-03-30 17:37 UTC (permalink / raw)
  To: thomas
  Cc: konstantin.ananyev, stephen, paulmck, dev, yipeng1.wang,
	vladimir.medvedkin, Ruifeng Wang, Dharmik Thakkar, nd,
	Honnappa Nagarahalli, nd

<snip>

> 
> 01/10/2019 08:29, Honnappa Nagarahalli:
> > This is not a new patch. This patch set is separated from the LPM
> > changes as the size of the changes in RCU library has grown due to
> > comments from community. These APIs will help reduce the changes in
> > LPM and hash libraries that are getting integrated with RCU library.
> >
> > This adds 4 new APIs to RCU library to create a defer queue, enqueue
> > deleted resources, reclaim resources and delete the defer queue.
> 
> It is in the roadmap for 20.05.
> What is the status of this patchset?
It has a dependency on changes to the rte_ring APIs, where Konstantin's patch [1] and mine [2] clash. Konstantin is working through his patch to address the comments.

I am currently incorporating the review comments I received on the RCU defer APIs. We should see the next version soon.

[1] http://mails.dpdk.org/archives/dev/2020-March/160828.html
[2] http://mails.dpdk.org/archives/dev/2020-March/160787.html

> 
> > The patches to LPM and HASH libraries to integrate RCU will depend on
> > this patch.
> 
> I guess lpm and hash integrations are planned for