DPDK patches and discussions
* [dpdk-dev] [RFC 0/6] New sync modes for ring
@ 2020-02-24 11:35 Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 1/6] test/ring: add contention stress test Konstantin Ananyev
                   ` (8 more replies)
  0 siblings, 9 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Upfront note: this RFC is not a complete patch.
It introduces an ABI breakage, doesn't update the ring_elem
code properly, etc.
I plan to deal with all of these things in later versions.
Right now I am seeking initial feedback on the proposed ideas.
I would also ask people to repeat the performance tests (see below)
on their platforms to confirm the impact.

More and more customers use (or try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit is Lock-Holder-Preemption (LHP) with rte_ring.
LHP is a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.),
as then not only the lock-owner's but also the lock-waiters' scheduling
order matters a lot.
This is a well-known problem for kernels running within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
unfair locking, waiting on the tail is very similar to a ticket-lock scheme:
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
While it is probably not possible to completely resolve this problem in
userspace only (without some kernel communication/intervention),
removing fairness from the tail update can mitigate it significantly.
So this RFC proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
    In this mode each enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform a given op.
    As another enhancement, it provides the ability to split an
    enqueue/dequeue operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows the user to inspect objects in the ring without removing
    them from it (aka MT-safe peek; see the sketch after this list).
2) Relaxed Tail Sync (RTS) mode
    The main difference from the original MP/MC algorithm is that
    the tail value is increased not by every thread that finished
    an enqueue/dequeue, but only by the last one.
    That allows threads to avoid spinning on the ring tail value,
    leaving the actual tail update to the last thread in the update queue.
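
To make the intended usage concrete, here is a minimal editorial sketch.
The RING_F_MP_RTS_ENQ/RING_F_MC_RTS_DEQ flags come from patch 3 of this
series; the two-phase (peek) function names are an assumption about the
not-yet-shown patch 5 API and may differ in the final version:

	/* ring whose default enqueue/dequeue paths use RTS
	 * (flags introduced in patch 3 of this series) */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024,
		rte_socket_id(), RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	/* the generic calls now dispatch on the ring's sync_type,
	 * so existing callers keep working unmodified */
	void *objs[32]; /* filled by the application */
	unsigned int n = rte_ring_enqueue_burst(r, objs, RTE_DIM(objs), NULL);

	/* hypothetical two-phase (peek) usage for an HTS ring,
	 * names derived from the start/finish phases above */
	n = rte_ring_dequeue_bulk_start(r, objs, RTE_DIM(objs), NULL);
	if (n != 0) {
		inspect_objs(objs, n);         /* user-provided */
		rte_ring_dequeue_finish(r, n); /* commit the removal */
	}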

Test results on IA (see below) show significant improvements
in average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) the original MP/MC
algorithm still shows the best numbers, though for the 64-bit target
the RTS numbers are not that far away.
Numbers were produced by the ring_stress_*autotest commands
(first patch in this series).

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread@32core(--lcores='6-22,55-70')          982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7')           20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8') 153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 643176.59 1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8') 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8)')       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10)')     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10)')     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread@32core(--lcores='6-22,55-70')          813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7')           16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8)')       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10)')     5319896.86 2836.44 3028.87

Konstantin Ananyev (6):
  test/ring: add contention stress test
  ring: rework ring layout to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring

 app/test/Makefile                      |   3 +
 app/test/meson.build                   |   3 +
 app/test/test_pdump.c                  |   6 +-
 app/test/test_ring_hts_stress.c        |  28 ++
 app/test/test_ring_rts_stress.c        |  28 ++
 app/test/test_ring_stress.c            |  27 ++
 app/test/test_ring_stress.h            | 477 +++++++++++++++++++
 lib/librte_pdump/rte_pdump.c           |   2 +-
 lib/librte_port/rte_port_ring.c        |  12 +-
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   4 +-
 lib/librte_ring/rte_ring.c             |  84 +++-
 lib/librte_ring/rte_ring.h             | 619 +++++++++++++++++++++++--
 lib/librte_ring/rte_ring_elem.h        |   8 +-
 lib/librte_ring/rte_ring_hts_generic.h | 228 +++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 240 ++++++++++
 16 files changed, 1721 insertions(+), 52 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

-- 
2.17.1


* [dpdk-dev] [RFC 1/6] test/ring: add contention stress test
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
@ 2020-02-24 11:35 ` Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 2/6] ring: rework ring layout to allow new sync schemes Konstantin Ananyev
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Introduce a new test case to measure ring performance under contention
(multiple producers/consumers).
It starts a dequeue/enqueue loop on all available slave lcores.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile           |   1 +
 app/test/meson.build        |   1 +
 app/test/test_ring_stress.c |  27 ++
 app/test/test_ring_stress.h | 477 ++++++++++++++++++++++++++++++++++++
 4 files changed, 506 insertions(+)
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h

diff --git a/app/test/Makefile b/app/test/Makefile
index 1f080d162..4f586d95f 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 0a2ce710f..84dde28ad 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..5689e06c8
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..c6f0bc9f1
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,477 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+
+#include "test.h"
+
+/*
+ * Measures performance of ring enqueue/dequeue under high contention
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * Redefinable functions: declared here, while each test variant
+ * (.c file) provides its own definitions after including this header.
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker_prcs(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = rte_rdtsc_precise();
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = rte_rdtsc_precise() - tm0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = rte_rdtsc_precise();
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = rte_rdtsc_precise() - tm1;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
+	return rc;
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, 0, 0);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	/* final stats update */
+	cl = rte_rdtsc_precise() - cl;
+	lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+
+	return rc;
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, alignof(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems failed\n",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, alignof(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed\n",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal workers to start test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_dump(stdout, lc, &arg[lc].stats);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static int
+test_ring_stress(void)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	const struct {
+		const char *name;
+		int (*func)(int (*)(void *));
+		int (*wfunc)(void *arg);
+	} tests[] = {
+		{
+			.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+			.func = test_mt1,
+			.wfunc = test_worker_prcs,
+		},
+		{
+			.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+			.func = test_mt1,
+			.wfunc = test_worker_avg,
+		},
+	};
+
+	for (i = 0, k = 0; i != RTE_DIM(tests); i++) {
+
+		printf("TEST %s START\n", tests[i].name);
+
+		rc = tests[i].func(tests[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s FAILED\n", tests[i].name);
+		else
+			printf("TEST-CASE %s OK\n", tests[i].name);
+	}
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		i, k, i - k);
+	return (k != i);
+}
-- 
2.17.1


* [dpdk-dev] [RFC 2/6] ring: rework ring layout to allow new sync schemes
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 1/6] test/ring: add contention stress test Konstantin Ananyev
@ 2020-02-24 11:35 ` Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 3/6] ring: introduce RTS ring mode Konstantin Ananyev
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Change from *single* to *sync_type* to allow different
synchronisation schemes to be applied.
Change the layout to make sure that *sync_type* and *tail*
always reside at the same offsets.
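
To illustrate the invariant this layout change prepares for, the offsets
can be asserted at compile time. The sketch below uses the
rte_ring_rts_headtail type that only appears in patch 3 (which adds
exactly these checks to rte_ring_init()):

	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
		offsetof(struct rte_ring_rts_headtail, sync_type));
	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
		offsetof(struct rte_ring_rts_headtail, tail.val.pos));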

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_pdump.c           |  6 +--
 lib/librte_pdump/rte_pdump.c    |  2 +-
 lib/librte_port/rte_port_ring.c | 12 +++---
 lib/librte_ring/rte_ring.c      |  6 ++-
 lib/librte_ring/rte_ring.h      | 76 ++++++++++++++++++++++++---------
 lib/librte_ring/rte_ring_elem.h |  8 ++--
 6 files changed, 71 insertions(+), 39 deletions(-)

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..65364f2c5 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..2f6c050fa 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..aa8c628eb 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -61,11 +61,22 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-/* structure to hold a pair of head/tail values and other metadata */
+/** prod/cons sync types */
+enum {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * structure to hold a pair of head/tail values and other metadata.
+ * used by RTE_RING_SYNC_MT, RTE_RING_SYNC_ST sync types.
+ * Depending on sync_type the format of that structure might be different,
+ * but the offsets of the *sync_type* and *tail* values should remain the same.
+ */
 struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
+	uint32_t sync_type;                      /**< sync type of prod/cons */
+	volatile uint32_t tail __rte_aligned(8); /**< prod/consumer tail. */
+	volatile uint32_t head;                  /**< prod/consumer head. */
 };
 
 /**
@@ -116,11 +127,10 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#define __IS_SP RTE_RING_SYNC_ST
+#define __IS_MP RTE_RING_SYNC_MT
+#define __IS_SC RTE_RING_SYNC_ST
+#define __IS_MC RTE_RING_SYNC_MT
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +430,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,7 +453,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -470,7 +480,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +564,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +588,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +615,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +787,30 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+static inline uint32_t
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+static inline int
+rte_ring_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+static inline uint32_t
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+static inline int
+rte_ring_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +854,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +877,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +904,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +932,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +957,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +985,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 3976757ed..ff7a28ea5 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -570,7 +570,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -734,7 +734,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -902,7 +902,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -995,7 +995,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
-- 
2.17.1


* [dpdk-dev] [RFC 3/6] ring: introduce RTS ring mode
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 1/6] test/ring: add contention stress test Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 2/6] ring: rework ring layout to allow new sync schemes Konstantin Ananyev
@ 2020-02-24 11:35 ` Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 4/6] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
It aims to reduce stall times when the ring is used on
overcommitted cpus (multiple active threads on the same cpu).
The main difference from the original MP/MC algorithm is that
the tail value is increased not by every thread that finished
an enqueue/dequeue, but only by the last one.
That allows threads to avoid spinning on the ring tail value,
leaving the actual tail update to the last thread in the update queue.
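
To make the mechanism concrete, an illustrative walk-through (editorial
note, not part of the patch) using the pos/cnt pairs from the
rte_ring_ht_poscnt union below; three producers A, B, C enqueue
concurrently, starting from head = tail = {pos: 0, cnt: 0}:

  - A reserves 8 slots:  head becomes {pos: 8,  cnt: 1}
  - B reserves 4 slots:  head becomes {pos: 12, cnt: 2}
  - C reserves 2 slots:  head becomes {pos: 14, cnt: 3}
  - B finishes copying first: tail.cnt goes 0 -> 1; 1 != head.cnt (3),
    so tail.pos stays at 0
  - C finishes next: tail.cnt goes 1 -> 2; still != 3, tail.pos stays at 0
  - A finishes last: tail.cnt goes 2 -> 3 == head.cnt, so tail.pos jumps
    straight to 14

Neither B nor C had to spin waiting for A: only the last thread to finish
publishes the new tail. The head can run at most htd_max entries ahead of
the tail (settable via rte_ring_set_prod_htd_max(), default capacity/8).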

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   3 +-
 lib/librte_ring/meson.build            |   3 +-
 lib/librte_ring/rte_ring.c             |  75 ++++++-
 lib/librte_ring/rte_ring.h             | 300 +++++++++++++++++++++++--
 lib/librte_ring/rte_ring_rts_generic.h | 240 ++++++++++++++++++++
 5 files changed, 598 insertions(+), 23 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 917c560ad..4f90344f4 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -18,6 +18,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts_generic.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index f2f3ccc88..dc8d7dbea 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -5,7 +5,8 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts_generic.h')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
 allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..1ce0af3e5 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -82,8 +85,56 @@ rte_ring_get_memsize(unsigned int count)
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	memset((void *)(uintptr_t)&r->prod.tail, 0,
+		offsetof(struct rte_ring, pad1) -
+		offsetof(struct rte_ring, prod.tail));
+	memset((void *)(uintptr_t)&r->cons.tail, 0,
+		offsetof(struct rte_ring, pad2) -
+		offsetof(struct rte_ring, cons.tail));
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero on success or a negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, uint32_t *prod_st, uint32_t *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +151,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +181,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index aa8c628eb..a130aeb9d 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -65,13 +65,15 @@ enum rte_ring_queue_behavior {
 enum {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
 };
 
 /**
  * structure to hold a pair of head/tail values and other metadata.
  * used by RTE_RING_SYNC_MT, RTE_RING_SYNC_ST sync types.
- * Depending on sync_type the format of that structure might be different,
- * but the offsets of the *sync_type* and *tail* values should remain the same.
+ * The format of that structure might differ depending on the sync
+ * mechanism selected, but the offsets of the *sync_type* and *tail*
+ * values should always remain the same.
  */
 struct rte_ring_headtail {
 	uint32_t sync_type;                      /**< sync type of prod/cons */
@@ -79,6 +81,21 @@ struct rte_ring_headtail {
 	volatile uint32_t head;                  /**< prod/consumer head. */
 };
 
+union rte_ring_ht_poscnt {
+	uint64_t raw;
+	struct {
+		uint32_t pos; /**< head/tail position */
+		uint32_t cnt; /**< head/tail reference counter */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	uint32_t sync_type; /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union rte_ring_ht_poscnt tail;
+	volatile union rte_ring_ht_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -106,11 +123,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -127,6 +154,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -407,6 +437,82 @@ __rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
 	return n;
 }
 
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
+		unsigned int n, enum rte_ring_queue_behavior behavior,
+		unsigned int *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
 /**
  * Enqueue several objects on the ring (multi-producers safe).
  *
@@ -456,6 +562,29 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -479,8 +608,18 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_rts_enqueue_bulk(r, obj_table, n, free_space);
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -591,6 +730,29 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 			RTE_RING_SYNC_ST, available);
 }
 
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
 /**
  * Dequeue several objects from a ring.
  *
@@ -614,8 +776,18 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_rts_dequeue_bulk(r, obj_table, n, available);
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -811,6 +983,42 @@ rte_ring_cons_single(const struct rte_ring *r)
 	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
 }
 
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -880,6 +1088,29 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static __rte_always_inline unsigned
+rte_ring_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -903,8 +1134,18 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_rts_enqueue_burst(r, obj_table, n, free_space);
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -960,6 +1201,30 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static __rte_always_inline unsigned
+rte_ring_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
 /**
  * Dequeue multiple objects from a ring up to a maximum number.
  *
@@ -983,9 +1248,18 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_rts_dequeue_burst(r, obj_table, n, available);
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_rts_generic.h b/lib/librte_ring/rte_ring_rts_generic.h
new file mode 100644
index 000000000..add8630b2
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_generic.h
@@ -0,0 +1,240 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_GENERIC_H_
+#define _RTE_RING_RTS_GENERIC_H_
+
+/**
+ * @file rte_ring_rts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that the tail value is increased not
+ * by every thread that finished an enqueue/dequeue,
+ * but only by the last one doing so.
+ * That allows threads to skip spinning on the tail value,
+ * leaving the actual tail update to the last thread in the update queue.
+ * RTS requires two 64-bit CAS operations for each enqueue/dequeue:
+ * one for the head update, a second for the tail update.
+ * In return it allows threads to avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update plus waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce a refcnt for both head and tail.
+ *  - increment head.refcnt on each head.value update
+ *  - write head.value and head.refcnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.refcnt + 1 == head.refcnt
+ *  - increment tail.refcnt when each enqueue/dequeue op finishes
+ *    (no matter whether tail.value is going to change or not)
+ *  - write tail.value and tail.refcnt atomically (64-bit CAS)
+ *
+ * To avoid head/tail starvation:
+ *  - limit the max allowed distance between head and tail values (HTD_MAX),
+ *    i.e. a thread is allowed to proceed with changing head.value
+ *    only when:  head.value - tail.value <= HTD_MAX
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we get a ring with fully synchronized
+ * head/tail (like HTS).
+ * With HTD_MAX == UINT32_MAX there is no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_ht_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with the new value.
+	 */
+
+	do {
+		ot.raw = ht->tail.raw;
+		rte_smp_rmb();
+
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->head.raw);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance does not
+ * exceed the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_ht_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+	h->raw = ht->head.raw;
+	rte_smp_rmb();
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = ht->head.raw;
+		rte_smp_rmb();
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to enqueue, i.e. how far the
+ *   head should be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where enqueue
+ *   starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* read prod head (may spin on prod tail) */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 *  The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_prod.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to dequeue, i.e. how far the
+ *   head should be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from a ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* read cons head (may spin on cons tail) */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory models. It is a noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_cons.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_GENERIC_H_ */
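
A minimal usage sketch for the new mode (illustration only, not part of the
patch; it assumes the RTS flags and bulk calls introduced in this series,
with error handling mostly omitted):

#include <rte_ring.h>
#include <rte_lcore.h>

static void
rts_ring_example(void)
{
	void *objs[32];   /* would point at real objects in a real app */
	unsigned int n;

	/* both producer and consumer use the RTS sync mode */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024,
		rte_socket_id(), RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return;

	/* direct RTS calls; the generic rte_ring_enqueue_bulk() would
	 * dispatch to the same code via prod.sync_type */
	n = rte_ring_rts_enqueue_bulk(r, objs, RTE_DIM(objs), NULL);
	n = rte_ring_rts_dequeue_bulk(r, objs, n, NULL);
	(void)n;
}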
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [RFC 4/6] test/ring: add contention stress test for RTS ring
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
                   ` (2 preceding siblings ...)
  2020-02-24 11:35 ` [dpdk-dev] [RFC 3/6] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-02-24 11:35 ` Konstantin Ananyev
  2020-02-24 11:35 ` [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode Konstantin Ananyev
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Introduce a new test case that exercises RTS ring mode under contention.
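
For reference, the new case is registered as ring_stress_rts_autotest
(see REGISTER_TEST_COMMAND below) and can be invoked at the test
binary's RTE>> prompt like any other autotest.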

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 30 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 4f586d95f..d22b9f702 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 84dde28ad..fa4fb4b51 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..525d222b2
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_rts_autotest, test_ring_stress);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
                   ` (3 preceding siblings ...)
  2020-02-24 11:35 ` [dpdk-dev] [RFC 4/6] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-02-24 11:35 ` Konstantin Ananyev
  2020-03-25 20:44   ` Honnappa Nagarahalli
  2020-02-24 11:35 ` [dpdk-dev] [RFC 6/6] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Introduce head/tail sync mode for MT ring synchronization.
In that mode enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform given op.
It is supposed to reduce stall times in cases when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
As another enhancement provide ability to split enqueue/dequeue
operation into two phases:
  - enqueue/dequeue start
  - enqueue/dequeue finish
That allows user to inspect objects in the ring without removing
them from it (aka MT safe peek).
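
A minimal sketch of the peek pattern (illustration only, not part of the
patch; keep_object() is a hypothetical application predicate):

#include <rte_ring.h>

/* hypothetical application predicate */
extern int keep_object(void *obj);

static void
hts_peek_example(struct rte_ring *r)
{
	void *obj;
	uint32_t n;

	/* r is assumed to be created with flags
	 * RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ */
	n = rte_ring_hts_dequeue_bulk_start(r, &obj, 1, NULL);
	if (n != 0) {
		if (keep_object(obj))
			/* keep it: finish with 0, nothing is consumed */
			rte_ring_hts_dequeue_finish(r, 0);
		else
			/* drop it: finish with n, the object is consumed */
			rte_ring_hts_dequeue_finish(r, n);
	}
}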

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   1 +
 lib/librte_ring/meson.build            |   1 +
 lib/librte_ring/rte_ring.c             |  15 +-
 lib/librte_ring/rte_ring.h             | 259 ++++++++++++++++++++++++-
 lib/librte_ring/rte_ring_hts_generic.h | 228 ++++++++++++++++++++++
 5 files changed, 500 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 4f90344f4..0c7f8f918 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts_generic.h \
 					rte_ring_rts_generic.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index dc8d7dbea..5aa673199 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,6 +6,7 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts_generic.h',
 		'rte_ring_rts_generic.h')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 1ce0af3e5..d3b948667 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -102,9 +102,9 @@ static int
 get_sync_type(uint32_t flags, uint32_t *prod_st, uint32_t *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -116,6 +116,9 @@ get_sync_type(uint32_t flags, uint32_t *prod_st, uint32_t *cons_st)
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -130,6 +133,9 @@ get_sync_type(uint32_t flags, uint32_t *prod_st, uint32_t *cons_st)
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -151,6 +157,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index a130aeb9d..52edcea11 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -66,11 +66,11 @@ enum {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 };
 
 /**
- * structure to hold a pair of head/tail values and other metadata.
- * used by RTE_RING_SYNC_MT, RTE_RING_SYNC_ST sync types.
+ * Structure to hold a pair of head/tail values and other metadata.
  * The format of that structure might differ depending on
  * the sync mechanism selected, but offsets for
  * *sync_type* and *tail* values should always remain the same.
@@ -96,6 +96,19 @@ struct rte_ring_rts_headtail {
 	volatile union rte_ring_ht_poscnt head;
 };
 
+union rte_ring_ht_pos {
+	uint64_t raw;
+	struct {
+		uint32_t tail; /**< tail position */
+		uint32_t head; /**< head position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	uint32_t sync_type; /**< sync type of prod/cons */
+	volatile union rte_ring_ht_pos ht __rte_aligned(8);
+};
+
 /**
  * An RTE ring structure.
  *
@@ -126,6 +139,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -135,6 +149,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -157,6 +172,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -513,6 +531,82 @@ __rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
 	return n;
 }
 
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Start to enqueue several objects on the HTS ring.
+ * Note that user has to call appropriate enqueue_finish()
+ * to complete given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_start(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0)
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Start to dequeue several objects from the HTS ring.
+ * Note that user has to call appropriate dequeue_finish()
+ * to complete given dequeue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, enum rte_ring_queue_behavior behavior,
+		unsigned int *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0)
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
 /**
  * Enqueue several objects on the ring (multi-producers safe).
  *
@@ -585,6 +679,47 @@ rte_ring_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			free_space);
 }
 
+/**
+ * Start to enqueue several objects on the HTS ring (multi-producers safe).
+ * Note that user has to call appropriate enqueue_finish()
+ * to complete given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_hts_enqueue_bulk_start(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_start(r, obj_table, n,
+		RTE_RING_QUEUE_FIXED, free_space);
+}
+
+static __rte_always_inline void
+rte_ring_hts_enqueue_finish(struct rte_ring *r, unsigned int n)
+{
+	__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+}
+
+static __rte_always_inline unsigned int
+rte_ring_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	n = rte_ring_hts_enqueue_bulk_start(r, obj_table, n, free_space);
+	if (n != 0)
+		rte_ring_hts_enqueue_finish(r, n);
+	return n;
+}
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -615,6 +750,8 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_rts_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_hts_enqueue_bulk(r, obj_table, n, free_space);
 	}
 
 	/* valid ring should never reach this point */
@@ -753,6 +890,47 @@ rte_ring_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
 			available);
 }
 
+/**
+ * Start to dequeue several objects from an HTS ring (multi-consumers safe).
+ * Note that user has to call appropriate dequeue_finish()
+ * to complete given dequeue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_hts_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_start(r, obj_table, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+static __rte_always_inline void
+rte_ring_hts_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+}
+
+static __rte_always_inline unsigned int
+rte_ring_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	n = rte_ring_hts_dequeue_bulk_start(r, obj_table, n, available);
+	if (n != 0)
+		rte_ring_hts_dequeue_finish(r, n);
+	return n;
+}
+
 /**
  * Dequeue several objects from a ring.
  *
@@ -783,6 +961,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_hts_dequeue_bulk(r, obj_table, n, available);
 	}
 
 	/* valid ring should never reach this point */
@@ -1111,6 +1291,41 @@ rte_ring_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_QUEUE_VARIABLE, free_space);
 }
 
+/**
+ * Start to enqueue several objects on the HTS ring (multi-producers safe).
+ * Note that user has to call appropriate enqueue_finish()
+ * to complete given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_hts_enqueue_burst_start(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_start(r, obj_table, n,
+		RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+static __rte_always_inline unsigned int
+rte_ring_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	n = rte_ring_hts_enqueue_burst_start(r, obj_table, n, free_space);
+	if (n != 0)
+		rte_ring_hts_enqueue_finish(r, n);
+	return n;
+}
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -1141,6 +1356,8 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_rts_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_hts_enqueue_burst(r, obj_table, n, free_space);
 	}
 
 	/* valid ring should never reach this point */
@@ -1225,6 +1442,42 @@ rte_ring_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
 	return __rte_ring_do_rts_dequeue(r, obj_table, n,
 			RTE_RING_QUEUE_VARIABLE, available);
 }
+
+/**
+ * Start to dequeue several objects from an HTS ring (multi-consumers safe).
+ * Note that user has to call appropriate dequeue_finish()
+ * to complete given dequeue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+static __rte_always_inline unsigned int
+rte_ring_hts_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_start(r, obj_table, n,
+		RTE_RING_QUEUE_VARIABLE, available);
+}
+
+static __rte_always_inline unsigned int
+rte_ring_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	n = rte_ring_hts_dequeue_burst_start(r, obj_table, n, available);
+	if (n != 0)
+		rte_ring_hts_dequeue_finish(r, n);
+	return n;
+}
+
 /**
  * Dequeue multiple objects from a ring up to a maximum number.
  *
@@ -1255,6 +1508,8 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_rts_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_hts_dequeue_burst(r, obj_table, n, available);
 	}
 
 	/* valid ring should never reach this point */
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
new file mode 100644
index 000000000..7e447e30b
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_GENERIC_H_
+#define _RTE_RING_HTS_GENERIC_H_
+
+/**
+ * @file rte_ring_hts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * In that mode enqueue/dequeue operation is fully serialized:
+ * only one thread at a time is allowed to perform given op.
+ * This is achieved by allowing a thread to proceed with changing head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * As another enhancement it provides the ability to split enqueue/dequeue
+ * operation into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows user to inspect objects in the ring without removing
+ * them from it (aka MT safe peek).
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    // examine the object
+ *    if (object_examine(obj) == KEEP)
+ *       // decided to keep it in the ring.
+ *       rte_ring_hts_dequeue_finish(ring, 0);
+ *    else
+ *       // decided to remove it from the ring.
+ *       rte_ring_hts_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ the ring is sort of locked -
+ * no other thread can proceed with an enqueue(/dequeue) operation till
+ * _finish_ completes.
+ */
+
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	uint32_t n;
+	union rte_ring_ht_pos p;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+
+	n = p.pos.head - p.pos.tail;
+	RTE_ASSERT(n >= num);
+	RTE_SET_USED(n);
+
+	p.pos.head = p.pos.tail + num;
+	p.pos.tail = p.pos.head;
+
+	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
+}
+
+/**
+ * @internal waits until tail becomes equal to head,
+ * meaning no writer/reader is active for that ring.
+ * Supposed to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_ht_pos *p)
+{
+	p->raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->ht.raw);
+
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = rte_atomic64_read((rte_atomic64_t *)
+				(uintptr_t)&ht->ht.raw);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to enqueue, i.e. how far the
+ *   head should be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to a ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory models. It is a noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to dequeue, i.e. how far the
+ *   head should be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from a ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory models. It is a noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_GENERIC_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [RFC 6/6] test/ring: add contention stress test for HTS ring
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
                   ` (4 preceding siblings ...)
  2020-02-24 11:35 ` [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-02-24 11:35 ` Konstantin Ananyev
  2020-02-24 16:59 ` [dpdk-dev] [RFC 0/6] New sync modes for ring Stephen Hemminger
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-02-24 11:35 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, Konstantin Ananyev

Introduce a new test case that exercises HTS ring mode under contention.
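
As with the RTS counterpart, the new case is registered as
ring_stress_hts_autotest and can be invoked at the test binary's
RTE>> prompt like any other autotest.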

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 30 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index d22b9f702..ff151cd55 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,6 +77,7 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index fa4fb4b51..7e58fa999 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,6 +100,7 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..b7f2d21fc
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_hts_autotest, test_ring_stress);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
                   ` (5 preceding siblings ...)
  2020-02-24 11:35 ` [dpdk-dev] [RFC 6/6] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-02-24 16:59 ` Stephen Hemminger
  2020-02-24 17:59   ` Jerin Jacob
  2020-03-25 20:43 ` Honnappa Nagarahalli
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
  8 siblings, 1 reply; 146+ messages in thread
From: Stephen Hemminger @ 2020-02-24 16:59 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, olivier.matz

On Mon, 24 Feb 2020 11:35:09 +0000
Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:

> Upfront note - that RFC is not a complete patch.
> It introduces an ABI breakage, plus it doesn't update ring_elem
> code properly, etc.
> I plan to deal with all these things in later versions.
> Right now I seek an initial feedback about proposed ideas.
> Would also ask people to repeat performance tests (see below)
> on their platforms to confirm the impact.
> 
> More and more customers use(/try to use) DPDK based apps within
> overcommitted systems (multiple acttive threads over same pysical cores):
> VM, container deployments, etc.
> One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> LHP is quite a common problem for spin-based sync primitives
> (spin-locks, etc.) on overcommitted systems.
> The situation gets much worse when some sort of
> fair-locking technique is used (ticket-lock, etc.).
> As now not only lock-owner but also lock-waiters scheduling
> order matters a lot.
> This is a well-known problem for kernel within VMs:
> http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> The problem with rte_ring is that while head accusion is sort of
> un-fair locking, waiting on tail is very similar to ticket lock schema -
> tail has to be updated in particular order.
> That makes current rte_ring implementation to perform
> really pure on some overcommited scenarios.

Rather than reform rte_ring to fit this scenario, it would make
more sense to me to introduce another primitive. The current lockless
ring performs very well for the isolated thread model that DPDK
was built around. This looks like a case of customers violating
the usage model of the DPDK and then being surprised at the fallout.


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 16:59 ` [dpdk-dev] [RFC 0/6] New sync modes for ring Stephen Hemminger
@ 2020-02-24 17:59   ` Jerin Jacob
  2020-02-24 19:35     ` Stephen Hemminger
  2020-02-25  0:58     ` Honnappa Nagarahalli
  0 siblings, 2 replies; 146+ messages in thread
From: Jerin Jacob @ 2020-02-24 17:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Konstantin Ananyev, dpdk-dev, Olivier Matz

On Mon, Feb 24, 2020 at 10:29 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 24 Feb 2020 11:35:09 +0000
> Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:
>
> > Upfront note - that RFC is not a complete patch.
> > It introduces an ABI breakage, plus it doesn't update ring_elem
> > code properly, etc.
> > I plan to deal with all these things in later versions.
> > Right now I seek an initial feedback about proposed ideas.
> > Would also ask people to repeat performance tests (see below)
> > on their platforms to confirm the impact.
> >
> > More and more customers use(/try to use) DPDK based apps within
> > overcommitted systems (multiple acttive threads over same pysical cores):
> > VM, container deployments, etc.
> > One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> > LHP is quite a common problem for spin-based sync primitives
> > (spin-locks, etc.) on overcommitted systems.
> > The situation gets much worse when some sort of
> > fair-locking technique is used (ticket-lock, etc.).
> > As now not only lock-owner but also lock-waiters scheduling
> > order matters a lot.
> > This is a well-known problem for kernel within VMs:
> > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > The problem with rte_ring is that while head accusion is sort of
> > un-fair locking, waiting on tail is very similar to ticket lock schema -
> > tail has to be updated in particular order.
> > That makes current rte_ring implementation to perform
> > really pure on some overcommited scenarios.
>
> Rather than reform rte_ring to fit this scenario, it would make
> more sense to me to introduce another primitive. The current lockless
> ring performs very well for the isolated thread model that DPDK
> was built around. This looks like a case of customers violating
> the usage model of the DPDK and then being surprised at the fallout.

I agree with Stephen here.

I think adding more runtime checks in the enqueue() and dequeue() will
have a bad effect on the low-end cores too.
But I agree with the problem statement that in the virtualization use
case, it may be possible to have N virtual cores running on a physical
core.

IMO, the best solution would be keeping the ring API the same and having
a different flavor at "compile-time", something like what
liburcu did for accommodating different flavors.

i.e. urcu-qsbr.h and urcu-bp.h have identical definitions of the API. The
application can simply include ONE header file in a C file based on
the flavor.
If both are needed at runtime, keep a function pointer or so in the
application and define the function in a different C file by including
the appropriate flavor in that C file.

#include <urcu-qsbr.h> /* QSBR RCU flavor */
#include <urcu-bp.h> /* Bulletproof RCU flavor */
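
A rough sketch of that arrangement (all names below are hypothetical,
purely to illustrate the idea):

/* app_ring.h - one prototype per op; each function is defined in its
 * own .c file that includes the matching flavor header */
#include <rte_ring.h>

unsigned int app_ring_enq_isolated(struct rte_ring *r,
	void * const *obj, unsigned int n, unsigned int *free);
unsigned int app_ring_enq_overcommit(struct rte_ring *r,
	void * const *obj, unsigned int n, unsigned int *free);

/* the application picks the flavor once, e.g. at init time */
extern unsigned int (*app_ring_enq)(struct rte_ring *r,
	void * const *obj, unsigned int n, unsigned int *free);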


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 17:59   ` Jerin Jacob
@ 2020-02-24 19:35     ` Stephen Hemminger
  2020-02-24 20:52       ` Honnappa Nagarahalli
  2020-02-25 13:41       ` Ananyev, Konstantin
  2020-02-25  0:58     ` Honnappa Nagarahalli
  1 sibling, 2 replies; 146+ messages in thread
From: Stephen Hemminger @ 2020-02-24 19:35 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Konstantin Ananyev, dpdk-dev, Olivier Matz

On Mon, 24 Feb 2020 23:29:57 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Mon, Feb 24, 2020 at 10:29 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 24 Feb 2020 11:35:09 +0000
> > Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:
> >  
> > > Upfront note - that RFC is not a complete patch.
> > > It introduces an ABI breakage, plus it doesn't update ring_elem
> > > code properly, etc.
> > > I plan to deal with all these things in later versions.
> > > Right now I seek an initial feedback about proposed ideas.
> > > Would also ask people to repeat performance tests (see below)
> > > on their platforms to confirm the impact.
> > >
> > > More and more customers use(/try to use) DPDK based apps within
> > > overcommitted systems (multiple acttive threads over same pysical cores):
> > > VM, container deployments, etc.
> > > One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> > > LHP is quite a common problem for spin-based sync primitives
> > > (spin-locks, etc.) on overcommitted systems.
> > > The situation gets much worse when some sort of
> > > fair-locking technique is used (ticket-lock, etc.).
> > > As now not only lock-owner but also lock-waiters scheduling
> > > order matters a lot.
> > > This is a well-known problem for kernel within VMs:
> > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > > The problem with rte_ring is that while head accusion is sort of
> > > un-fair locking, waiting on tail is very similar to ticket lock schema -
> > > tail has to be updated in particular order.
> > > That makes current rte_ring implementation to perform
> > > really pure on some overcommited scenarios.  
> >
> > Rather than reform rte_ring to fit this scenario, it would make
> > more sense to me to introduce another primitive. The current lockless
> > ring performs very well for the isolated thread model that DPDK
> > was built around. This looks like a case of customers violating
> > the usage model of the DPDK and then being surprised at the fallout.  
> 
> I agree with Stephen here.
> 
> I think, adding more runtime check in the enqueue() and dequeue() will
> have a bad effect on the low-end cores too.
> But I agree with the problem statement that in the virtualization use
> case, It may be possible to have N virtual cores runs on a physical
> core.
> 
> IMO, The best solution would be keeping the ring API same and have a
> different flavor in "compile-time". Something like
> liburcu did for accommodating different flavors.
> 
> i.e urcu-qsbr.h and urcu-bp.h will identical definition of API. The
> application can simply include ONE header file in a C file based on
> the flavor.
> If need both at runtime. Need to have function pointer or so in the
> application and define the function in different c file by including
> the approaite flavor in C file.

This would also be a good time to consider the tradeoffs of the
heavy use of inlining that is done in rte_ring vs the impact that
has on API/ABI stability.



^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 19:35     ` Stephen Hemminger
@ 2020-02-24 20:52       ` Honnappa Nagarahalli
  2020-02-25 11:45         ` Ananyev, Konstantin
  2020-02-25 13:41       ` Ananyev, Konstantin
  1 sibling, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-02-24 20:52 UTC (permalink / raw)
  To: Stephen Hemminger, Jerin Jacob
  Cc: Konstantin Ananyev, dpdk-dev, Olivier Matz, Honnappa Nagarahalli, nd, nd



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> Sent: Monday, February 24, 2020 1:35 PM
> To: Jerin Jacob <jerinjacobk@gmail.com>
> Cc: Konstantin Ananyev <konstantin.ananyev@intel.com>; dpdk-dev
> <dev@dpdk.org>; Olivier Matz <olivier.matz@6wind.com>
> Subject: Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
> 
> On Mon, 24 Feb 2020 23:29:57 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
> 
> > On Mon, Feb 24, 2020 at 10:29 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Mon, 24 Feb 2020 11:35:09 +0000
> > > Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:
> > >
> > > > Upfront note - that RFC is not a complete patch.
> > > > It introduces an ABI breakage, plus it doesn't update ring_elem
> > > > code properly, etc.
> > > > I plan to deal with all these things in later versions.
> > > > Right now I seek an initial feedback about proposed ideas.
> > > > Would also ask people to repeat performance tests (see below) on
> > > > their platforms to confirm the impact.
> > > >
> > > > More and more customers use(/try to use) DPDK based apps within
> > > > overcommitted systems (multiple acttive threads over same pysical
> cores):
> > > > VM, container deployments, etc.
> > > > One quite common problem they hit: Lock-Holder-Preemption with
> rte_ring.
> > > > LHP is quite a common problem for spin-based sync primitives
> > > > (spin-locks, etc.) on overcommitted systems.
> > > > The situation gets much worse when some sort of fair-locking
> > > > technique is used (ticket-lock, etc.).
> > > > As now not only lock-owner but also lock-waiters scheduling order
> > > > matters a lot.
> > > > This is a well-known problem for kernel within VMs:
> > > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > > > The problem with rte_ring is that while head accusion is sort of
> > > > un-fair locking, waiting on tail is very similar to ticket lock
> > > > schema - tail has to be updated in particular order.
> > > > That makes current rte_ring implementation to perform really pure
> > > > on some overcommited scenarios.
> > >
> > > Rather than reform rte_ring to fit this scenario, it would make more
> > > sense to me to introduce another primitive. The current lockless
> > > ring performs very well for the isolated thread model that DPDK was
> > > built around. This looks like a case of customers violating the
> > > usage model of the DPDK and then being surprised at the fallout.
> >
> > I agree with Stephen here.
> >
> > I think, adding more runtime check in the enqueue() and dequeue() will
> > have a bad effect on the low-end cores too.
> > But I agree with the problem statement that in the virtualization use
> > case, It may be possible to have N virtual cores runs on a physical
> > core.
> >
> > IMO, The best solution would be keeping the ring API same and have a
> > different flavor in "compile-time". Something like liburcu did for
> > accommodating different flavors.
> >
> > i.e urcu-qsbr.h and urcu-bp.h will identical definition of API. The
> > application can simply include ONE header file in a C file based on
> > the flavor.
> > If need both at runtime. Need to have function pointer or so in the
> > application and define the function in different c file by including
> > the approaite flavor in C file.
> 
> This would also be a good time to consider the tradeoffs of the heavy use of
> inlining that is done in rte_ring vs the impact that has on API/ABI stability.
> 
I was working on a few requirements in the rte_ring library for RCU defer APIs. RFC is at https://patchwork.dpdk.org/cover/66020/.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 17:59   ` Jerin Jacob
  2020-02-24 19:35     ` Stephen Hemminger
@ 2020-02-25  0:58     ` Honnappa Nagarahalli
  2020-02-25 15:14       ` Ananyev, Konstantin
  1 sibling, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-02-25  0:58 UTC (permalink / raw)
  To: Jerin Jacob, Stephen Hemminger
  Cc: Konstantin Ananyev, dpdk-dev, Olivier Matz, Honnappa Nagarahalli, nd, nd

<snip>

> 
> On Mon, Feb 24, 2020 at 10:29 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 24 Feb 2020 11:35:09 +0000
> > Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:
> >
> > > Upfront note - that RFC is not a complete patch.
> > > It introduces an ABI breakage, plus it doesn't update ring_elem code
> > > properly, etc.
> > > I plan to deal with all these things in later versions.
> > > Right now I seek an initial feedback about proposed ideas.
> > > Would also ask people to repeat performance tests (see below) on
> > > their platforms to confirm the impact.
> > >
> > > More and more customers use(/try to use) DPDK based apps within
> > > overcommitted systems (multiple acttive threads over same pysical cores):
> > > VM, container deployments, etc.
> > > One quite common problem they hit: Lock-Holder-Preemption with
> rte_ring.
> > > LHP is quite a common problem for spin-based sync primitives
> > > (spin-locks, etc.) on overcommitted systems.
> > > The situation gets much worse when some sort of fair-locking
> > > technique is used (ticket-lock, etc.).
> > > As now not only lock-owner but also lock-waiters scheduling order
> > > matters a lot.
> > > This is a well-known problem for kernel within VMs:
> > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
These slides seem to indicate that the problems are mitigated through the Hypervisor configuration. Do we still need to address the issues?

> > > The problem with rte_ring is that while head accusion is sort of
> > > un-fair locking, waiting on tail is very similar to ticket lock
> > > schema - tail has to be updated in particular order.
> > > That makes current rte_ring implementation to perform really pure on
> > > some overcommited scenarios.
> >
> > Rather than reform rte_ring to fit this scenario, it would make more
> > sense to me to introduce another primitive. The current lockless ring
> > performs very well for the isolated thread model that DPDK was built
> > around. This looks like a case of customers violating the usage model
> > of the DPDK and then being surprised at the fallout.
> 
> I agree with Stephen here.
> 
> I think, adding more runtime check in the enqueue() and dequeue() will have a
> bad effect on the low-end cores too.
> But I agree with the problem statement that in the virtualization use case, It
> may be possible to have N virtual cores runs on a physical core.
It is hard to imagine that there are data plane applications deployed in such environments. Wouldn't this affect the performance terribly?

> 
> IMO, The best solution would be keeping the ring API same and have a
> different flavor in "compile-time". Something like liburcu did for
> accommodating different flavors.
> 
> i.e urcu-qsbr.h and urcu-bp.h will identical definition of API. The application
> can simply include ONE header file in a C file based on the flavor.
> If need both at runtime. Need to have function pointer or so in the application
> and define the function in different c file by including the approaite flavor in C
> file.
> 
> #include <urcu-qsbr.h> /* QSBR RCU flavor */ #include <urcu-bp.h> /*
> Bulletproof RCU flavor */

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 20:52       ` Honnappa Nagarahalli
@ 2020-02-25 11:45         ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-02-25 11:45 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Stephen Hemminger, Jerin Jacob
  Cc: dpdk-dev, Olivier Matz, nd, nd


> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> > Sent: Monday, February 24, 2020 1:35 PM
> > To: Jerin Jacob <jerinjacobk@gmail.com>
> > Cc: Konstantin Ananyev <konstantin.ananyev@intel.com>; dpdk-dev
> > <dev@dpdk.org>; Olivier Matz <olivier.matz@6wind.com>
> > Subject: Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
> >
> > On Mon, 24 Feb 2020 23:29:57 +0530
> > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> >
> > > On Mon, Feb 24, 2020 at 10:29 PM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:
> > > >
> > > > On Mon, 24 Feb 2020 11:35:09 +0000
> > > > Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:
> > > >
> > > > > Upfront note - that RFC is not a complete patch.
> > > > > It introduces an ABI breakage, plus it doesn't update ring_elem
> > > > > code properly, etc.
> > > > > I plan to deal with all these things in later versions.
> > > > > Right now I seek an initial feedback about proposed ideas.
> > > > > Would also ask people to repeat performance tests (see below) on
> > > > > their platforms to confirm the impact.
> > > > >
> > > > > More and more customers use(/try to use) DPDK based apps within
> > > > > overcommitted systems (multiple acttive threads over same pysical
> > cores):
> > > > > VM, container deployments, etc.
> > > > > One quite common problem they hit: Lock-Holder-Preemption with
> > rte_ring.
> > > > > LHP is quite a common problem for spin-based sync primitives
> > > > > (spin-locks, etc.) on overcommitted systems.
> > > > > The situation gets much worse when some sort of fair-locking
> > > > > technique is used (ticket-lock, etc.).
> > > > > As now not only lock-owner but also lock-waiters scheduling order
> > > > > matters a lot.
> > > > > This is a well-known problem for kernel within VMs:
> > > > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > > > > The problem with rte_ring is that while head accusion is sort of
> > > > > un-fair locking, waiting on tail is very similar to ticket lock
> > > > > schema - tail has to be updated in particular order.
> > > > > That makes current rte_ring implementation to perform really pure
> > > > > on some overcommited scenarios.
> > > >
> > > > Rather than reform rte_ring to fit this scenario, it would make more
> > > > sense to me to introduce another primitive. The current lockless
> > > > ring performs very well for the isolated thread model that DPDK was
> > > > built around. This looks like a case of customers violating the
> > > > usage model of the DPDK and then being surprised at the fallout.
> > >
> > > I agree with Stephen here.
> > >
> > > I think, adding more runtime check in the enqueue() and dequeue() will
> > > have a bad effect on the low-end cores too.
> > > But I agree with the problem statement that in the virtualization use
> > > case, It may be possible to have N virtual cores runs on a physical
> > > core.
> > >
> > > IMO, The best solution would be keeping the ring API same and have a
> > > different flavor in "compile-time". Something like liburcu did for
> > > accommodating different flavors.
> > >
> > > i.e urcu-qsbr.h and urcu-bp.h will identical definition of API. The
> > > application can simply include ONE header file in a C file based on
> > > the flavor.
> > > If need both at runtime. Need to have function pointer or so in the
> > > application and define the function in different c file by including
> > > the approaite flavor in C file.
> >
> > This would also be a good time to consider the tradeoffs of the heavy use of
> > inlining that is done in rte_ring vs the impact that has on API/ABI stability.
> >
> I was working on few requirements in rte_ring library for RCU defer APIs. RFC is at https://patchwork.dpdk.org/cover/66020/.

Yep, noticed your patch, seems we sort of collided here.
As I understand, your patch aims to provide functionality similar to my HTS one.
Will try to look at yours in the next few days, hopefully we can end up
with some common denominator.
Konstantin

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 19:35     ` Stephen Hemminger
  2020-02-24 20:52       ` Honnappa Nagarahalli
@ 2020-02-25 13:41       ` Ananyev, Konstantin
  2020-02-26 16:53         ` Morten Brørup
  2020-02-27 10:31         ` Jerin Jacob
  1 sibling, 2 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-02-25 13:41 UTC (permalink / raw)
  To: Stephen Hemminger, Jerin Jacob; +Cc: dpdk-dev, Olivier Matz, drc

> > > > Upfront note - that RFC is not a complete patch.
> > > > It introduces an ABI breakage, plus it doesn't update ring_elem
> > > > code properly, etc.
> > > > I plan to deal with all these things in later versions.
> > > > Right now I seek an initial feedback about proposed ideas.
> > > > Would also ask people to repeat performance tests (see below)
> > > > on their platforms to confirm the impact.
> > > >
> > > > More and more customers use(/try to use) DPDK based apps within
> > > > overcommitted systems (multiple acttive threads over same pysical cores):
> > > > VM, container deployments, etc.
> > > > One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> > > > LHP is quite a common problem for spin-based sync primitives
> > > > (spin-locks, etc.) on overcommitted systems.
> > > > The situation gets much worse when some sort of
> > > > fair-locking technique is used (ticket-lock, etc.).
> > > > As now not only lock-owner but also lock-waiters scheduling
> > > > order matters a lot.
> > > > This is a well-known problem for kernel within VMs:
> > > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > > > The problem with rte_ring is that while head accusion is sort of
> > > > un-fair locking, waiting on tail is very similar to ticket lock schema -
> > > > tail has to be updated in particular order.
> > > > That makes current rte_ring implementation to perform
> > > > really pure on some overcommited scenarios.
> > >
> > > Rather than reform rte_ring to fit this scenario, it would make
> > > more sense to me to introduce another primitive. 

I don't see many advantages it will bring us.
As disadvantages: for developers and maintainers - code duplication;
for end users - extra code churn and the removed ability to mix and match
different sync modes in one ring.

> The current lockless
> > > ring performs very well for the isolated thread model that DPDK
> > > was built around. This looks like a case of customers violating
> > > the usage model of the DPDK and then being surprised at the fallout.

For customers using the isolated thread model - nothing should change
(both in terms of API and performance).
Existing sync modes MP/MC, SP/SC are kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains the default one.
On the other hand, I don't see why we should ignore customers that want
to use their DPDK apps in different deployment scenarios.

> >
> > I agree with Stephen here.
> >
> > I think, adding more runtime check in the enqueue() and dequeue() will
> > have a bad effect on the low-end cores too.

We do have a run-time check in our current enqueue()/dequeue() implementation.
In fact we support both modes: we have the generic rte_ring_enqueue(/dequeue)_bulk(/burst),
where sync behaviour is determined at runtime by the value of prod(/cons).single.
Or the user can call the rte_ring_(mp/sp)_enqueue_* functions directly.
This RFC follows exactly the same paradigm:
rte_ring_enqueue(/dequeue)_bulk(/burst) is kept generic and its
behaviour is determined at runtime, by the value of prod(/cons).sync_type.
Or the user can call enqueue/dequeue with a particular sync mode directly:
rte_ring_(mp/sp/rts/hts)_enqueue_(bulk/burst)*.
The only thing that changed:
 The format of prod/cons can now differ depending on the mode selected at _init_.
 So you can't create a ring for, let's say, SP mode and then in the middle of the data-path
 change your mind and start using MP_RTS mode.
 For the existing modes (SP/MP, SC/MC) the format remains the same and the user can still
 use them interchangeably, though of course that is an error-prone practice.
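
To illustrate, a minimal sketch, assuming only the flag names from this
RFC (RING_F_MP_RTS_ENQ, RING_F_MC_HTS_DEQ) on top of the existing API;
names and sizes are illustrative:

#include <rte_ring.h>

/* Sync modes are fixed per ring side at creation time... */
static struct rte_ring *
create_mixed_ring(void)
{
	/* RTS producers and HTS consumers mixed in one ring */
	return rte_ring_create("mixed_ring", 1024, SOCKET_ID_ANY,
			RING_F_MP_RTS_ENQ | RING_F_MC_HTS_DEQ);
}

/* ...while the generic call stays the same and picks the right
 * implementation at runtime from the ring's sync_type metadata. */
static unsigned int
producer_send(struct rte_ring *r, void **objs, unsigned int n)
{
	return rte_ring_enqueue_burst(r, objs, n, NULL);
}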

> > But I agree with the problem statement that in the virtualization use
> > case, it may be possible to have N virtual cores running on a physical
> > core.
> >
> > IMO, the best solution would be keeping the ring API the same and having a
> > different flavor at "compile-time". Something like
> > liburcu did for accommodating different flavors.
> >
> > i.e. urcu-qsbr.h and urcu-bp.h will have identical definitions of the API. The
> > application can simply include ONE header file in a C file based on
> > the flavor.

I don't think it is a flexible enough approach.
In one app the user might need to have several rings with different sync modes.
Or the user might even need a ring with different sync modes for enqueue and dequeue.

> > If both are needed at runtime, have a function pointer or so in the
> > application and define the function in a different C file by including
> > the appropriate flavor in that C file.

A big issue with function pointers here would be the DPDK multi-process model.
AFAIK, rte_ring is quite a popular mechanism for IPC between DPDK apps.
To support such a model, we would need to split rte_ring data into 'shared'
and 'private' parts and initialize the private one for every process that is going to use it.
That sounds like a massive change, and I am not sure the required effort would be worth it.
BTW, if the user just calls API functions without trying to access structure internals directly,
I don't think it would make a big difference to him what is inside:
an indirect function call or an inlined switch(...) {}.
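
For reference, the inlined switch amounts to roughly the sketch below
(hedged: sync_type and the RTE_RING_SYNC_* values are from this RFC,
and the rts/hts variants follow the naming pattern above; this is an
illustration, not the patch itself):

#include <rte_ring.h>

static __rte_always_inline unsigned int
generic_enqueue_burst(struct rte_ring *r, void * const *objs,
		unsigned int n, unsigned int *free_space)
{
	/* one predictable branch per call, resolved from ring metadata */
	switch (r->prod.sync_type) {
	case RTE_RING_SYNC_MT:
		return rte_ring_mp_enqueue_burst(r, objs, n, free_space);
	case RTE_RING_SYNC_ST:
		return rte_ring_sp_enqueue_burst(r, objs, n, free_space);
	case RTE_RING_SYNC_MT_RTS:
		return rte_ring_rts_enqueue_burst(r, objs, n, free_space);
	case RTE_RING_SYNC_MT_HTS:
		return rte_ring_hts_enqueue_burst(r, objs, n, free_space);
	}
	return 0; /* unreachable for a properly initialized ring */
}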

> This would also be a good time to consider the tradeoffs of the
> heavy use of inlining that is done in rte_ring vs the impact that
> has on API/ABI stability.

Yes, hiding the rte_ring implementation inside a .c file would help a lot
in terms of ABI maintenance and would make our future life easier.
The question is what the price for it is in terms of performance,
and whether we are ready to pay it. Not to mention that it would cause
changes in many other libs/apps...
So I think it should be a subject for a separate discussion.
But, agreed, it would be good at least to measure the performance
impact of such a change.
If I have some spare cycles, I will give it a try.
Meanwhile, can I ask Jerin and the other guys to repeat the tests from this RFC
on their HW? Before continuing the discussion it would probably be good to know
whether the suggested patch works as expected across different platforms.
Thanks
Konstantin

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-25  0:58     ` Honnappa Nagarahalli
@ 2020-02-25 15:14       ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-02-25 15:14 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob, Stephen Hemminger
  Cc: dpdk-dev, Olivier Matz, nd, nd, wang.yong19

> > > > Upfront note - that RFC is not a complete patch.
> > > > It introduces an ABI breakage, plus it doesn't update ring_elem code
> > > > properly, etc.
> > > > I plan to deal with all these things in later versions.
> > > > Right now I seek an initial feedback about proposed ideas.
> > > > Would also ask people to repeat performance tests (see below) on
> > > > their platforms to confirm the impact.
> > > >
> > > > More and more customers use(/try to use) DPDK based apps within
> > > > overcommitted systems (multiple active threads over the same physical cores):
> > > > VM, container deployments, etc.
> > > > One quite common problem they hit: Lock-Holder-Preemption with
> > rte_ring.
> > > > LHP is quite a common problem for spin-based sync primitives
> > > > (spin-locks, etc.) on overcommitted systems.
> > > > The situation gets much worse when some sort of fair-locking
> > > > technique is used (ticket-lock, etc.).
> > > > As now not only lock-owner but also lock-waiters scheduling order
> > > > matters a lot.
> > > > This is a well-known problem for kernel within VMs:
> > > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> These slides seem to indicate that the problems are mitigated through the Hypervisor configuration. Do we still need to address the issues?

I am not really an expert here, but AFAIK the current mitigations deal mostly with the guest kernel:
Linux implements PV versions of spinlocks (unfair and/or based on hypercall availability),
the hypervisor might make decisions itself based on whether the guest is in user or kernel mode,
plus on some special CPU instructions.
We spin in user-space mode.
Hypervisors might have become smarter these days, but so far
I have heard about a few different customers that hit this problem.
As an example, this NA DPDK summit presentation:
https://dpdkna2019.sched.com/event/WYBG/dpdk-containers-challenges-solutions-wang-yong-zte
page 16 (problem #4) describes the same issue.

> 
> > > > The problem with rte_ring is that while head acquisition is sort of
> > > > un-fair locking, waiting on the tail is very similar to the ticket-lock
> > > > schema - the tail has to be updated in a particular order.
> > > > That makes the current rte_ring implementation perform really poorly in
> > > > some overcommitted scenarios.
> > >
> > > Rather than reform rte_ring to fit this scenario, it would make more
> > > sense to me to introduce another primitive. The current lockless ring
> > > performs very well for the isolated thread model that DPDK was built
> > > around. This looks like a case of customers violating the usage model
> > > of the DPDK and then being surprised at the fallout.
> >
> > I agree with Stephen here.
> >
> > I think adding more runtime checks in the enqueue() and dequeue() will have a
> > bad effect on the low-end cores too.
> > But I agree with the problem statement that in the virtualization use case, it
> > may be possible to have N virtual cores running on a physical core.
> It is hard to imagine that there are data plane applications deployed in such environments. Wouldn't this affect the performance terribly?

It wouldn't reach the same performance as isolated threads,
but for some tasks it might be enough.
AFAIK, one quite common scenario is a few isolated threads(/processes) doing the
actual IO and then spreading packets over dozens(/hundreds of) non-isolated
consumers.
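
A minimal sketch of that deployment, assuming the RFC's flags (the IO
thread is the only producer, so classic SP enqueue still applies, while
the shared-core workers use the new RTS consumer mode):

#include <rte_pause.h>
#include <rte_ring.h>

#define BURST 32

/* ring created once by the control path, e.g.:
 * r = rte_ring_create("dist", 4096, SOCKET_ID_ANY,
 *                     RING_F_SP_ENQ | RING_F_MC_RTS_DEQ);
 */
static int
worker(void *arg)
{
	struct rte_ring *r = arg;
	void *pkts[BURST];
	unsigned int n;

	for (;;) {
		n = rte_ring_dequeue_burst(r, pkts, BURST, NULL);
		if (n == 0) {
			rte_pause(); /* non-isolated core, don't burn it */
			continue;
		}
		/* ... process pkts[0..n-1] ... */
	}
	return 0;
}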

> 
> >
> > IMO, the best solution would be keeping the ring API the same and having a
> > different flavor at "compile-time", something like liburcu did for
> > accommodating different flavors.
> >
> > i.e. urcu-qsbr.h and urcu-bp.h will have identical definitions of the API. The application
> > can simply include ONE header file in a C file based on the flavor.
> > If both are needed at runtime, have a function pointer or so in the application
> > and define the function in a different C file by including the appropriate flavor in that
> > C file.
> >
> > #include <urcu-qsbr.h> /* QSBR RCU flavor */
> > #include <urcu-bp.h>   /* Bulletproof RCU flavor */

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-25 13:41       ` Ananyev, Konstantin
@ 2020-02-26 16:53         ` Morten Brørup
  2020-02-27 10:31         ` Jerin Jacob
  1 sibling, 0 replies; 146+ messages in thread
From: Morten Brørup @ 2020-02-26 16:53 UTC (permalink / raw)
  To: Ananyev, Konstantin, Stephen Hemminger, Jerin Jacob
  Cc: dpdk-dev, Olivier Matz, drc

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> 

<snip>

> > > > > More and more customers use(/try to use) DPDK based apps within
> > > > > overcommitted systems (multiple active threads over the same
> > > > > physical cores):
> > > > > VM, container deployments, etc.

<snip>

> > > > > That makes the current rte_ring implementation perform
> > > > > really poorly in some overcommitted scenarios.
> > > >
> > > > Rather than reform rte_ring to fit this scenario, it would make
> > > > more sense to me to introduce another primitive.
> 
> I don't see much advantages it will bring us.
> As a disadvantages, for developers and maintainers - code duplication,
> for end users - extra code churn and removed ability to mix and match
> different sync modes in one ring.
> 

I strongly agree with Konstantin on this.

Please consider this discussion at a higher abstraction level:

As DPDK applications grow in number and popularity, people will deploy them in overcommitted systems. In this scenario, I consider it extremely unlikely that the user is able to dedicate N physical cores to N specific lcores of the M total lcores of the DPDK application. In the typical hypervisor scenario, the user will be able to assign a number of virtual CPUs to the virtual machine running the DPDK application, and these vCPUs will either be dedicated to the virtual machine or shared with other virtual machines.

DPDK is currently designed for dedicated CPUs only. If a user runs it on shared vCPUs, the user is violating an important DPDK precondition, and DPDK's behavior is undefined!

Adding the ability to run DPDK applications on shared vCPUs would be a great improvement.

I prefer that support for this is as painless as possible for the DPDK application developer.

Perhaps run-time detection during the EAL initialization could be a solution. The EAL would then configure relevant libraries to use the appropriate synchronization primitives.


Med venlig hilsen / kind regards
- Morten Brørup




^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-25 13:41       ` Ananyev, Konstantin
  2020-02-26 16:53         ` Morten Brørup
@ 2020-02-27 10:31         ` Jerin Jacob
  2020-02-28  0:17           ` David Christensen
  1 sibling, 1 reply; 146+ messages in thread
From: Jerin Jacob @ 2020-02-27 10:31 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Stephen Hemminger, dpdk-dev, Olivier Matz, drc

On Tue, Feb 25, 2020 at 7:11 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:

> We do have a run-time check in our current enqueue()/dequeue() implementation.
> In fact we support both modes: we have the generic rte_ring_enqueue(/dequeue)_bulk(/burst),
> where sync behaviour is determined at runtime by the value of prod(/cons).single.
> Or the user can call the rte_ring_(mp/sp)_enqueue_* functions directly.
> This RFC follows exactly the same paradigm:
> rte_ring_enqueue(/dequeue)_bulk(/burst) is kept generic and its
> behaviour is determined at runtime, by the value of prod(/cons).sync_type.
> Or the user can call enqueue/dequeue with a particular sync mode directly:
> rte_ring_(mp/sp/rts/hts)_enqueue_(bulk/burst)*.
> The only thing that changed:
>  The format of prod/cons can now differ depending on the mode selected at _init_.
>  So you can't create a ring for, let's say, SP mode and then in the middle of the data-path
>  change your mind and start using MP_RTS mode.
>  For the existing modes (SP/MP, SC/MC) the format remains the same and the user can still
>  use them interchangeably, though of course that is an error-prone practice.

Makes sense.


>
> > > But I agree with the problem statement that in the virtualization use
> > > case, it may be possible to have N virtual cores running on a physical
> > > core.
> > >
> > > IMO, the best solution would be keeping the ring API the same and having a
> > > different flavor at "compile-time". Something like
> > > liburcu did for accommodating different flavors.
> > >
> > > i.e. urcu-qsbr.h and urcu-bp.h will have identical definitions of the API. The
> > > application can simply include ONE header file in a C file based on
> > > the flavor.
>
> I don't think it is a flexible enough approach.
> In one app the user might need to have several rings with different sync modes.
> Or the user might even need a ring with different sync modes for enqueue and dequeue.

Ack.


> Yes, hiding the rte_ring implementation inside a .c file would help a lot
> in terms of ABI maintenance and would make our future life easier.
> The question is what the price for it is in terms of performance,
> and whether we are ready to pay it. Not to mention that it would cause
> changes in many other libs/apps...
> So I think it should be a subject for a separate discussion.
> But, agreed, it would be good at least to measure the performance
> impact of such a change.
> If I have some spare cycles, I will give it a try.
> Meanwhile, can I ask Jerin and the other guys to repeat the tests from this RFC
> on their HW? Before continuing the discussion it would probably be good to know
> whether the suggested patch works as expected across different platforms.


I tested on arm64 HW. The first section below is without the
patch (20.02) and the second one is with this patch.
I agree with Konstantin that getting more platform tests early will be good,
so that we can settle on the approach
and avoid back and forth later.


RTE>>ring_perf_autotest // without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 289.78
legacy APIs: MP/MC: single: 516.20

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 312.88
legacy APIs: SP/SC: burst (size: 32): 426.72
legacy APIs: MP/MC: burst (size: 8): 510.95
legacy APIs: MP/MC: burst (size: 32): 702.01

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 306.74
legacy APIs: SP/SC: bulk (size: 32): 411.56
legacy APIs: MP/MC: bulk (size: 8): 501.32
legacy APIs: MP/MC: bulk (size: 32): 693.07

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.00
legacy APIs: MP/MC: bulk (size: 8): 7.00

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 74.36
legacy APIs: MP/MC: bulk (size: 8): 110.18
legacy APIs: SP/SC: bulk (size: 32): 23.04
legacy APIs: MP/MC: bulk (size: 32): 32.29

### Testing using all slave nodes ##
Bulk enq/dequeue count on size 8
Core [8] count = 293741
Core [9] count = 293741
Total count (size: 8): 587482

Bulk enq/dequeue count on size 32
Core [8] count = 244909
Core [9] count = 244909
Total count (size: 32): 1077300

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 255.37
elem APIs: element size 16B: MP/MC: single: 456.68

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 291.99
elem APIs: element size 16B: SP/SC: burst (size: 32): 456.25
elem APIs: element size 16B: MP/MC: burst (size: 8): 497.77
elem APIs: element size 16B: MP/MC: burst (size: 32): 680.87

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 284.40
elem APIs: element size 16B: SP/SC: bulk (size: 32): 453.17
elem APIs: element size 16B: MP/MC: bulk (size: 8): 485.77
elem APIs: element size 16B: MP/MC: bulk (size: 32): 675.08

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 8.00
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.00

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 74.45
elem APIs: element size 16B: MP/MC: bulk (size: 8): 105.91
elem APIs: element size 16B: SP/SC: bulk (size: 32): 22.92
elem APIs: element size 16B: MP/MC: bulk (size: 32): 31.55

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [8] count = 308724
Core [9] count = 308723
Total count (size: 8): 617447

Bulk enq/dequeue count on size 32
Core [8] count = 214269
Core [9] count = 214269
Total count (size: 32): 1045985

RTE>>ring_perf_autotest // with patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 289.78
legacy APIs: MP/MC: single: 475.76

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 323.91
legacy APIs: SP/SC: burst (size: 32): 424.60
legacy APIs: MP/MC: burst (size: 8): 523.00
legacy APIs: MP/MC: burst (size: 32): 717.09

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 317.74
legacy APIs: SP/SC: bulk (size: 32): 413.57
legacy APIs: MP/MC: bulk (size: 8): 512.89
legacy APIs: MP/MC: bulk (size: 32): 712.45

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.00
legacy APIs: MP/MC: bulk (size: 8): 7.00

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 74.82
legacy APIs: MP/MC: bulk (size: 8): 96.45
legacy APIs: SP/SC: bulk (size: 32): 22.97
legacy APIs: MP/MC: bulk (size: 32): 32.52

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [8] count = 283928
Core [9] count = 283927
Total count (size: 8): 567855

Bulk enq/dequeue count on size 32
Core [8] count = 223916
Core [9] count = 223915
Total count (size: 32): 1015686

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 267.65
elem APIs: element size 16B: MP/MC: single: 439.06

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 302.44
elem APIs: element size 16B: SP/SC: burst (size: 32): 466.31
elem APIs: element size 16B: MP/MC: burst (size: 8): 502.51
elem APIs: element size 16B: MP/MC: burst (size: 32): 695.81

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 295.15
elem APIs: element size 16B: SP/SC: bulk (size: 32): 462.77
elem APIs: element size 16B: MP/MC: bulk (size: 8): 496.89
elem APIs: element size 16B: MP/MC: bulk (size: 32): 690.46

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.50
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.44

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 65.85
elem APIs: element size 16B: MP/MC: bulk (size: 8): 103.80
elem APIs: element size 16B: SP/SC: bulk (size: 32): 23.27
elem APIs: element size 16B: MP/MC: bulk (size: 32): 31.17

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [8] count = 304223
Core [9] count = 304221
Total count (size: 8): 608444

Bulk enq/dequeue count on size 32
Core [8] count = 214856
Core [9] count = 214855
Total count (size: 32): 1038155
Test OK
RTE>>quit









> Thanks
> Konstantin

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-27 10:31         ` Jerin Jacob
@ 2020-02-28  0:17           ` David Christensen
  2020-03-20 16:45             ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: David Christensen @ 2020-02-28  0:17 UTC (permalink / raw)
  To: Jerin Jacob, Ananyev, Konstantin
  Cc: Stephen Hemminger, dpdk-dev, Olivier Matz

> On Tue, Feb 25, 2020 at 7:11 PM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> 
>> We do have a run-time check in our current enqueue()/dequeue() implementation.
>> In fact we support both modes: we have the generic rte_ring_enqueue(/dequeue)_bulk(/burst),
>> where sync behaviour is determined at runtime by the value of prod(/cons).single.
>> Or the user can call the rte_ring_(mp/sp)_enqueue_* functions directly.
>> This RFC follows exactly the same paradigm:
>> rte_ring_enqueue(/dequeue)_bulk(/burst) is kept generic and its
>> behaviour is determined at runtime, by the value of prod(/cons).sync_type.
>> Or the user can call enqueue/dequeue with a particular sync mode directly:
>> rte_ring_(mp/sp/rts/hts)_enqueue_(bulk/burst)*.
>> The only thing that changed:
>>   The format of prod/cons can now differ depending on the mode selected at _init_.
>>   So you can't create a ring for, let's say, SP mode and then in the middle of the data-path
>>   change your mind and start using MP_RTS mode.
>>   For the existing modes (SP/MP, SC/MC) the format remains the same and the user can still
>>   use them interchangeably, though of course that is an error-prone practice.
> 
> Makes sense.
> 
> 
>>
>>>> But I agree with the problem statement that in the virtualization use
>>>> case, it may be possible to have N virtual cores running on a physical
>>>> core.
>>>>
>>>> IMO, the best solution would be keeping the ring API the same and having a
>>>> different flavor at "compile-time". Something like
>>>> liburcu did for accommodating different flavors.
>>>>
>>>> i.e. urcu-qsbr.h and urcu-bp.h will have identical definitions of the API. The
>>>> application can simply include ONE header file in a C file based on
>>>> the flavor.
>>
>> I don't think it is a flexible enough approach.
>> In one app the user might need to have several rings with different sync modes.
>> Or the user might even need a ring with different sync modes for enqueue and dequeue.
> 
> Ack.
> 
> 
>> Yes, hiding the rte_ring implementation inside a .c file would help a lot
>> in terms of ABI maintenance and would make our future life easier.
>> The question is what the price for it is in terms of performance,
>> and whether we are ready to pay it. Not to mention that it would cause
>> changes in many other libs/apps...
>> So I think it should be a subject for a separate discussion.
>> But, agreed, it would be good at least to measure the performance
>> impact of such a change.
>> If I have some spare cycles, I will give it a try.
>> Meanwhile, can I ask Jerin and the other guys to repeat the tests from this RFC
>> on their HW? Before continuing the discussion it would probably be good to know
>> whether the suggested patch works as expected across different platforms.
> 
> 
> I tested on arm64 HW. The first section below is without the
> patch (20.02) and the second one is with this patch.
> I agree with Konstantin that getting more platform tests early will be good,
> so that we can settle on the approach
> and avoid back and forth later.
> 
<snip>

Encountered a couple of different build errors with these patches on my 
Power 9 system:

In file included from ../lib/librte_ring/rte_ring.h:534,
                  from ../drivers/mempool/ring/rte_mempool_ring.c:9:
../lib/librte_ring/rte_ring_hts_generic.h: In function 
‘__rte_ring_hts_update_tail’:
../lib/librte_ring/rte_ring_hts_generic.h:61:2: warning: implicit 
declaration of function ‘RTE_ASSERT’; did you mean ‘RTE_STR’? 
[-Wimplicit-function-declaration]
   RTE_ASSERT(n >= num);
   ^~~~~~~~~~
   RTE_STR

Fixed by adding "#include <rte_debug.h>" to rte_ring.h.

Also encountered:

In file included from ../app/test/test_ring_hts_stress.c:5:
../app/test/test_ring_stress.h: In function ‘check_updt_elem’:
../app/test/test_ring_stress.h:162:9: error: unknown type name 
‘rte_spinlock_t’
   static rte_spinlock_t dump_lock;
          ^~~~~~~~~~~~~~
../app/test/test_ring_stress.h:166:4: warning: implicit declaration of 
function ‘rte_spinlock_lock’; did you mean ‘rte_calloc_socket’? 
[-Wimplicit-function-declaration]
     rte_spinlock_lock(&dump_lock);
     ^~~~~~~~~~~~~~~~~
     rte_calloc_socket
../app/test/test_ring_stress.h:166:4: warning: nested extern declaration 
of ‘rte_spinlock_lock’ [-Wnested-externs]
../app/test/test_ring_stress.h:172:4: warning: implicit declaration of 
function ‘rte_spinlock_unlock’; did you mean ‘pthread_rwlock_unlock’? 
[-Wimplicit-function-declaration]
     rte_spinlock_unlock(&dump_lock);
     ^~~~~~~~~~~~~~~~~~~
     pthread_rwlock_unlock
../app/test/test_ring_stress.h:172:4: warning: nested extern declaration 
of ‘rte_spinlock_unlock’ [-Wnested-externs]

Fixed by adding "#include <rte_spinlock.h>" to test_ring_stress.h.

Autoperf test results
---------------------
RTE>>ring_perf_autotest // DPDK 20.02, without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.14
legacy APIs: MP/MC: single: 56.26

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.59
legacy APIs: SP/SC: burst (size: 32): 49.87
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.68

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.59
legacy APIs: SP/SC: bulk (size: 32): 49.85
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.60

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.46
legacy APIs: MP/MC: bulk (size: 8): 16.20
legacy APIs: SP/SC: bulk (size: 32): 3.21
legacy APIs: MP/MC: bulk (size: 32): 3.73

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 33.34
legacy APIs: MP/MC: bulk (size: 8): 37.99
legacy APIs: SP/SC: bulk (size: 32): 10.19
legacy APIs: MP/MC: bulk (size: 32): 11.90

### Testing using two NUMA nodes ###
legacy APIs: SP/SC: bulk (size: 8): 49.50
legacy APIs: MP/MC: bulk (size: 8): 63.65
legacy APIs: SP/SC: bulk (size: 32): 12.49
legacy APIs: MP/MC: bulk (size: 32): 23.53

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5604
Core [5] count = 5563
Core [6] count = 5576
Core [7] count = 5630
Core [8] count = 5643
Core [9] count = 5727
Core [10] count = 5698
Core [11] count = 5711
Core [64] count = 5259
Core [65] count = 5322
Core [66] count = 5321
Core [67] count = 5310
Core [68] count = 4350
Core [69] count = 4455
Core [70] count = 4546
Core [71] count = 4475
Total count (size: 8): 84190

Bulk enq/dequeue count on size 32
Core [4] count = 5543
Core [5] count = 5555
Core [6] count = 5596
Core [7] count = 5584
Core [8] count = 5613
Core [9] count = 5686
Core [10] count = 5689
Core [11] count = 5677
Core [64] count = 5228
Core [65] count = 5389
Core [66] count = 5406
Core [67] count = 5359
Core [68] count = 4554
Core [69] count = 4673
Core [70] count = 4675
Core [71] count = 4644
Total count (size: 32): 169061

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.84
elem APIs: element size 16B: MP/MC: single: 56.77

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 44.98
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.16
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.58
elem APIs: element size 16B: MP/MC: burst (size: 32): 74.75

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.01
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.08
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.58
elem APIs: element size 16B: MP/MC: bulk (size: 32): 74.76

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.18
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.44
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.97

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 42.07
elem APIs: element size 16B: MP/MC: bulk (size: 8): 44.50
elem APIs: element size 16B: SP/SC: bulk (size: 32): 10.73
elem APIs: element size 16B: MP/MC: bulk (size: 32): 11.73

### Testing using two NUMA nodes ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 49.55
elem APIs: element size 16B: MP/MC: bulk (size: 8): 93.10
elem APIs: element size 16B: SP/SC: bulk (size: 32): 12.33
elem APIs: element size 16B: MP/MC: bulk (size: 32): 27.10

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5489
Core [5] count = 5559
Core [6] count = 5566
Core [7] count = 5577
Core [8] count = 5645
Core [9] count = 5699
Core [10] count = 5695
Core [11] count = 5733
Core [64] count = 5202
Core [65] count = 5284
Core [66] count = 5319
Core [67] count = 5349
Core [68] count = 4331
Core [69] count = 4465
Core [70] count = 4484
Core [71] count = 4439
Total count (size: 8): 83836

Bulk enq/dequeue count on size 32
Core [4] count = 5567
Core [5] count = 5492
Core [6] count = 5517
Core [7] count = 5515
Core [8] count = 5593
Core [9] count = 5650
Core [10] count = 5694
Core [11] count = 5665
Core [64] count = 5236
Core [65] count = 5319
Core [66] count = 5333
Core [67] count = 5304
Core [68] count = 4608
Core [69] count = 4669
Core [70] count = 4690
Core [71] count = 4654
Total count (size: 32): 168342
Test OK
---------------------
RTE>>ring_perf_autotest // DPDK 20.02, without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.18
legacy APIs: MP/MC: single: 56.26

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.60
legacy APIs: SP/SC: burst (size: 32): 49.86
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.67

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.59
legacy APIs: SP/SC: bulk (size: 32): 49.86
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.63

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.07
legacy APIs: MP/MC: bulk (size: 8): 16.24
legacy APIs: SP/SC: bulk (size: 32): 3.20
legacy APIs: MP/MC: bulk (size: 32): 3.72

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 33.41
legacy APIs: MP/MC: bulk (size: 8): 38.01
legacy APIs: SP/SC: bulk (size: 32): 10.23
legacy APIs: MP/MC: bulk (size: 32): 11.90

### Testing using two NUMA nodes ###
legacy APIs: SP/SC: bulk (size: 8): 49.27
legacy APIs: MP/MC: bulk (size: 8): 64.80
legacy APIs: SP/SC: bulk (size: 32): 12.45
legacy APIs: MP/MC: bulk (size: 32): 23.11

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5637
Core [5] count = 5599
Core [6] count = 5623
Core [7] count = 5627
Core [8] count = 5723
Core [9] count = 5758
Core [10] count = 5714
Core [11] count = 5724
Core [64] count = 5310
Core [65] count = 5438
Core [66] count = 5448
Core [67] count = 5374
Core [68] count = 4441
Core [69] count = 4550
Core [70] count = 4550
Core [71] count = 4558
Total count (size: 8): 85074

Bulk enq/dequeue count on size 32
Core [4] count = 5608
Core [5] count = 5623
Core [6] count = 5590
Core [7] count = 5658
Core [8] count = 5680
Core [9] count = 5738
Core [10] count = 5692
Core [11] count = 5712
Core [64] count = 5273
Core [65] count = 5363
Core [66] count = 5341
Core [67] count = 5349
Core [68] count = 4591
Core [69] count = 4673
Core [70] count = 4698
Core [71] count = 4687
Total count (size: 32): 170350

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.82
elem APIs: element size 16B: MP/MC: single: 56.79

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 44.99
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.00
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.59
elem APIs: element size 16B: MP/MC: burst (size: 32): 74.78

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 44.97
elem APIs: element size 16B: SP/SC: bulk (size: 32): 58.91
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.60
elem APIs: element size 16B: MP/MC: bulk (size: 32): 74.61

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.18
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.41
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.19
elem APIs: element size 16B: MP/MC: bulk (size: 32): 4.06

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 42.08
elem APIs: element size 16B: MP/MC: bulk (size: 8): 44.52
elem APIs: element size 16B: SP/SC: bulk (size: 32): 10.73
elem APIs: element size 16B: MP/MC: bulk (size: 32): 12.39

### Testing using two NUMA nodes ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 49.65
elem APIs: element size 16B: MP/MC: bulk (size: 8): 93.27
elem APIs: element size 16B: SP/SC: bulk (size: 32): 12.38
elem APIs: element size 16B: MP/MC: bulk (size: 32): 27.19

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5629
Core [5] count = 5585
Core [6] count = 5676
Core [7] count = 5604
Core [8] count = 5639
Core [9] count = 5731
Core [10] count = 5694
Core [11] count = 5707
Core [64] count = 5254
Core [65] count = 5331
Core [66] count = 5340
Core [67] count = 5355
Core [68] count = 4339
Core [69] count = 4481
Core [70] count = 4504
Core [71] count = 4507
Total count (size: 8): 84376

Bulk enq/dequeue count on size 32
Core [4] count = 5518
Core [5] count = 5493
Core [6] count = 5559
Core [7] count = 5484
Core [8] count = 5623
Core [9] count = 5669
Core [10] count = 5661
Core [11] count = 5658
Core [64] count = 5207
Core [65] count = 5305
Core [66] count = 5273
Core [67] count = 5303
Core [68] count = 4542
Core [69] count = 4682
Core [70] count = 4672
Core [71] count = 4609
Total count (size: 32): 168634
Test OK


Dave

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-28  0:17           ` David Christensen
@ 2020-03-20 16:45             ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-03-20 16:45 UTC (permalink / raw)
  To: David Christensen, Jerin Jacob; +Cc: Stephen Hemminger, dpdk-dev, Olivier Matz


<snip>
> 
> Encountered a couple of different build errors with these patches on my
> Power 9 system:
> 
> In file included from ../lib/librte_ring/rte_ring.h:534,
>                   from ../drivers/mempool/ring/rte_mempool_ring.c:9:
> ../lib/librte_ring/rte_ring_hts_generic.h: In function
> ‘__rte_ring_hts_update_tail’:
> ../lib/librte_ring/rte_ring_hts_generic.h:61:2: warning: implicit
> declaration of function ‘RTE_ASSERT’; did you mean ‘RTE_STR’?
> [-Wimplicit-function-declaration]
>    RTE_ASSERT(n >= num);
>    ^~~~~~~~~~
>    RTE_STR
> 
> Fixed by adding "#include <rte_debug.h>" to rte_ring.h.
> 
> Also encountered:
> 
> In file included from ../app/test/test_ring_hts_stress.c:5:
> ../app/test/test_ring_stress.h: In function ‘check_updt_elem’:
> ../app/test/test_ring_stress.h:162:9: error: unknown type name
> ‘rte_spinlock_t’
>    static rte_spinlock_t dump_lock;
>           ^~~~~~~~~~~~~~
> ../app/test/test_ring_stress.h:166:4: warning: implicit declaration of
> function ‘rte_spinlock_lock’; did you mean ‘rte_calloc_socket’?
> [-Wimplicit-function-declaration]
>      rte_spinlock_lock(&dump_lock);
>      ^~~~~~~~~~~~~~~~~
>      rte_calloc_socket
> ../app/test/test_ring_stress.h:166:4: warning: nested extern declaration
> of ‘rte_spinlock_lock’ [-Wnested-externs]
> ../app/test/test_ring_stress.h:172:4: warning: implicit declaration of
> function ‘rte_spinlock_unlock’; did you mean ‘pthread_rwlock_unlock’?
> [-Wimplicit-function-declaration]
>      rte_spinlock_unlock(&dump_lock);
>      ^~~~~~~~~~~~~~~~~~~
>      pthread_rwlock_unlock
> ../app/test/test_ring_stress.h:172:4: warning: nested extern declaration
> of ‘rte_spinlock_unlock’ [-Wnested-externs]
> 
> Fixed by adding "#include <rte_spinlock.h>" to test_ring_stress.h.

Thanks a lot for trying it, guys.
About the compilation issues - will address them in v1.
Konstantin


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
                   ` (6 preceding siblings ...)
  2020-02-24 16:59 ` [dpdk-dev] [RFC 0/6] New sync modes for ring Stephen Hemminger
@ 2020-03-25 20:43 ` Honnappa Nagarahalli
  2020-03-26  1:50   ` Ananyev, Konstantin
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
  8 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-03-25 20:43 UTC (permalink / raw)
  To: Konstantin Ananyev, dev; +Cc: olivier.matz, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [dpdk-dev] [RFC 0/6] New sync modes for ring
> 
> Upfront note - that RFC is not a complete patch.
> It introduces an ABI breakage, plus it doesn't update ring_elem code properly,
As per the current rules, these changes (in their current form) will be accepted only for the 20.11 release. How do we address this for immediate requirements like the RCU defer APIs?
I suggest that we move forward with my RFC (taking your feedback into consideration) to make progress on the RCU APIs.

> etc.
> I plan to deal with all these things in later versions.
> Right now I seek an initial feedback about proposed ideas.
> Would also ask people to repeat performance tests (see below) on their
> platforms to confirm the impact.
> 
> More and more customers use(/try to use) DPDK based apps within
> overcommitted systems (multiple active threads over the same physical cores):
> VM, container deployments, etc.
> One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> LHP is quite a common problem for spin-based sync primitives (spin-locks, etc.)
> on overcommitted systems.
> The situation gets much worse when some sort of fair-locking technique is
> used (ticket-lock, etc.).
> As now not only lock-owner but also lock-waiters scheduling order matters a
> lot.
> This is a well-known problem for kernel within VMs:
> http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> The problem with rte_ring is that while head acquisition is sort of un-fair locking,
> waiting on the tail is very similar to the ticket-lock schema - the tail has to be updated in
> a particular order.
> That makes the current rte_ring implementation perform really poorly in some
> overcommitted scenarios.
> While it is probably not possible to completely resolve this problem in
> userspace only (without some kernel communication/intervention), removing
> fairness in tail update can mitigate it significantly.
> So this RFC proposes two new optional ring synchronization modes:
> 1) Head/Tail Sync (HTS) mode
> In that mode enqueue/dequeue operation is fully serialized:
>     only one thread at a time is allowed to perform given op.
>     As another enhancement provide ability to split enqueue/dequeue
>     operation into two phases:
>       - enqueue/dequeue start
>       - enqueue/dequeue finish
>     That allows user to inspect objects in the ring without removing
>     them from it (aka MT safe peek).
IMO, this will not address the problem described above. For example: when a producer updates the head and gets scheduled out, the other producers have to spin. The problem is probably worse than in the non-HTS case, where the moving of the head and the copying of the ring elements can at least happen in parallel between the producers (similarly for the consumers).
IMO, HTS should not be a configurable flag. For the RCU requirement, an MP enqueue and an HTS dequeue are required.
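
(For context, the split operation under discussion looks roughly like
the sketch below; the function names are hypothetical, as the final
peek API is not spelled out in this part of the thread.)

#include <stdint.h>
#include <rte_ring.h>

/* MT-safe peek: inspect objects without removing them */
static inline uint32_t
peek(struct rte_ring *r, void **objs, uint32_t num)
{
	uint32_t n;

	/* hypothetical name for the "dequeue start" phase */
	n = rte_ring_hts_dequeue_bulk_start(r, objs, num, NULL);
	if (n != 0)
		/* objs[0..n-1] can be inspected in place; other consumers
		 * are held off until the "finish" phase below */
		rte_ring_hts_dequeue_finish(r, 0); /* remove none, keep all */
	return n;
}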

> 2) Relaxed Tail Sync (RTS)
> The main difference from original MP/MC algorithm is that tail value is
> increased not by every thread that finished enqueue/dequeue, but only by the
> last one.
> That allows threads to avoid spinning on ring tail value, leaving actual tail value
> change to the last thread in the update queue.
This can be a configurable flag on the ring.
I am not sure how completely this solves the problem you have stated above. Updating the count from all intermediate threads is still required to update the value of the head. But yes, it reduces the severity of the problem by not enforcing the order in which the tail is updated.
I also think it introduces a problem on the other side of the ring, because the tail is not updated soon enough (the other side has to wait longer for the elements to become available). It also introduces another configuration parameter (HTD_MAX_DEF) which users have to deal with.
Users still have to implement the current hypervisor-related solutions.
IMO, we should run the benchmark for this on an overcommitted setup to understand the benefits.

> 
> Test results on IA (see below) show significant improvements for average
> enqueue/dequeue op times on overcommitted systems.
> For 'classic' DPDK deployments (one thread per core) original MP/MC
> algorithm still shows best numbers, though for 64-bit target RTS numbers are
> not that far away.
> Numbers were produced by ring_stress_*autotest (first patch in these series).
> 
> X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> DEQ+ENQ average cycles/obj
> 
>                                                 MP/MC      HTS     RTS
> 1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
> 2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
> 4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
> 8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
> 16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
> 32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51
> 
> 2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
> 4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
> 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
> 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02  1175.14
> 32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48  4892.68
> 
> 8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
> 16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
> 32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12
> 
> i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> DEQ+ENQ average cycles/obj
> 
>                                                 MP/MC      HTS     RTS
> 1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
> 2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
> 8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
> 32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91
> 
> 2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
> 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
> 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90  1416.65
> 
> 8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
> 32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87
> 
> Konstantin Ananyev (6):
>   test/ring: add contention stress test
>   ring: rework ring layout to allow new sync schemes
>   ring: introduce RTS ring mode
>   test/ring: add contention stress test for RTS ring
>   ring: introduce HTS ring mode
>   test/ring: add contention stress test for HTS ring
> 
>  app/test/Makefile                      |   3 +
>  app/test/meson.build                   |   3 +
>  app/test/test_pdump.c                  |   6 +-
>  app/test/test_ring_hts_stress.c        |  28 ++
>  app/test/test_ring_rts_stress.c        |  28 ++
>  app/test/test_ring_stress.c            |  27 ++
>  app/test/test_ring_stress.h            | 477 +++++++++++++++++++
>  lib/librte_pdump/rte_pdump.c           |   2 +-
>  lib/librte_port/rte_port_ring.c        |  12 +-
>  lib/librte_ring/Makefile               |   4 +-
>  lib/librte_ring/meson.build            |   4 +-
>  lib/librte_ring/rte_ring.c             |  84 +++-
>  lib/librte_ring/rte_ring.h             | 619 +++++++++++++++++++++++--
>  lib/librte_ring/rte_ring_elem.h        |   8 +-
>  lib/librte_ring/rte_ring_hts_generic.h | 228 +++++++++
> lib/librte_ring/rte_ring_rts_generic.h | 240 ++++++++++
>  16 files changed, 1721 insertions(+), 52 deletions(-)
>  create mode 100644 app/test/test_ring_hts_stress.c
>  create mode 100644 app/test/test_ring_rts_stress.c
>  create mode 100644 app/test/test_ring_stress.c
>  create mode 100644 app/test/test_ring_stress.h
>  create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode
  2020-02-24 11:35 ` [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-03-25 20:44   ` Honnappa Nagarahalli
  2020-03-26 12:26     ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-03-25 20:44 UTC (permalink / raw)
  To: Konstantin Ananyev, dev; +Cc: olivier.matz, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Introduce head/tail sync mode for MT ring synchronization.
> In that mode the enqueue/dequeue operation is fully serialized:
> only one thread at a time is allowed to perform a given op.
> Supposed to reduce stall times in cases when the ring is used on overcommitted
> CPUs (multiple active threads on the same CPU).
> As another enhancement, provide the ability to split the enqueue/dequeue operation
> into two phases:
>   - enqueue/dequeue start
>   - enqueue/dequeue finish
> That allows the user to inspect objects in the ring without removing them from it
> (aka MT-safe peek).
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/Makefile               |   1 +
>  lib/librte_ring/meson.build            |   1 +
>  lib/librte_ring/rte_ring.c             |  15 +-
>  lib/librte_ring/rte_ring.h             | 259 ++++++++++++++++++++++++-
>  lib/librte_ring/rte_ring_hts_generic.h | 228 ++++++++++++++++++++++
>  5 files changed, 500 insertions(+), 4 deletions(-)  create mode 100644
> lib/librte_ring/rte_ring_hts_generic.h
> 
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> index 4f90344f4..0c7f8f918 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -19,6 +19,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> rte_ring.h \
>  					rte_ring_elem.h \
>  					rte_ring_generic.h \
>  					rte_ring_c11_mem.h \
> +					rte_ring_hts_generic.h \
>  					rte_ring_rts_generic.h
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> index dc8d7dbea..5aa673199 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -6,6 +6,7 @@ headers = files('rte_ring.h',
>  		'rte_ring_elem.h',
>  		'rte_ring_c11_mem.h',
>  		'rte_ring_generic.h',
> +		'rte_ring_hts_generic.h',
>  		'rte_ring_rts_generic.h')
> 
>  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index 1ce0af3e5..d3b948667 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -102,9 +102,9 @@ static int
>  get_sync_type(uint32_t flags, uint32_t *prod_st, uint32_t *cons_st)  {
>  	static const uint32_t prod_st_flags =
> -		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
> +		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
>  	static const uint32_t cons_st_flags =
> -		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
> +		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
> 
>  	switch (flags & prod_st_flags) {
>  	case 0:
> @@ -116,6 +116,9 @@ get_sync_type(uint32_t flags, uint32_t *prod_st,
> uint32_t *cons_st)
>  	case RING_F_MP_RTS_ENQ:
>  		*prod_st = RTE_RING_SYNC_MT_RTS;
>  		break;
> +	case RING_F_MP_HTS_ENQ:
> +		*prod_st = RTE_RING_SYNC_MT_HTS;
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> @@ -130,6 +133,9 @@ get_sync_type(uint32_t flags, uint32_t *prod_st,
> uint32_t *cons_st)
>  	case RING_F_MC_RTS_DEQ:
>  		*cons_st = RTE_RING_SYNC_MT_RTS;
>  		break;
> +	case RING_F_MC_HTS_DEQ:
> +		*cons_st = RTE_RING_SYNC_MT_HTS;
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> @@ -151,6 +157,11 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> 
> +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> +		offsetof(struct rte_ring_hts_headtail, sync_type));
> +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> +		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
> +
>  	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
>  		offsetof(struct rte_ring_rts_headtail, sync_type));
>  	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index a130aeb9d..52edcea11 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -66,11 +66,11 @@ enum {
>  	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
>  	RTE_RING_SYNC_ST,     /**< single thread only */
>  	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> +	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
>  };
> 
>  /**
> - * structure to hold a pair of head/tail values and other metadata.
> - * used by RTE_RING_SYNC_MT, RTE_RING_SYNC_ST sync types.
> + * Structure to hold a pair of head/tail values and other metadata.
>   * Depending on sync_type format of that structure might differ
>   * depending on the sync mechanism selected, but offsets for
>   * *sync_type* and *tail* values should always remain the same.
> @@ -96,6 +96,19 @@ struct rte_ring_rts_headtail {
>  	volatile union rte_ring_ht_poscnt head;  };
> 
> +union rte_ring_ht_pos {
> +	uint64_t raw;
> +	struct {
> +		uint32_t tail; /**< tail position */
> +		uint32_t head; /**< head position */
> +	} pos;
> +};
> +
> +struct rte_ring_hts_headtail {
> +	uint32_t sync_type; /**< sync type of prod/cons */
> +	volatile union rte_ring_ht_pos ht __rte_aligned(8); };
> +
>  /**
>   * An RTE ring structure.
>   *
> @@ -126,6 +139,7 @@ struct rte_ring {
>  	RTE_STD_C11
>  	union {
>  		struct rte_ring_headtail prod;
> +		struct rte_ring_hts_headtail hts_prod;
>  		struct rte_ring_rts_headtail rts_prod;
>  	}  __rte_cache_aligned;
> 
> @@ -135,6 +149,7 @@ struct rte_ring {
>  	RTE_STD_C11
>  	union {
>  		struct rte_ring_headtail cons;
> +		struct rte_ring_hts_headtail hts_cons;
>  		struct rte_ring_rts_headtail rts_cons;
>  	}  __rte_cache_aligned;
> 
> @@ -157,6 +172,9 @@ struct rte_ring {
>  #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
>  #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
> 
> +#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
> +#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
> +
>  #define __IS_SP RTE_RING_SYNC_ST
>  #define __IS_MP RTE_RING_SYNC_MT
>  #define __IS_SC RTE_RING_SYNC_ST
> @@ -513,6 +531,82 @@ __rte_ring_do_rts_dequeue(struct rte_ring *r, void
> **obj_table,
>  	return n;
>  }
> 
> +#include <rte_ring_hts_generic.h>
> +
> +/**
> + * @internal Start to enqueue several objects on the HTS ring.
> + * Note that user has to call appropriate enqueue_finish()
> + * to complete given enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_hts_enqueue_start(struct rte_ring *r, void * const *obj_table,
> +		uint32_t n, enum rte_ring_queue_behavior behavior,
> +		uint32_t *free_space)
> +{
> +	uint32_t free, head;
> +
> +	n =  __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
> +
> +	if (n != 0)
> +		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> +
> +	if (free_space != NULL)
> +		*free_space = free - n;
> +	return n;
> +}
rte_ring.h is becoming too big. Maybe we should move these functions to another HTS-specific file, but leave the top-level API in rte_ring.h. Similarly for RTS.

> +
> +/**
> + * @internal Start to dequeue several objects from the HTS ring.
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_hts_dequeue_start(struct rte_ring *r, void **obj_table,
> +		unsigned int n, enum rte_ring_queue_behavior behavior,
> +		unsigned int *available)
> +{
> +	uint32_t entries, head;
> +
> +	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
> +
> +	if (n != 0)
> +		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> +
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
>  /**
>   * Enqueue several objects on the ring (multi-producers safe).
>   *
> @@ -585,6 +679,47 @@ rte_ring_rts_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,
>  			free_space);
>  }
> 
> +/**
> + * Start to enqueue several objects on the HTS ring (multi-producers safe).
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_hts_enqueue_bulk_start(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_hts_enqueue_start(r, obj_table, n,
> +		RTE_RING_QUEUE_FIXED, free_space);
> +}
I do not clearly understand the requirements on enqueue_start and enqueue_finish in the form they are in here.
IMO, the only requirement for these APIs is to provide the ability to avoid intermediate memcpys.

> +
> +static __rte_always_inline void
> +rte_ring_hts_enqueue_finish(struct rte_ring *r, unsigned int n) {
> +	__rte_ring_hts_update_tail(&r->hts_prod, n, 1); }
> +
> +static __rte_always_inline unsigned int
> +rte_ring_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space) {
> +	n = rte_ring_hts_enqueue_bulk_start(r, obj_table, n, free_space);
> +	if (n != 0)
> +		rte_ring_hts_enqueue_finish(r, n);
> +	return n;
> +}
> +
>  /**
>   * Enqueue several objects on a ring.
>   *
> @@ -615,6 +750,8 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const
> *obj_table,
>  		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
>  	case RTE_RING_SYNC_MT_RTS:
>  		return rte_ring_rts_enqueue_bulk(r, obj_table, n, free_space);
> +	case RTE_RING_SYNC_MT_HTS:
> +		return rte_ring_hts_enqueue_bulk(r, obj_table, n,
> free_space);
>  	}
> 
>  	/* valid ring should never reach this point */
> @@ -753,6 +890,47 @@ rte_ring_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  			available);
>  }
> 
> +/**
> + * Start to dequeue several objects from an HTS ring (multi-consumers safe).
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_hts_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_hts_dequeue_start(r, obj_table, n,
> +		RTE_RING_QUEUE_FIXED, available);
> +}
IMO, we should look to provide the ability to avoid intermediate copies when the data from the ring needs to be distributed to different locations.
My proposal in its current form is complicated. But I am thinking that, if the return values are abstracted in a structure, it might look much simpler.

> +
> +static __rte_always_inline void
> +rte_ring_hts_dequeue_finish(struct rte_ring *r, unsigned int n) {
> +	__rte_ring_hts_update_tail(&r->hts_cons, n, 0); }
> +
> +static __rte_always_inline unsigned int
> +rte_ring_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
> +	n = rte_ring_hts_dequeue_bulk_start(r, obj_table, n, available);
> +	if (n != 0)
> +		rte_ring_hts_dequeue_finish(r, n);
> +	return n;
> +}
> +
>  /**
>   * Dequeue several objects from a ring.
>   *
> @@ -783,6 +961,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> **obj_table, unsigned int n,
>  		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
>  	case RTE_RING_SYNC_MT_RTS:
>  		return rte_ring_rts_dequeue_bulk(r, obj_table, n, available);
> +	case RTE_RING_SYNC_MT_HTS:
> +		return rte_ring_hts_dequeue_bulk(r, obj_table, n, available);
>  	}
> 
>  	/* valid ring should never reach this point */
> @@ -1111,6 +1291,41 @@ rte_ring_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			RTE_RING_QUEUE_VARIABLE, free_space);
>  }
> 
> +/**
> + * Start to enqueue several objects on the HTS ring (multi-producers safe).
> + * Note that user has to call appropriate enqueue_finish()
> + * to complete given enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_hts_enqueue_burst_start(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_hts_enqueue_start(r, obj_table, n,
> +		RTE_RING_QUEUE_VARIABLE, free_space); }
> +
rte_ring_hts_enqueue_burst_finish is not implemented. It requires the 'n' returned from 'rte_ring_hts_enqueue_burst_start' to be passed. We can't completely avoid passing correct information between the xxx_start and xxx_finish APIs.

> +static __rte_always_inline unsigned int
> +rte_ring_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space) {
> +	n = rte_ring_hts_enqueue_burst_start(r, obj_table, n, free_space);
> +	if (n != 0)
> +		rte_ring_hts_enqueue_finish(r, n);
> +	return n;
> +}
> +
>  /**
>   * Enqueue several objects on a ring.
>   *
> @@ -1141,6 +1356,8 @@ rte_ring_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> free_space);
>  	case RTE_RING_SYNC_MT_RTS:
>  		return rte_ring_rts_enqueue_burst(r, obj_table, n,
> free_space);
> +	case RTE_RING_SYNC_MT_HTS:
> +		return rte_ring_hts_enqueue_burst(r, obj_table, n,
> free_space);
>  	}
> 
>  	/* valid ring should never reach this point */
> @@ -1225,6 +1442,42 @@ rte_ring_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
>  	return __rte_ring_do_rts_dequeue(r, obj_table, n,
>  			RTE_RING_QUEUE_VARIABLE, available);
>  }
> +
> +/**
> + * Start to dequeue several objects from an HTS ring (multi-consumers safe).
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +static __rte_always_inline unsigned int
> +rte_ring_hts_dequeue_burst_start(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_hts_dequeue_start(r, obj_table, n,
> +		RTE_RING_QUEUE_VARIABLE, available);
> +}
> +
> +static __rte_always_inline unsigned int
> +rte_ring_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
> +	n = rte_ring_hts_dequeue_burst_start(r, obj_table, n, available);
> +	if (n != 0)
> +		rte_ring_hts_dequeue_finish(r, n);
> +	return n;
> +}
> +
>  /**
>   * Dequeue multiple objects from a ring up to a maximum number.
>   *
> @@ -1255,6 +1508,8 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> **obj_table,
>  		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
>  	case RTE_RING_SYNC_MT_RTS:
>  		return rte_ring_rts_dequeue_burst(r, obj_table, n, available);
> +	case RTE_RING_SYNC_MT_HTS:
> +		return rte_ring_hts_dequeue_burst(r, obj_table, n, available);
>  	}
> 
>  	/* valid ring should never reach this point */
> diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
> new file mode 100644
> index 000000000..7e447e30b
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_hts_generic.h
> @@ -0,0 +1,228 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_HTS_GENERIC_H_
> +#define _RTE_RING_HTS_GENERIC_H_
> +
> +/**
> + * @file rte_ring_hts_generic.h
> + * It is not recommended to include this file directly,
> + * include <rte_ring.h> instead.
> + * Contains internal helper functions for head/tail sync (HTS) ring mode.
> + * In that mode enqueue/dequeue operation is fully serialized:
> + * only one thread at a time is allowed to perform given op.
> + * This is achieved by allowing a thread to proceed with changing
> + * head.value only when head.value == tail.value.
> + * Both head and tail values are updated atomically (as one 64-bit value).
> + * As another enhancement it provides the ability to split enqueue/dequeue
> + * operation into two phases:
> + * - enqueue/dequeue start
> + * - enqueue/dequeue finish
> + * That allows user to inspect objects in the ring without removing
> + * them from it (aka MT safe peek).
> + * As an example:
> + * // read 1 elem from the ring:
> + * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
> + * if (n != 0) {
> + *    //examined object
> + *    if (object_examine(obj) == KEEP)
> + *       //decided to keep it in the ring.
> + *       rte_ring_hts_dequeue_finish(ring, 0);
> + *    else
> + *       //decided to remove it from the ring.
> + *       rte_ring_hts_dequeue_finish(ring, n);
> + * }
> + * Note that between _start_ and _finish_ the ring is sort of locked -
> + * none other thread can proceed with enqueue(/dequeue) operation till
> + * _finish_ will complete.
This means it does not solve the problem for overcommitted systems. Do you agree?

> + */
> +
> +static __rte_always_inline void
> +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> +	uint32_t enqueue)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos p;
> +
> +	if (enqueue)
> +		rte_smp_wmb();
> +	else
> +		rte_smp_rmb();
> +
> +	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
> +
> +	n = p.pos.head - p.pos.tail;
> +	RTE_ASSERT(n >= num);
> +	RTE_SET_USED(n);
> +
> +	p.pos.head = p.pos.tail + num;
> +	p.pos.tail = p.pos.head;
> +
> +	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw); }
> +
> +/**
> + * @internal waits till tail will become equal to head.
> + * Means no writer/reader is active for that ring.
> + * Suppose to work as serialization point.
> + */
> +static __rte_always_inline void
> +__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
> +		union rte_ring_ht_pos *p)
> +{
> +	p->raw = rte_atomic64_read((rte_atomic64_t *)
> +			(uintptr_t)&ht->ht.raw);
> +
> +	while (p->pos.head != p->pos.tail) {
> +		rte_pause();
> +		p->raw = rte_atomic64_read((rte_atomic64_t *)
> +				(uintptr_t)&ht->ht.raw);
> +	}
> +}
> +
> +/**
> + * @internal This function updates the producer head for enqueue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sp
> + *   Indicates whether multi-producer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where enqueue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where enqueue finishes
> + * @param free_entries
> + *   Returns the amount of free space in the ring BEFORE head was moved
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *free_entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos np, op;
> +
> +	const uint32_t capacity = r->capacity;
> +
> +	do {
> +		/* Reset n to the initial burst count */
> +		n = num;
> +
> +		/* wait for tail to be equal to head */
> +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86
> +		 */
> +		rte_smp_rmb();
> +
> +		/*
> +		 *  The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * *old_head > cons_tail). So 'free_entries' is always between 0
> +		 * and capacity (which is < size).
> +		 */
> +		*free_entries = capacity + r->cons.tail - op.pos.head;
> +
> +		/* check that we have enough room in ring */
> +		if (unlikely(n > *free_entries))
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> +					0 : *free_entries;
> +
> +		if (n == 0)
> +			return 0;
> +
> +		np.pos.tail = op.pos.tail;
> +		np.pos.head = op.pos.head + n;
> +
> +	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
> +			op.raw, np.raw) == 0);
> +
> +	*old_head = op.pos.head;
> +	return n;
> +}
> +
> +/**
> + * @internal This function updates the consumer head for dequeue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sc
> + *   Indicates whether multi-consumer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where dequeue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where dequeue finishes
> + * @param entries
> + *   Returns the number of entries in the ring BEFORE head was moved
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos np, op;
> +
> +	/* move cons.head atomically */
> +	do {
> +		/* Restore n as it may change every loop */
> +		n = num;
> +
> +		/* wait for tail to be equal to head */
> +		__rte_ring_hts_head_wait(&r->hts_cons, &op);
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86
> +		 */
> +		rte_smp_rmb();
> +
> +		/* The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * cons_head > prod_tail). So 'entries' is always between 0
> +		 * and size(ring)-1.
> +		 */
> +		*entries = r->prod.tail - op.pos.head;
> +
> +		/* Set the actual entries for dequeue */
> +		if (n > *entries)
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
> +
> +		if (unlikely(n == 0))
> +			return 0;
> +
> +		np.pos.tail = op.pos.tail;
> +		np.pos.head = op.pos.head + n;
> +
> +	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
> +			op.raw, np.raw) == 0);
> +
> +	*old_head = op.pos.head;
> +	return n;
> +}
> +
> +#endif /* _RTE_RING_HTS_GENERIC_H_ */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-03-25 20:43 ` Honnappa Nagarahalli
@ 2020-03-26  1:50   ` Ananyev, Konstantin
  2020-03-30 21:29     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-03-26  1:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: olivier.matz, nd, nd

> 
> <snip>
> 
> > Subject: [dpdk-dev] [RFC 0/6] New sync modes for ring
> >
> > Upfront note - that RFC is not a complete patch.
> > It introduces an ABI breakage, plus it doesn't update ring_elem code properly,
> As per the current rules, these changes (in the current form) will be accepted only for the 20.11 release. How do we address this for immediate
> requirements like RCU defer APIs?

I think I found a way to introduce these new modes without API/ABI breakage.
Working on v1 right now; I plan to submit it by the end of this week or the start of the next.

> I suggest that we move forward with my RFC (taking into consideration your feedback) to make progress on RCU APIs.
> 
> > etc.
> > I plan to deal with all these things in later versions.
> > Right now I seek an initial feedback about proposed ideas.
> > Would also ask people to repeat performance tests (see below) on their
> > platforms to confirm the impact.
> >
> > More and more customers use(/try to use) DPDK based apps within
> > overcommitted systems (multiple acttive threads over same pysical cores):
> > VM, container deployments, etc.
> > One quite common problem they hit: Lock-Holder-Preemption with rte_ring.
> > LHP is quite a common problem for spin-based sync primitives (spin-locks, etc.)
> > on overcommitted systems.
> > The situation gets much worse when some sort of fair-locking technique is
> > used (ticket-lock, etc.).
> > As now not only lock-owner but also lock-waiters scheduling order matters a
> > lot.
> > This is a well-known problem for kernel within VMs:
> > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > The problem with rte_ring is that while head accusion is sort of un-fair locking,
> > waiting on tail is very similar to ticket lock schema - tail has to be updated in
> > particular order.
> > That makes current rte_ring implementation to perform really pure on some
> > overcommited scenarios.
> > While it is probably not possible to completely resolve this problem in
> > userspace only (without some kernel communication/intervention), removing
> > fairness in tail update can mitigate it significantly.
> > So this RFC proposes two new optional ring synchronization modes:
> > 1) Head/Tail Sync (HTS) mode
> > In that mode enqueue/dequeue operation is fully serialized:
> >     only one thread at a time is allowed to perform given op.
> >     As another enhancement provide ability to split enqueue/dequeue
> >     operation into two phases:
> >       - enqueue/dequeue start
> >       - enqueue/dequeue finish
> >     That allows user to inspect objects in the ring without removing
> >     them from it (aka MT safe peek).
> IMO, this will not address the problem described above.

It does; please see the results produced by ring_stress_*autotest below.
For example, for test-case 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8' it shows the following
average number of cycles per object for enqueue/dequeue:
MP/MC: 280314.32
HTS:         294.72
RTS:         318.79

A customer who tried it reported a similar level of improvement.
Actually, if you have time, it would be very interesting to see what the numbers are on ARM boxes.
To reproduce, just:
$cat ring_tests_u4
ring_stress_autotest
ring_stress_hts_autotest
ring_stress_rts_autotest

/app/test/dpdk-test --lcores='6,(10-13)@7,(20-23)@8'  -n 4 < ring_tests_u4 2>&1 | tee res1

Then look at the ' AGGREGATE' stats.
Right now it is a bit too verbose, so probably the easiest way to extract the same numbers quickly is:
grep 'cycles/obj'  res1 | grep 'cycles/obj' | cat -n | awk '{if ($(1)%9==0) print $(NF);}'
280314.32
1057833.55
294.72
480.10
318.79
461.52

First 2 numbers will be for MP/MC, next 2 for HTS, last 2 for RTS.

> For ex: when a producer updates the head and gets scheduled out, other producers
> have to spin.

Sure, as I wrote in original cover letter:
" While it is probably not possible to completely resolve this problem in
userspace only (without some kernel communication/intervention),
removing fairness in tail update can mitigate it significantly."
Results from the ring_stress_*_autotest confirm that.

> The problem is probably worse, as in the non-HTS case moving of the head and copying of the ring elements can happen in
> parallel between the producers (similarly for consumers).

Yes, as we serialize the ring, we remove the possibility of simultaneous copies.
That's why for 'normal' cases (one thread per core) the original MP/MC is usually faster,
though in overcommitted scenarios current MP/MC performance degrades dramatically.
The main problem with the current MP/MC implementation is that the tail update
has to be done in strict order (a sort of fair locking scheme),
which gives a much worse LHP manifestation than unfair schemes do.
With a serialized ring (HTS) we remove that ordering completely
(the same idea as the switch from fair to unfair locking for PV spin-locks).

> IMO, HTS should not be a configurable flag. 

Why?

> In RCU requirement, a MP enqueue and HTS dequeue are required.

This is supported; the user can specify different modes for consumer and producer:
(0 | RING_F_MC_HTS_DEQ).
Then it is up to the user either to call the generic rte_ring_enqueue/rte_ring_dequeue,
or to select the mode explicitly by function name:
rte_ring_mp_enqueue_bulk/rte_ring_hts_dequeue_bulk. A minimal sketch is below.
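
Purely for illustration, a sketch of such a mixed-mode setup against the API
proposed in this RFC (RING_F_MC_HTS_DEQ and rte_ring_hts_dequeue_bulk come from
these patches; the ring name, size and helper names are made up):

#include <errno.h>
#include <rte_ring.h>
#include <rte_lcore.h>

/* MP enqueue (the default) combined with HTS dequeue: producers keep
 * the classic MP path, consumers are serialized, which also enables
 * the MT-safe peek (dequeue start/finish) API on the consumer side.
 */
static struct rte_ring *
make_mp_hts_ring(void)
{
	return rte_ring_create("mp_hts_ring", 1024, rte_socket_id(),
			0 | RING_F_MC_HTS_DEQ);
}

/* Producer: the generic call resolves to the MP path at runtime. */
static int
produce(struct rte_ring *r, void *obj)
{
	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) == 1 ? 0 : -ENOBUFS;
}

/* Consumer: explicit HTS call (the generic rte_ring_dequeue_bulk
 * would resolve to the same path for this ring).
 */
static unsigned int
consume(struct rte_ring *r, void **obj)
{
	return rte_ring_hts_dequeue_bulk(r, obj, 1, NULL);
}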

> 
> > 2) Relaxed Tail Sync (RTS)
> > The main difference from original MP/MC algorithm is that tail value is
> > increased not by every thread that finished enqueue/dequeue, but only by the
> > last one.
> > That allows threads to avoid spinning on ring tail value, leaving actual tail value
> > change to the last thread in the update queue.
> This can be a configurable flag on the ring.
> I am not sure how this solves the problem you have stated above completely. Updating the count from all intermediate threads is still
> required to update the value of the head. But yes, it reduces the severity of the problem by not enforcing the order in which the tail is
> updated.

As I said above, the main source of slowdown here is
that the tail has to be updated in a particular order.
So the main objective (same as for HTS) is to remove
that ordering; a conceptual sketch is below.
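
To make that concrete, here is purely illustrative pseudocode of the RTS tail
update as described in this thread. It is NOT the code from these patches; the
union layout, field names and exact condition are my assumptions (the RFC's
union rte_ring_ht_poscnt plays this role in the real code):

#include <stdint.h>
#include <rte_atomic.h>

/* Assumed layout: head and tail each hold a {position, update counter}
 * pair that is read and CAS-updated as a single 64-bit value.
 */
union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt; /* number of ops that have completed */
		uint32_t pos; /* ring position */
	} val;
};

static void
rts_update_tail(volatile union poscnt *tail,
		const volatile union poscnt *head)
{
	union poscnt h, ot, nt;

	do {
		ot.raw = tail->raw;
		h.raw = head->raw;

		nt = ot;
		nt.val.cnt = ot.val.cnt + 1; /* one more op finished */

		/* Only the last finishing thread moves the tail position,
		 * and it moves it straight up to the head; no thread ever
		 * spins waiting for earlier threads to update the tail.
		 */
		if (nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;
	} while (rte_atomic64_cmpset(&tail->raw, ot.raw, nt.raw) == 0);
}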

> I also think it introduces the problem on the other side of the ring because the tail is not updated soon enough (the other side has to wait
> longer for the elements to become available).

Yes, producer/consumer starvation.
That's why we need a max allowed Head-Tail Distance (htd_max)
to limit how far the head can run ahead of the tail.

> It also introduces another configuration parameter (HTD_MAX_DEF) which they have to deal
> with.

If the user doesn't provide a value, it will be set by default to ring.capacity / 8,
which from my measurements works quite well.
There is still the possibility for the user to set another value, if needed.

> Users have to still implement the current hypervisor related solutions.

I didn't get what you are trying to say with that phrase.

> IMO, we should run the benchmark for this on an over committed setup to understand the benefits.

That's why I created the ring_stress_*autotest test-cases and collected the numbers provided below.
I suppose they clearly show the problem in overcommitted scenarios,
and how RTS/HTS improve the situation.
I would appreciate it if you could repeat these tests on your machines.

> 
> >
> > Test results on IA (see below) show significant improvements for average
> > enqueue/dequeue op times on overcommitted systems.
> > For 'classic' DPDK deployments (one thread per core) original MP/MC
> > algorithm still shows best numbers, though for 64-bit target RTS numbers are
> > not that far away.
> > Numbers were produced by ring_stress_*autotest (first patch in these series).
> >
> > X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> > DEQ+ENQ average cycles/obj
> >
> >                                                 MP/MC      HTS     RTS
> > 1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
> > 2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
> > 4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
> > 8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
> > 16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
> > 32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51
> >
> > 2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
> > 4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
> > 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
> > 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
> > 32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68
> >
> > 8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
> > 16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
> > 32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12
> >
> > i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> > DEQ+ENQ average cycles/obj
> >
> >                                                 MP/MC      HTS     RTS
> > 1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
> > 2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
> > 8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
> > 32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91
> >
> > 2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
> > 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
> > 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65
> >
> > 8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
> > 32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87
> >
> > Konstantin Ananyev (6):
> >   test/ring: add contention stress test
> >   ring: rework ring layout to allow new sync schemes
> >   ring: introduce RTS ring mode
> >   test/ring: add contention stress test for RTS ring
> >   ring: introduce HTS ring mode
> >   test/ring: add contention stress test for HTS ring
> >
> >  app/test/Makefile                      |   3 +
> >  app/test/meson.build                   |   3 +
> >  app/test/test_pdump.c                  |   6 +-
> >  app/test/test_ring_hts_stress.c        |  28 ++
> >  app/test/test_ring_rts_stress.c        |  28 ++
> >  app/test/test_ring_stress.c            |  27 ++
> >  app/test/test_ring_stress.h            | 477 +++++++++++++++++++
> >  lib/librte_pdump/rte_pdump.c           |   2 +-
> >  lib/librte_port/rte_port_ring.c        |  12 +-
> >  lib/librte_ring/Makefile               |   4 +-
> >  lib/librte_ring/meson.build            |   4 +-
> >  lib/librte_ring/rte_ring.c             |  84 +++-
> >  lib/librte_ring/rte_ring.h             | 619 +++++++++++++++++++++++--
> >  lib/librte_ring/rte_ring_elem.h        |   8 +-
> >  lib/librte_ring/rte_ring_hts_generic.h | 228 +++++++++
> >  lib/librte_ring/rte_ring_rts_generic.h | 240 ++++++++++
> >  16 files changed, 1721 insertions(+), 52 deletions(-)
> >  create mode 100644 app/test/test_ring_hts_stress.c
> >  create mode 100644 app/test/test_ring_rts_stress.c
> >  create mode 100644 app/test/test_ring_stress.c
> >  create mode 100644 app/test/test_ring_stress.h
> >  create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
> >  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> >
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode
  2020-03-25 20:44   ` Honnappa Nagarahalli
@ 2020-03-26 12:26     ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-03-26 12:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: olivier.matz, nd, nd

> > Introduce head/tail sync mode for MT ring synchronization.
> > In that mode enqueue/dequeue operation is fully serialized:
> > only one thread at a time is allowed to perform given op.
> > Suppose to reduce stall times in case when ring is used on overcommitted
> > cpus (multiple active threads on the same cpu).
> > As another enhancement provide ability to split enqueue/dequeue operation
> > into two phases:
> >   - enqueue/dequeue start
> >   - enqueue/dequeue finish
> > That allows user to inspect objects in the ring without removing them from it
> > (aka MT safe peek).
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_ring/Makefile               |   1 +
> >  lib/librte_ring/meson.build            |   1 +
> >  lib/librte_ring/rte_ring.c             |  15 +-
> >  lib/librte_ring/rte_ring.h             | 259 ++++++++++++++++++++++++-
> >  lib/librte_ring/rte_ring_hts_generic.h | 228 ++++++++++++++++++++++
> >  5 files changed, 500 insertions(+), 4 deletions(-)
> >  create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
> >
> > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > index 4f90344f4..0c7f8f918 100644
> > --- a/lib/librte_ring/Makefile
> > +++ b/lib/librte_ring/Makefile
> > @@ -19,6 +19,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> > rte_ring.h \
> >  					rte_ring_elem.h \
> >  					rte_ring_generic.h \
> >  					rte_ring_c11_mem.h \
> > +					rte_ring_hts_generic.h \
> >  					rte_ring_rts_generic.h
> >
> >  include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> > index dc8d7dbea..5aa673199 100644
> > --- a/lib/librte_ring/meson.build
> > +++ b/lib/librte_ring/meson.build
> > @@ -6,6 +6,7 @@ headers = files('rte_ring.h',
> >  		'rte_ring_elem.h',
> >  		'rte_ring_c11_mem.h',
> >  		'rte_ring_generic.h',
> > +		'rte_ring_hts_generic.h',
> >  		'rte_ring_rts_generic.h')
> >
> >  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index 1ce0af3e5..d3b948667 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -102,9 +102,9 @@ static int
> >  get_sync_type(uint32_t flags, uint32_t *prod_st, uint32_t *cons_st)  {
> >  	static const uint32_t prod_st_flags =
> > -		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
> > +		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
> >  	static const uint32_t cons_st_flags =
> > -		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
> > +		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
> >
> >  	switch (flags & prod_st_flags) {
> >  	case 0:
> > @@ -116,6 +116,9 @@ get_sync_type(uint32_t flags, uint32_t *prod_st,
> > uint32_t *cons_st)
> >  	case RING_F_MP_RTS_ENQ:
> >  		*prod_st = RTE_RING_SYNC_MT_RTS;
> >  		break;
> > +	case RING_F_MP_HTS_ENQ:
> > +		*prod_st = RTE_RING_SYNC_MT_HTS;
> > +		break;
> >  	default:
> >  		return -EINVAL;
> >  	}
> > @@ -130,6 +133,9 @@ get_sync_type(uint32_t flags, uint32_t *prod_st,
> > uint32_t *cons_st)
> >  	case RING_F_MC_RTS_DEQ:
> >  		*cons_st = RTE_RING_SYNC_MT_RTS;
> >  		break;
> > +	case RING_F_MC_HTS_DEQ:
> > +		*cons_st = RTE_RING_SYNC_MT_HTS;
> > +		break;
> >  	default:
> >  		return -EINVAL;
> >  	}
> > @@ -151,6 +157,11 @@ rte_ring_init(struct rte_ring *r, const char *name,
> > unsigned count,
> >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> >
> > +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> > +		offsetof(struct rte_ring_hts_headtail, sync_type));
> > +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> > +		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
> > +
> >  	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> >  		offsetof(struct rte_ring_rts_headtail, sync_type));
> >  	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index a130aeb9d..52edcea11 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -66,11 +66,11 @@ enum {
> >  	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> >  	RTE_RING_SYNC_ST,     /**< single thread only */
> >  	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> > +	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
> >  };
> >
> >  /**
> > - * structure to hold a pair of head/tail values and other metadata.
> > - * used by RTE_RING_SYNC_MT, RTE_RING_SYNC_ST sync types.
> > + * Structure to hold a pair of head/tail values and other metadata.
> >   * Depending on sync_type format of that structure might differ
> >   * depending on the sync mechanism selelcted, but offsets for
> >   * *sync_type* and *tail* values should always remain the same.
> > @@ -96,6 +96,19 @@ struct rte_ring_rts_headtail {
> >  	volatile union rte_ring_ht_poscnt head;  };
> >
> > +union rte_ring_ht_pos {
> > +	uint64_t raw;
> > +	struct {
> > +		uint32_t tail; /**< tail position */
> > +		uint32_t head; /**< head position */
> > +	} pos;
> > +};
> > +
> > +struct rte_ring_hts_headtail {
> > +	uint32_t sync_type; /**< sync type of prod/cons */
> > +	volatile union rte_ring_ht_pos ht __rte_aligned(8); };
> > +
> >  /**
> >   * An RTE ring structure.
> >   *
> > @@ -126,6 +139,7 @@ struct rte_ring {
> >  	RTE_STD_C11
> >  	union {
> >  		struct rte_ring_headtail prod;
> > +		struct rte_ring_hts_headtail hts_prod;
> >  		struct rte_ring_rts_headtail rts_prod;
> >  	}  __rte_cache_aligned;
> >
> > @@ -135,6 +149,7 @@ struct rte_ring {
> >  	RTE_STD_C11
> >  	union {
> >  		struct rte_ring_headtail cons;
> > +		struct rte_ring_hts_headtail hts_cons;
> >  		struct rte_ring_rts_headtail rts_cons;
> >  	}  __rte_cache_aligned;
> >
> > @@ -157,6 +172,9 @@ struct rte_ring {
> >  #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
> >  #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
> >
> > +#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
> > +#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
> > +
> >  #define __IS_SP RTE_RING_SYNC_ST
> >  #define __IS_MP RTE_RING_SYNC_MT
> >  #define __IS_SC RTE_RING_SYNC_ST
> > @@ -513,6 +531,82 @@ __rte_ring_do_rts_dequeue(struct rte_ring *r, void
> > **obj_table,
> >  	return n;
> >  }
> >
> > +#include <rte_ring_hts_generic.h>
> > +
> > +/**
> > + * @internal Start to enqueue several objects on the HTS ring.
> > + * Note that user has to call appropriate enqueue_finish()
> > + * to complete given enqueue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_hts_enqueue_start(struct rte_ring *r, void * const *obj_table,
> > +		uint32_t n, enum rte_ring_queue_behavior behavior,
> > +		uint32_t *free_space)
> > +{
> > +	uint32_t free, head;
> > +
> > +	n =  __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
> > +
> > +	if (n != 0)
> > +		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> > +
> > +	if (free_space != NULL)
> > +		*free_space = free - n;
> > +	return n;
> > +}
> rte_ring.h is becoming too big. Maybe we should move these functions to another HTS-specific file, but leave the top-level API in rte_ring.h.
> Similarly for RTS.

Good point, will try in v1.

> 
> > +
> > +/**
> > + * @internal Start to dequeue several objects from the HTS ring.
> > + * Note that user has to call appropriate dequeue_finish()
> > + * to complete given dequeue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to pull from the ring.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > + * @param available
> > + *   returns the number of remaining ring entries after the dequeue has finished
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_hts_dequeue_start(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, enum rte_ring_queue_behavior behavior,
> > +		unsigned int *available)
> > +{
> > +	uint32_t entries, head;
> > +
> > +	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
> > +
> > +	if (n != 0)
> > +		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> > +
> > +	if (available != NULL)
> > +		*available = entries - n;
> > +	return n;
> > +}
> > +
> >  /**
> >   * Enqueue several objects on the ring (multi-producers safe).
> >   *
> > @@ -585,6 +679,47 @@ rte_ring_rts_enqueue_bulk(struct rte_ring *r, void *
> > const *obj_table,
> >  			free_space);
> >  }
> >
> > +/**
> > + * Start to enqueue several objects on the HTS ring (multi-producers safe).
> > + * Note that user has to call appropriate enqueue_finish()
> > + * to complete given enqueue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_enqueue_bulk_start(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, unsigned int *free_space) {
> > +	return __rte_ring_do_hts_enqueue_start(r, obj_table, n,
> > +		RTE_RING_QUEUE_FIXED, free_space);
> > +}
> I do not clearly understand the requirements on enqueue_start and enqueue_finish in the form they are in here.
> IMO, the only requirement for these APIs is to provide the ability to avoid intermediate memcpys.

I think the main objective here is to provide 'MT safe peek' functionality.
The requirement is to split, let's say, the dequeue operation into two parts:
1. start - copy N elems into a user-provided data buffer and guarantee that
    these elems will remain in the ring till finish().
2. finish - remove M (<=N) elems from the ring.

For enqueue it is a mirror image:
1. start - reserve space for N elems in the ring.
2. finish - copy M (<=N) elems to the ring. A usage sketch of the dequeue side follows.
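
For reference, a minimal usage sketch of that split, based on the
start/finish API in this patch (the burst size and the examine_and_keep()
callback are made up):

#include <rte_ring.h>

#define BURST 8

/* Hypothetical application callback: non-zero means keep obj in the ring. */
extern int examine_and_keep(void *obj);

static void
peek_and_consume(struct rte_ring *r)
{
	void *objs[BURST];
	unsigned int n, m;

	/* copies up to BURST objects out, but does not remove them yet;
	 * other consumers are blocked until the matching finish().
	 */
	n = rte_ring_hts_dequeue_burst_start(r, objs, BURST, NULL);
	if (n == 0)
		return;

	/* examine objs[0..n-1]; stop at the first one we want to keep */
	for (m = 0; m != n; m++)
		if (examine_and_keep(objs[m]) != 0)
			break;

	/* remove only the first m (<= n) objects, keep the rest */
	rte_ring_hts_dequeue_finish(r, m);
}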

> 
> > +
> > +static __rte_always_inline void
> > +rte_ring_hts_enqueue_finish(struct rte_ring *r, unsigned int n) {
> > +	__rte_ring_hts_update_tail(&r->hts_prod, n, 1); }
> > +
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, unsigned int *free_space) {
> > +	n = rte_ring_hts_enqueue_bulk_start(r, obj_table, n, free_space);
> > +	if (n != 0)
> > +		rte_ring_hts_enqueue_finish(r, n);
> > +	return n;
> > +}
> > +
> >  /**
> >   * Enqueue several objects on a ring.
> >   *
> > @@ -615,6 +750,8 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const
> > *obj_table,
> >  		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> >  	case RTE_RING_SYNC_MT_RTS:
> >  		return rte_ring_rts_enqueue_bulk(r, obj_table, n, free_space);
> > +	case RTE_RING_SYNC_MT_HTS:
> > +		return rte_ring_hts_enqueue_bulk(r, obj_table, n,
> > free_space);
> >  	}
> >
> >  	/* valid ring should never reach this point */
> > @@ -753,6 +890,47 @@ rte_ring_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
> >  			available);
> >  }
> >
> > +/**
> > + * Start to dequeue several objects from an HTS ring (multi-consumers safe).
> > + * Note that user has to call appropriate dequeue_finish()
> > + * to complete given dequeue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return __rte_ring_do_hts_dequeue_start(r, obj_table, n,
> > +		RTE_RING_QUEUE_FIXED, available);
> > +}
> IMO, we should look to provide the ability to avoid intermediate copies when the data from the ring needs to be distributed to different
> locations.

As I said in the other thread, I am not sure it would provide any gain in terms of performance,
unless we have a case with bulk transfers and large elements.
If you still strongly feel SG (scatter-gather) is needed here, I think it should be an add-on API, not the main and only one.

> My proposal in its current form is complicated. But, I am thinking that, if the return values are abstracted in a structure, it might look much simpler.
> 
> > +
> > +static __rte_always_inline void
> > +rte_ring_hts_dequeue_finish(struct rte_ring *r, unsigned int n) {
> > +	__rte_ring_hts_update_tail(&r->hts_cons, n, 0); }
> > +
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	n = rte_ring_hts_dequeue_bulk_start(r, obj_table, n, available);
> > +	if (n != 0)
> > +		rte_ring_hts_dequeue_finish(r, n);
> > +	return n;
> > +}
> > +
> >  /**
> >   * Dequeue several objects from a ring.
> >   *
> > @@ -783,6 +961,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > **obj_table, unsigned int n,
> >  		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> >  	case RTE_RING_SYNC_MT_RTS:
> >  		return rte_ring_rts_dequeue_bulk(r, obj_table, n, available);
> > +	case RTE_RING_SYNC_MT_HTS:
> > +		return rte_ring_hts_dequeue_bulk(r, obj_table, n, available);
> >  	}
> >
> >  	/* valid ring should never reach this point */
> > @@ -1111,6 +1291,41 @@ rte_ring_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> >  			RTE_RING_QUEUE_VARIABLE, free_space);
> >  }
> >
> > +/**
> > + * Start to enqueue several objects on the HTS ring (multi-producers safe).
> > + * Note that user has to call appropriate enqueue_finish()
> > + * to complete given enqueue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_enqueue_burst_start(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, unsigned int *free_space) {
> > +	return __rte_ring_do_hts_enqueue_start(r, obj_table, n,
> > +		RTE_RING_QUEUE_VARIABLE, free_space); }
> > +
> rte_ring_hts_enqueue_burst_finish is not implemented.

No need to; finish() is identical for both _bulk and _burst.
That's why we have 2 starts:
rte_ring_hts_enqueue_bulk_start()
rte_ring_hts_enqueue_burst_start()
and one finish:
rte_ring_hts_enqueue_finish().

Same story for dequeue.

> It requires the 'n' returned from 'rte_ring_hts_enqueue_burst_start' to be passed.

Yes, it requires some m <= n to be passed.
That's the whole point of peek - we want to be able to inspect N elems,
possibly without retrieving them from the ring;
i.e. inspect N, retrieve M <= N.
 

> We can't completely avoid passing correct information between xxx_start and xxx_finish APIs.

Yes, we can't.
But in this model we can check that the value provided to finish() is valid,
plus we don't give the user direct access to the contents of the ring,
and we don't require them to specify head/tail values directly.
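
To illustrate with the enqueue side (a hedged sketch against this RFC's API;
note that with these patches enqueue _start_ already copies the objects into
the ring, and finish() publishes only the first m of them; validate_objs() is
made up):

#include <rte_ring.h>

/* Hypothetical application check: returns how many of the n staged
 * objects should actually be published (some m <= n).
 */
extern unsigned int validate_objs(void * const objs[], unsigned int n);

static void
stage_and_publish(struct rte_ring *r, void *objs[8])
{
	unsigned int n, m;

	/* stage: copies the objects into the ring without publishing them */
	n = rte_ring_hts_enqueue_burst_start(r, objs, 8, NULL);

	m = validate_objs(objs, n);

	/* publish m; an m larger than what start() reserved would trip
	 * the RTE_ASSERT in __rte_ring_hts_update_tail()
	 */
	rte_ring_hts_enqueue_finish(r, m);
}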

 
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, unsigned int *free_space) {
> > +	n = rte_ring_hts_enqueue_burst_start(r, obj_table, n, free_space);
> > +	if (n != 0)
> > +		rte_ring_hts_enqueue_finish(r, n);
> > +	return n;
> > +}
> > +
> >  /**
> >   * Enqueue several objects on a ring.
> >   *
> > @@ -1141,6 +1356,8 @@ rte_ring_enqueue_burst(struct rte_ring *r, void *
> > const *obj_table,
> >  		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> > free_space);
> >  	case RTE_RING_SYNC_MT_RTS:
> >  		return rte_ring_rts_enqueue_burst(r, obj_table, n,
> > free_space);
> > +	case RTE_RING_SYNC_MT_HTS:
> > +		return rte_ring_hts_enqueue_burst(r, obj_table, n,
> > free_space);
> >  	}
> >
> >  	/* valid ring should never reach this point */
> > @@ -1225,6 +1442,42 @@ rte_ring_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
> >  	return __rte_ring_do_rts_dequeue(r, obj_table, n,
> >  			RTE_RING_QUEUE_VARIABLE, available);
> >  }
> > +
> > +/**
> > + * Start to dequeue several objects from an HTS ring (multi-consumers safe).
> > + * Note that user has to call appropriate dequeue_finish()
> > + * to complete given dequeue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_dequeue_burst_start(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return __rte_ring_do_hts_dequeue_start(r, obj_table, n,
> > +		RTE_RING_QUEUE_VARIABLE, available);
> > +}
> > +
> > +static __rte_always_inline unsigned int
> > +rte_ring_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	n = rte_ring_hts_dequeue_burst_start(r, obj_table, n, available);
> > +	if (n != 0)
> > +		rte_ring_hts_dequeue_finish(r, n);
> > +	return n;
> > +}
> > +
> >  /**
> >   * Dequeue multiple objects from a ring up to a maximum number.
> >   *
> > @@ -1255,6 +1508,8 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> > **obj_table,
> >  		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> >  	case RTE_RING_SYNC_MT_RTS:
> >  		return rte_ring_rts_dequeue_burst(r, obj_table, n, available);
> > +	case RTE_RING_SYNC_MT_HTS:
> > +		return rte_ring_hts_dequeue_burst(r, obj_table, n, available);
> >  	}
> >
> >  	/* valid ring should never reach this point */
> > diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
> > new file mode 100644
> > index 000000000..7e447e30b
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_hts_generic.h
> > @@ -0,0 +1,228 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2020 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_HTS_GENERIC_H_
> > +#define _RTE_RING_HTS_GENERIC_H_
> > +
> > +/**
> > + * @file rte_ring_hts_generic.h
> > + * It is not recommended to include this file directly,
> > + * include <rte_ring.h> instead.
> > + * Contains internal helper functions for head/tail sync (HTS) ring mode.
> > + * In that mode enqueue/dequeue operation is fully serialized:
> > + * only one thread at a time is allowed to perform a given op.
> > + * This is achieved by allowing a thread to proceed with changing head.value
> > + * only when head.value == tail.value.
> > + * Both head and tail values are updated atomically (as one 64-bit value).
> > + * As another enhancement, it provides the ability to split enqueue/dequeue
> > + * operation into two phases:
> > + * - enqueue/dequeue start
> > + * - enqueue/dequeue finish
> > + * That allows user to inspect objects in the ring without removing
> > + * them from it (aka MT safe peek).
> > + * As an example:
> > + * // read 1 elem from the ring:
> > + * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
> > + * if (n != 0) {
> > + *    // examine the object
> > + *    if (object_examine(obj) == KEEP)
> > + *       //decided to keep it in the ring.
> > + *       rte_ring_hts_dequeue_finish(ring, 0);
> > + *    else
> > + *    // decided to remove it from the ring.
> > + *       rte_ring_hts_dequeue_finish(ring, n);
> > + * }
> > + * Note that between _start_ and _finish_ the ring is sort of locked -
> > + * no other thread can proceed with an enqueue(/dequeue) operation till
> > + * _finish_ completes.
> This means it does not solve the problem for over committed systems. Do you agree?

I never stated that serialized ring fixes the problem completely.
I said that the current approach mitigates it quite well.
And yes, I still believe that statement is correct.
See other thread for more detailed discussion.

> 
> > + */
> > +
> > +static __rte_always_inline void
> > +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> > +	uint32_t enqueue)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos p;
> > +
> > +	if (enqueue)
> > +		rte_smp_wmb();
> > +	else
> > +		rte_smp_rmb();
> > +
> > +	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
> > +
> > +	n = p.pos.head - p.pos.tail;
> > +	RTE_ASSERT(n >= num);
> > +	RTE_SET_USED(n);
> > +
> > +	p.pos.head = p.pos.tail + num;
> > +	p.pos.tail = p.pos.head;
> > +
> > +	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
> > +}
> > +
> > +/**
> > + * @internal Waits till tail becomes equal to head,
> > + * meaning no writer/reader is active for that ring.
> > + * Supposed to work as a serialization point.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
> > +		union rte_ring_ht_pos *p)
> > +{
> > +	p->raw = rte_atomic64_read((rte_atomic64_t *)
> > +			(uintptr_t)&ht->ht.raw);
> > +
> > +	while (p->pos.head != p->pos.tail) {
> > +		rte_pause();
> > +		p->raw = rte_atomic64_read((rte_atomic64_t *)
> > +				(uintptr_t)&ht->ht.raw);
> > +	}
> > +}
> > +
> > +/**
> > + * @internal This function updates the producer head for enqueue
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sp
> > + *   Indicates whether multi-producer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to enqueue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where enqueue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where enqueue finishes
> > + * @param free_entries
> > + *   Returns the amount of free space in the ring BEFORE head was moved
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *free_entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos np, op;
> > +
> > +	const uint32_t capacity = r->capacity;
> > +
> > +	do {
> > +		/* Reset n to the initial burst count */
> > +		n = num;
> > +
> > +		/* wait for tail to be equal to head */
> > +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> > +
> > +		/* add rmb barrier to avoid load/load reorder in weak
> > +		 * memory model. It is noop on x86
> > +		 */
> > +		rte_smp_rmb();
> > +
> > +		/*
> > +		 *  The subtraction is done between two unsigned 32bits value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * *old_head > cons_tail). So 'free_entries' is always between 0
> > +		 * and capacity (which is < size).
> > +		 */
> > +		*free_entries = capacity + r->cons.tail - op.pos.head;
> > +
> > +		/* check that we have enough room in ring */
> > +		if (unlikely(n > *free_entries))
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > +					0 : *free_entries;
> > +
> > +		if (n == 0)
> > +			return 0;
> > +
> > +		np.pos.tail = op.pos.tail;
> > +		np.pos.head = op.pos.head + n;
> > +
> > +	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
> > +			op.raw, np.raw) == 0);
> > +
> > +	*old_head = op.pos.head;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal This function updates the consumer head for dequeue
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sc
> > + *   Indicates whether multi-consumer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to dequeue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where dequeue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where dequeue finishes
> > + * @param entries
> > + *   Returns the number of entries in the ring BEFORE head was moved
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos np, op;
> > +
> > +	/* move cons.head atomically */
> > +	do {
> > +		/* Restore n as it may change every loop */
> > +		n = num;
> > +
> > +		/* wait for tail to be equal to head */
> > +		__rte_ring_hts_head_wait(&r->hts_cons, &op);
> > +
> > +		/* add rmb barrier to avoid load/load reorder in weak
> > +		 * memory model. It is noop on x86
> > +		 */
> > +		rte_smp_rmb();
> > +
> > +		/* The subtraction is done between two unsigned 32bits value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * cons_head > prod_tail). So 'entries' is always between 0
> > +		 * and size(ring)-1.
> > +		 */
> > +		*entries = r->prod.tail - op.pos.head;
> > +
> > +		/* Set the actual entries for dequeue */
> > +		if (n > *entries)
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
> > +
> > +		if (unlikely(n == 0))
> > +			return 0;
> > +
> > +		np.pos.tail = op.pos.tail;
> > +		np.pos.head = op.pos.head + n;
> > +
> > +	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
> > +			op.raw, np.raw) == 0);
> > +
> > +	*old_head = op.pos.head;
> > +	return n;
> > +}
> > +
> > +#endif /* _RTE_RING_HTS_GENERIC_H_ */
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-03-26  1:50   ` Ananyev, Konstantin
@ 2020-03-30 21:29     ` Honnappa Nagarahalli
  2020-03-30 23:37       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-03-30 21:29 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: olivier.matz, nd, Honnappa Nagarahalli, nd

<snip>

> >
> > > Subject: [dpdk-dev] [RFC 0/6] New sync modes for ring
> > >
> > > Upfront note - that RFC is not a complete patch.
> > > It introduces an ABI breakage, plus it doesn't update ring_elem code
> > > properly,
> > As per the current rules, these changes (in the current form) will be
> > accepted only for 20.11 release. How do we address this for immediate
> requirements like RCU defer APIs?
> 
> I think I found a way to introduce these new modes without API/ABI breakage.
> Working on v1 right now. Plan to submit it by end of that week/start of next
> one.
ok

> 
> > I suggest that we move forward with my RFC (taking into consideration your
> feedback) to make progress on RCU APIs.
> >
> > > etc.
> > > I plan to deal with all these things in later versions.
> > > Right now I seek an initial feedback about proposed ideas.
> > > Would also ask people to repeat performance tests (see below) on
> > > their platforms to confirm the impact.
> > >
> > > More and more customers use(/try to use) DPDK based apps within
> > > overcommitted systems (multiple active threads over the same physical cores):
> > > VM, container deployments, etc.
> > > One quite common problem they hit: Lock-Holder-Preemption with
> rte_ring.
> > > LHP is quite a common problem for spin-based sync primitives
> > > (spin-locks, etc.) on overcommitted systems.
> > > The situation gets much worse when some sort of fair-locking
> > > technique is used (ticket-lock, etc.).
> > > As then not only the lock-owner's but also the lock-waiters' scheduling order
> > > matters a lot.
> > > This is a well-known problem for kernel within VMs:
> > > http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> > > https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> > > The problem with rte_ring is that while head acquisition is sort of
> > > un-fair locking, waiting on the tail is very similar to a ticket-lock
> > > scheme - the tail has to be updated in a particular order.
> > > That makes the current rte_ring implementation perform really poorly on
> > > some overcommitted scenarios.
> > > While it is probably not possible to completely resolve this problem
> > > in userspace only (without some kernel communication/intervention),
> > > removing fairness in tail update can mitigate it significantly.
> > > So this RFC proposes two new optional ring synchronization modes:
> > > 1) Head/Tail Sync (HTS) mode
> > > In that mode enqueue/dequeue operation is fully serialized:
> > >     only one thread at a time is allowed to perform given op.
> > >     As another enhancement provide ability to split enqueue/dequeue
> > >     operation into two phases:
> > >       - enqueue/dequeue start
> > >       - enqueue/dequeue finish
> > >     That allows user to inspect objects in the ring without removing
> > >     them from it (aka MT safe peek).
> > IMO, this will not address the problem described above.
> 
> It does, please see the results produced by ring_stress_*autotest below.
> Let's say for test-case: 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8' it
Had not looked at these tests. Please see the numbers below.

> shows:
> avg number of cycles per object for enqueue /dequeue:
> MP/MC: 280314.32
> HTS:         294.72
> RTS:         318.79
> 
> Customer who tried it reported similar level of improvement.
Is this tested with the VM/Container setup described in the slides you referred to?

> Actually if you have time - would be very interesting to see what numbers will
> be on ARM boxes.
> To reproduce, just:
> $cat ring_tests_u4
> ring_stress_autotest
> ring_stress_hts_autotest
> ring_stress_rts_autotest
> 
> /app/test/dpdk-test --lcores='6,(10-13)@7,(20-23)@8'  -n 4 < ring_tests_u4
> 2>&1 | tee res1
> 
> Then look at the ' AGGREGATE' stats.
> Right now it is a bit too verbose, so probably the easiest way to extract the
> same numbers quickly is:
> grep 'cycles/obj'  res1 | grep 'cycles/obj' | cat -n | awk '{if ($(1)%9==0) print $(NF);}'
> 280314.32
> 1057833.55
> 294.72
> 480.10
> 318.79
> 461.52
> 
> First 2 numbers will be for MP/MC, next 2 for HTS, last 2 for RTS.
12305.05
12027.09
3.59
7.37
4.41
7.98
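(Following the key above, that reads: MP/MC = 12305.05 / 12027.09, HTS = 3.59 / 7.37, RTS = 4.41 / 7.98 cycles/obj.)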

> 
> > For ex: when a producer updates the head and gets scheduled out, other
> > producers have to spin.
> 
> Sure, as I wrote in original cover letter:
> " While it is probably not possible to completely resolve this problem in
> userspace only (without some kernel communication/intervention), removing
> fairness in tail update can mitigate it significantly."
> Results from the ring_stress_*_autotest confirm that.
> 
> > The problem is probably worse as with non-HTS case moving of the head
> > and copying of the ring elements can happen in parallel between the
> producers (similarly for consumers).
> 
> Yes, as we serialize the ring, we remove the possibility of simultaneous copy.
> That's why for 'normal' cases (one thread per core) the original MP/MC is usually
> faster.
> Though in overcommitted scenarios current MP/MC performance degrades
> dramatically.
> The main problem with the current MP/MC implementation is that the tail update
> has to be done in strict order (a sort of fair locking scheme).
> Which means that we get a much worse LHP manifestation than when
> we use unfair schemes.
> With a serialized ring (HTS) we remove that ordering completely (same idea as
> the switch from fair to unfair locking for PV spin-locks).
> 
> > IMO, HTS should not be a configurable flag.
> 
> Why?
> 
> > In RCU requirement, a MP enqueue and HTS dequeue are required.
> 
> This is supported, user can specify different modes for consumer and
> producer:
> (0 | RING_F_MC_HTS_DEQ).
> Then it is up to the user either to call generic
> rte_ring_enqueue/rte_ring_dequeue,
> or specify mode manually by function name:
> rte_ring_mp_enqueue_bulk/ rte_ring_hts_dequeue_bulk.
Ok, that should be good.

> 
> >
> > > 2) Relaxed Tail Sync (RTS)
> > > The main difference from original MP/MC algorithm is that tail value
> > > is increased not by every thread that finished enqueue/dequeue, but
> > > only by the last one.
> > > That allows threads to avoid spinning on ring tail value, leaving
> > > actual tail value change to the last thread in the update queue.
> > This can be a configurable flag on the ring.
> > I am not sure how this solves the problem you have stated above
> > completely. Updating the count from all intermediate threads is still
> > required to update the value of the head. But yes, it reduces the severity of
> the problem by not enforcing the order in which the tail is updated.
> 
> As I said above, the main source of slowdown here is that we have to update
> the tail in a particular order.
> So the main objective (same as for HTS) is to remove that ordering.
> 
> > I also think it introduces the problem on the other side of the ring
> > because the tail is not updated soon enough (the other side has to wait
> longer for the elements to become available).
> 
> Yes, producer/consumer starvation.
> That's why we need max allowed Head-Tail-Distance (htd_max) - to limit how
> far head can go away from tail.
> 
> > It also introduces another configuration parameter (HTD_MAX_DEF) which
> > they have to deal with.
> 
> If the user doesn't provide any value, it will be set by default to ring.capacity / 8.
> From my measurements that works quite well.
> Though there is the possibility for the user to set another value, if needed.
> 
> > Users have to still implement the current hypervisor related solutions.
> 
> Didn't get what you are trying to say with that phrase.
The references you provided talked about resolving LHP by doing co-scheduling of vCPUs (which I think could be applied to DPDK applications). I am saying that we still need such mechanisms along with these solutions.

> 
> > IMO, we should run the benchmark for this on an over committed setup to
> understand the benefits.
> 
> That's why I created ring_stress_*autotest test-cases and collected numbers
> provided below.
> I suppose they clearly show the problem on overcommitted scenarios, and
> how RTS/HTS improve that situation.
> Would appreciate if you repeat these tests on your machines.
> 
> >
> > >
> > > Test results on IA (see below) show significant improvements for
> > > average enqueue/dequeue op times on overcommitted systems.
> > > For 'classic' DPDK deployments (one thread per core) original MP/MC
> > > algorithm still shows best numbers, though for 64-bit target RTS
> > > numbers are not that far away.
> > > Numbers were produced by ring_stress_*autotest (first patch in these
> series).
> > >
> > > X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> > > DEQ+ENQ average cycles/obj
> > >
> > >                                                 MP/MC      HTS     RTS
> > > 1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
> > > 2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
> > > 4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
> > > 8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
> > > 16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
> > > 32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51
> > >
> > > 2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
> > > 4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
> > > 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
> > > 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
> > > 32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68
> > >
> > > 8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
> > > 16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
> > > 32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12
> > >
> > > i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> > > DEQ+ENQ average cycles/obj
> > >
> > >                                                 MP/MC      HTS     RTS
> > > 1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
> > > 2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
> > > 8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
> > > 32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91
> > >
> > > 2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
> > > 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
> > > 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65
> > >
> > > 8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
> > > 32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87
> > >
> > > Konstantin Ananyev (6):
> > >   test/ring: add contention stress test
> > >   ring: rework ring layout to allow new sync schemes
> > >   ring: introduce RTS ring mode
> > >   test/ring: add contention stress test for RTS ring
> > >   ring: introduce HTS ring mode
> > >   test/ring: add contention stress test for HTS ring
> > >
> > >  app/test/Makefile                      |   3 +
> > >  app/test/meson.build                   |   3 +
> > >  app/test/test_pdump.c                  |   6 +-
> > >  app/test/test_ring_hts_stress.c        |  28 ++
> > >  app/test/test_ring_rts_stress.c        |  28 ++
> > >  app/test/test_ring_stress.c            |  27 ++
> > >  app/test/test_ring_stress.h            | 477 +++++++++++++++++++
> > >  lib/librte_pdump/rte_pdump.c           |   2 +-
> > >  lib/librte_port/rte_port_ring.c        |  12 +-
> > >  lib/librte_ring/Makefile               |   4 +-
> > >  lib/librte_ring/meson.build            |   4 +-
> > >  lib/librte_ring/rte_ring.c             |  84 +++-
> > >  lib/librte_ring/rte_ring.h             | 619 +++++++++++++++++++++++--
> > >  lib/librte_ring/rte_ring_elem.h        |   8 +-
> > >  lib/librte_ring/rte_ring_hts_generic.h | 228 +++++++++
> > > lib/librte_ring/rte_ring_rts_generic.h | 240 ++++++++++
> > >  16 files changed, 1721 insertions(+), 52 deletions(-)  create mode
> > > 100644 app/test/test_ring_hts_stress.c  create mode 100644
> > > app/test/test_ring_rts_stress.c  create mode 100644
> > > app/test/test_ring_stress.c  create mode 100644
> > > app/test/test_ring_stress.h create mode 100644
> > > lib/librte_ring/rte_ring_hts_generic.h
> > >  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> > >
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-03-30 21:29     ` Honnappa Nagarahalli
@ 2020-03-30 23:37       ` Honnappa Nagarahalli
  2020-03-31 17:21         ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-03-30 23:37 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: olivier.matz, nd, Honnappa Nagarahalli, nd

<snip>
> 
> > >
> > > > Subject: [dpdk-dev] [RFC 0/6] New sync modes for ring
> > > >
> > > > Upfront note - that RFC is not a complete patch.
> > > > It introduces an ABI breakage, plus it doesn't update ring_elem
> > > > code properly,
> > > As per the current rules, these changes (in the current form) will
> > > be accepted only for 20.11 release. How do we address this for
> > > immediate
> > requirements like RCU defer APIs?
> >
> > I think I found a way to introduce these new modes without API/ABI
> breakage.
> > Working on v1 right now. Plan to submit it by end of that week/start
> > of next one.
> ok
RCU defer APIs require the rte_ring_xxx_elem versions. I guess you are adding those as well.

> 
<snip>


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 0/8] New sync modes for ring
  2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
                   ` (7 preceding siblings ...)
  2020-03-25 20:43 ` Honnappa Nagarahalli
@ 2020-03-31 16:43 ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 1/8] test/ring: add contention stress test Konstantin Ananyev
                     ` (8 more replies)
  8 siblings, 9 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. Rework peek related API a bit
4. Rework test to make it less verbose and unite all test-cases
   in one command
5. Add new test-case for MT peek API

TODO list:
1. Add C11 atomics support
2. Update docs

These days many customers use (or try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP/LWP are quite common problems for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.).
As then not only the lock-owner's but also the lock-waiters' scheduling
order matters a lot.
This is a well-known problem for kernel within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is sort of
un-fair locking, waiting on the tail is very similar to a ticket-lock scheme -
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
While it is probably not possible to completely resolve the LHP problem in
userspace only (without some kernel communication/intervention),
removing fairness in the tail update can mitigate it significantly.
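For illustration, the tail wait in the classic MP/MC path looks roughly like
this (a simplified sketch modelled on the update-tail logic in
rte_ring_generic.h, not the exact upstream code):

/*
 * Classic MP/MC tail update (simplified sketch).
 * Each thread must wait until the tail reaches the position where its own
 * enqueue/dequeue started, so threads complete strictly in the order they
 * moved the head - the ticket-lock pattern: if the thread ahead of us is
 * preempted here, everyone behind it keeps spinning.
 */
static inline void
classic_update_tail(volatile uint32_t *tail, uint32_t old_val,
		uint32_t new_val)
{
	while (*tail != old_val)
		rte_pause();
	*tail = new_val;
}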
So this RFC proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode the enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform a given op.
    As another enhancement, it provides the ability to split the enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows the user to inspect objects in the ring without removing
    them from it (aka MT-safe peek).
2) Relaxed Tail Sync (RTS)
The main difference from the original MP/MC algorithm is that
the tail value is increased not by every thread that finished an enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on the ring tail value,
leaving the actual tail value change to the last thread in the update queue,
as the sketch below illustrates.
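A simplified sketch of that tail update (the union and field names here are
illustrative, not necessarily the exact ones from this series):

/*
 * RTS tail update sketch. Both head and tail carry a {pos, cnt} pair,
 * updated as a single 64-bit value. Every finishing thread bumps tail.cnt,
 * but only the thread whose increment makes tail.cnt catch up with head.cnt
 * moves tail.pos forward - so no thread spins waiting for its predecessors.
 */
do {
	ot.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->tail.raw);
	h.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->head.raw);

	nt.raw = ot.raw;
	/* we are the last outstanding update: move tail.pos up to head.pos */
	if (++nt.val.cnt == h.val.cnt)
		nt.val.pos = h.val.pos;
} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0);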

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
Existing sync modes MP/MC and SP/SC are kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains the default one.
The only thing that changed:
the format of prod/cons may now differ depending on the mode selected at _init_.
So the user has to stick with one sync model through the whole ring lifetime.
In other words, a user can't create a ring for, let's say, SP mode and then
in the middle of the data-path change their mind and start using MP_RTS mode.
For the existing modes (SP/MP, SC/MC) the format remains the same and
users can still use them interchangeably, though of course that is an
error-prone practice.
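As a usage sketch, a ring with the default MP producer and an HTS consumer
(the combination discussed above for the RCU defer APIs) could look like this
(error handling omitted; flag and function names as used in this series):

	struct rte_ring *r;
	unsigned int n;
	void *objs[32];

	/* producer stays default MT (MP), consumer uses HTS */
	r = rte_ring_create("mt_peek", 1024, rte_socket_id(),
			RING_F_MC_HTS_DEQ);

	/* generic calls dispatch on the sync types chosen at create time... */
	n = rte_ring_enqueue_burst(r, objs, RTE_DIM(objs), NULL);
	n = rte_ring_dequeue_burst(r, objs, RTE_DIM(objs), NULL);

	/* ...or the sync mode can be spelled out explicitly by name: */
	n = rte_ring_mp_enqueue_bulk(r, objs, RTE_DIM(objs), NULL);
	n = rte_ring_hts_dequeue_bulk(r, objs, RTE_DIM(objs), NULL);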

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'
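The figures below are the AGGREGATE 'cycles/obj(avg)' values reported by that test.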

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87

Konstantin Ananyev (8):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API

 app/test/Makefile                      |   5 +
 app/test/meson.build                   |   5 +
 app/test/test_pdump.c                  |   6 +-
 app/test/test_ring_hts_stress.c        |  32 ++
 app/test/test_ring_mpmc_stress.c       |  31 ++
 app/test/test_ring_peek_stress.c       |  43 +++
 app/test/test_ring_rts_stress.c        |  32 ++
 app/test/test_ring_stress.c            |  57 ++++
 app/test/test_ring_stress.h            |  37 +++
 app/test/test_ring_stress_impl.h       | 436 +++++++++++++++++++++++++
 lib/librte_pdump/rte_pdump.c           |   2 +-
 lib/librte_port/rte_port_ring.c        |  12 +-
 lib/librte_ring/Makefile               |   9 +-
 lib/librte_ring/meson.build            |   9 +-
 lib/librte_ring/rte_ring.c             | 114 ++++++-
 lib/librte_ring/rte_ring.h             | 243 ++++++++++++--
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_elem.h        | 105 +++++-
 lib/librte_ring/rte_ring_generic.h     |  48 +++
 lib/librte_ring/rte_ring_hts.h         | 210 ++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    | 205 ++++++++++++
 lib/librte_ring/rte_ring_hts_generic.h | 235 +++++++++++++
 lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++
 lib/librte_ring/rte_ring_rts.h         | 316 ++++++++++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++
 26 files changed, 2974 insertions(+), 56 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 1/8] test/ring: add contention stress test
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 2/8] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Introduce a new test-case to measure ring performance under contention
(multiple producers/consumers).
It starts a dequeue/enqueue loop on all available slave lcores.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  34 +++
 app/test/test_ring_stress_impl.h | 436 +++++++++++++++++++++++++++++++
 6 files changed, 553 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index 1f080d162..4eefaa887 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 351d29cb6..827b04886 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..5ab121fe4
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..0d6f0d2ae
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,436 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/*
+ * Measures performance of ring enqueue/dequeue under high contention
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker_prcs(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = rte_rdtsc_precise();
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = rte_rdtsc_precise() - tm0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = rte_rdtsc_precise();
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = rte_rdtsc_precise() - tm1;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
+	return rc;
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, 0, 0);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	/* final stats update */
+	cl = rte_rdtsc_precise() - cl;
+	lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+
+	return rc;
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, alignof(*elm));
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, alignof(*r));
+	if (r == NULL) {
+		printf("%s: alloca(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	*rng = r;
+	*data = elm;
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal worker to start test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop the test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 2/8] ring: prepare ring to allow new sync schemes
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 1/8] test/ring: add contention stress test Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 3/8] ring: introduce RTS ring mode Konstantin Ananyev
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Change from *single* to *sync_type* to allow different
synchronisation schemes to be applied.
Mark *single* as deprecated in comments.
Add new functions to allow user to query ring sync types.
Replace direct access to *single* with the appropriate function call.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 ++--
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 113 ++++++++++++++++++++++++++------
 lib/librte_ring/rte_ring_elem.h |   8 +--
 6 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..65364f2c5 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..2f6c050fa 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..d4775a063 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -61,11 +61,27 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-/* structure to hold a pair of head/tail values and other metadata */
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type format of that structure might be different,
+ * but offset for *sync_type* and *tail* values should remain the same.
+ */
 struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated -  True if single prod/cons */
+		uint32_t single;
+	};
 };
 
 /**
@@ -116,11 +132,10 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#define __IS_SP RTE_RING_SYNC_ST
+#define __IS_MP RTE_RING_SYNC_MT
+#define __IS_SC RTE_RING_SYNC_ST
+#define __IS_MC RTE_RING_SYNC_MT
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +435,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,7 +458,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -470,7 +485,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +569,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +593,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +620,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +792,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is single-producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   true if ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is single-consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   true if ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +891,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +914,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +941,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +969,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +994,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +1022,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..28f9836e6 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -570,7 +570,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -734,7 +734,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -902,7 +902,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -995,7 +995,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 3/8] ring: introduce RTS ring mode
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 1/8] test/ring: add contention stress test Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 2/8] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 4/8] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
It aims to reduce stall times in cases when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.
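
To make this concrete, here is a minimal illustrative sketch of the
RTS tail update in plain C11 atomics. It is not part of the patch:
the names are invented, barriers and the 32-bit fallback are omitted
for brevity, and the union layout mirrors the rte_ring_ht_poscnt
type introduced below.

#include <stdatomic.h>
#include <stdint.h>

union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt; /* number of accumulated head/tail updates */
		uint32_t pos; /* ring position */
	} val;
};

static void
rts_update_tail(_Atomic uint64_t *tail, _Atomic uint64_t *head)
{
	union poscnt h, ot, nt;

	do {
		ot.raw = atomic_load(tail);
		h.raw = atomic_load(head);

		/* every finished op bumps tail.cnt; only the last one
		 * (whose cnt catches up with head.cnt) moves tail.pos,
		 * so nobody spins waiting for preceding ops to finish
		 */
		nt = ot;
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;
	} while (!atomic_compare_exchange_weak(tail, &ot.raw, nt.raw));
}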

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   5 +-
 lib/librte_ring/meson.build            |   5 +-
 lib/librte_ring/rte_ring.c             | 100 +++++++-
 lib/librte_ring/rte_ring.h             | 109 ++++++++-
 lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
 lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
 8 files changed, 1007 insertions(+), 29 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 917c560ad..8f5c284cc 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_elem.h \
+					rte_ring_rts_generic.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index f2f3ccc88..612936afb 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -5,7 +5,10 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_elem.h',
+		'rte_ring_rts_generic.h')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
 allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero at success or negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index d4775a063..2b42a0211 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -65,10 +65,13 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
- * structure to hold a pair of head/tail values and other metadata.
+ * structures to hold a pair of head/tail values and other metadata.
  * Depending on sync_type format of that structure might be different,
  * but offset for *sync_type* and *tail* values should remain the same.
  */
@@ -84,6 +87,21 @@ struct rte_ring_headtail {
 	};
 };
 
+union rte_ring_ht_poscnt {
+	uint64_t raw;
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union rte_ring_ht_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union rte_ring_ht_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -111,11 +129,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -132,6 +160,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -461,6 +492,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -484,8 +519,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -619,8 +667,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -940,8 +1000,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -1020,9 +1093,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 28f9836e6..5de0850dc 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
 }
 
+#include <rte_ring_rts_elem.h>
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -571,6 +573,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -733,8 +755,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -901,8 +940,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -993,9 +1049,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..18404fe48
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,316 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that tail value is increased not
+ * by every thread that finished enqueue/dequeue,
+ * but only by the last one doing enqueue/dequeue.
+ * That allows threads to skip spinning on tail value,
+ * leaving actual tail value change to last thread in the update queue.
+ * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
+ * one for the head update, a second one for the tail update.
+ * As a gain it allows threads to avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update and waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce refcnt for both head and tail.
+ *  - increment head.refcnt for each head.value update
+ *  - write head.value and head.refcnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.refcnt + 1 == head.refcnt
+ *  - increment tail.refcnt when each enqueue/dequeue op finishes
+ *    (no matter whether tail.value is going to change or not)
+ *  - write tail.value and tail.refcnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit max allowed distance between head and tail value (HTD_MAX).
+ *    I.e. a thread is allowed to proceed with changing head.value
+ *    only when:  head.value - tail.value <= HTD_MAX
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity - no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if producer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if consumer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_elem.h b/lib/librte_ring/rte_ring_rts_elem.h
new file mode 100644
index 000000000..71a331b23
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_elem.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_ELEM_H_
+#define _RTE_RING_RTS_ELEM_H_
+
+/**
+ * @file rte_ring_rts_elem.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more details please refer to <rte_ring_rts.h>.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_generic.h b/lib/librte_ring/rte_ring_rts_generic.h
new file mode 100644
index 000000000..31a37924c
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_generic.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_GENERIC_H_
+#define _RTE_RING_RTS_GENERIC_H_
+
+/**
+ * @file rte_ring_rts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_ht_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with a new value.
+	 */
+
+	do {
+		ot.raw = ht->tail.raw;
+		rte_smp_rmb();
+
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->head.raw);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance does not
+ * exceed the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_ht_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+	h->raw = ht->head.raw;
+	rte_smp_rmb();
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = ht->head.raw;
+		rte_smp_rmb();
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to enqueue,
+ *   i.e. how far the head should be moved.
+ *   This helper always takes the multi-producer RTS path,
+ *   there is no single-producer variant.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param old_head
+ *   Returns the head value as it was before the move,
+ *   i.e. where the enqueue starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring
+ *   BEFORE the head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* read prod head (may spin on prod tail) */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 *  The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_prod.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue.
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to dequeue,
+ *   i.e. how far the head should be moved.
+ *   This helper always takes the multi-consumer RTS path,
+ *   there is no single-consumer variant.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns the head value as it was before the move,
+ *   i.e. where the dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring
+ *   BEFORE the head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* read cons head (may spin on cons tail) */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_cons.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_GENERIC_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 4/8] test/ring: add contention stress test for RTS ring
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
                     ` (2 preceding siblings ...)
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 3/8] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 5/8] ring: introduce HTS ring mode Konstantin Ananyev
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Introduce a new test case to exercise RTS ring mode under contention.
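
For reference, opting in to this mode from application code is just a
matter of ring flags, as the test below does via rte_ring_init().
A hypothetical fragment (not part of the patch; the object table
"objs" and count "n" are assumed, error handling omitted):

#include <rte_ring.h>
#include <rte_lcore.h>

/* default enqueue/dequeue ops of this ring will use RTS mode */
struct rte_ring *r = rte_ring_create("rts_test", 1024,
		rte_socket_id(), RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

/* generic APIs dispatch on the ring sync_type, so existing
 * enqueue/dequeue call sites keep working unmodified
 */
unsigned int done = rte_ring_enqueue_burst(r, objs, n, NULL);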

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 4eefaa887..3e15f3791 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 827b04886..bb67a49f0 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 5ab121fe4..206f97cb6 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -32,3 +32,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 5/8] ring: introduce HTS ring mode
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
                     ` (3 preceding siblings ...)
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 4/8] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 6/8] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Introduce head/tail sync (HTS) mode for MT ring synchronization.
In that mode enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform a given op.
It aims to reduce stall times in cases when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
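
For illustration only (not part of the patch), the serialization can
be sketched in plain C11 atomics: head and tail live in one 64-bit
word, a new op may claim slots only when head == tail, and the
finishing op publishes the new tail with a plain atomic store, since
no other thread can modify the word in between. Names are invented;
capacity/free-space checks, barriers and rte_pause() are omitted:

#include <stdatomic.h>
#include <stdint.h>

union ht_pos {
	uint64_t raw;
	struct {
		uint32_t head; /* position claimed by the current op */
		uint32_t tail; /* position of the last completed op */
	} pos;
};

/* wait until no op is in flight, then claim n slots (64-bit CAS) */
static uint32_t
hts_move_head(_Atomic uint64_t *ht, uint32_t n)
{
	union ht_pos op, np;

	for (;;) {
		op.raw = atomic_load(ht);
		if (op.pos.head != op.pos.tail)
			continue; /* another enqueue/dequeue in progress */
		np = op;
		np.pos.head += n;
		if (atomic_compare_exchange_weak(ht, &op.raw, np.raw))
			return op.pos.head; /* old head: where to copy */
	}
}

/* after copying objects: this thread owns the word exclusively,
 * so a single store is enough to let the next op in
 */
static void
hts_update_tail(_Atomic uint64_t *ht, uint32_t old_head, uint32_t n)
{
	union ht_pos p;

	p.pos.head = old_head + n;
	p.pos.tail = old_head + n;
	atomic_store(ht, p.raw);
}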

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   3 +
 lib/librte_ring/meson.build            |   3 +
 lib/librte_ring/rte_ring.c             |  20 ++-
 lib/librte_ring/rte_ring.h             |  31 ++++
 lib/librte_ring/rte_ring_elem.h        |  13 ++
 lib/librte_ring/rte_ring_hts.h         | 210 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    | 205 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_generic.h | 198 +++++++++++++++++++++++
 8 files changed, 681 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 8f5c284cc..6fe500f0d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,9 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_elem.h \
+					rte_ring_hts_generic.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
 					rte_ring_rts_generic.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 612936afb..8e86e037a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,6 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_elem.h',
+		'rte_ring_hts_generic.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2b42a0211..f295bd7ce 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -67,6 +67,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -102,6 +103,19 @@ struct rte_ring_rts_headtail {
 	volatile union rte_ring_ht_poscnt head;
 };
 
+union rte_ring_ht_pos {
+	uint64_t raw;
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union rte_ring_ht_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -132,6 +146,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -141,6 +156,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -163,6 +179,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -493,6 +512,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -528,6 +548,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -675,6 +698,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -1009,6 +1034,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -1102,6 +1130,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 5de0850dc..010a564c1 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -542,6 +542,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
 }
 
+#include <rte_ring_hts_elem.h>
 #include <rte_ring_rts_elem.h>
 
 /**
@@ -585,6 +586,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -766,6 +770,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -951,6 +958,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1060,6 +1070,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..062d7be6c
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
+ * In that mode enqueue/dequeue operation is fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to proceed with changing head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve that, a 64-bit CAS is used by the head update routine.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue(struct rte_ring *r, void **obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_elem.h b/lib/librte_ring/rte_ring_hts_elem.h
new file mode 100644
index 000000000..34f0d121d
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_elem.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_ELEM_H_
+#define _RTE_RING_HTS_ELEM_H_
+
+/**
+ * @file rte_ring_hts_elem.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Contains *ring_elem* functions for Head-Tail Sync (HTS) ring mode.
+ * For more details please refer to <rte_ring_hts.h>.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
new file mode 100644
index 000000000..0b3931ffa
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_GENERIC_H_
+#define _RTE_RING_HTS_GENERIC_H_
+
+/**
+ * @file rte_ring_hts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	union rte_ring_ht_pos p;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+
+	p.pos.head = p.pos.tail + num;
+	p.pos.tail = p.pos.head;
+
+	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
+}
+
+/**
+ * @internal waits until the tail becomes equal to the head,
+ * i.e. no writer/reader is active for that ring.
+ * Serves as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_ht_pos *p)
+{
+	p->raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->ht.raw);
+
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = rte_atomic64_read((rte_atomic64_t *)
+				(uintptr_t)&ht->ht.raw);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to enqueue, i.e. how far the head
+ *   should be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where the
+ *   enqueue starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we want to dequeue, i.e. how far the head
+ *   should be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where the
+ *   dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_GENERIC_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread
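
To make the serialization property of HTS concrete, here is a minimal
standalone sketch of the combined head/tail word and the 64-bit CAS that
moves the head. This is an illustration only: the union layout and the
names are simplified assumptions, not the actual rte_ring definitions
from the patch above.

#include <stdint.h>
#include <stdatomic.h>

/* head and tail packed into one 64-bit word, so that both can be
 * read and updated atomically (mirrors the idea behind rte_ring_ht_pos)
 */
union ht_pos {
	uint64_t raw;
	struct {
		uint32_t head;
		uint32_t tail;
	} pos;
};

/* a producer may move the head only when head == tail, i.e. no other
 * enqueue is in flight - that is what makes HTS fully serialized;
 * on failure the caller simply retries
 */
static int
hts_try_move_head(_Atomic uint64_t *ht, uint32_t n, uint32_t *old_head)
{
	union ht_pos op, np;

	op.raw = atomic_load_explicit(ht, memory_order_acquire);
	if (op.pos.head != op.pos.tail)
		return 0;	/* another operation is still in progress */

	np.pos.tail = op.pos.tail;
	np.pos.head = op.pos.head + n;
	if (!atomic_compare_exchange_strong(ht, &op.raw, np.raw))
		return 0;	/* lost the CAS race */

	*old_head = op.pos.head;
	return 1;
}

Once the operation completes, the finishing thread stores head and tail
back to the same (advanced) position in a single 64-bit write, which is
what __rte_ring_hts_update_tail() in the patch does.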

* [dpdk-dev] [PATCH v1 6/8] test/ring: add contention stress test for HTS ring
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
                     ` (4 preceding siblings ...)
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 5/8] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 7/8] ring: introduce peek style API Konstantin Ananyev
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Introduce a new test case to exercise the HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 3e15f3791..72f64e30e 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index bb67a49f0..d6504a08a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 206f97cb6..de5c3750e 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v1 7/8] ring: introduce peek style API
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
                     ` (5 preceding siblings ...)
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 6/8] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 8/8] test/ring: add stress test for MT peek API Konstantin Ananyev
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST, RTE_RING_SYNC_MT_HTS
mode, provide an ability to split enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows user to inspect objects in the ring without removing
them from it (aka MT safe peek).

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   1 +
 lib/librte_ring/meson.build            |   1 +
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_elem.h        |   4 +
 lib/librte_ring/rte_ring_generic.h     |  48 ++++
 lib/librte_ring/rte_ring_hts_generic.h |  47 ++-
 lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++++++
 7 files changed, 519 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_peek.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6fe500f0d..5f8662737 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_hts.h \
 					rte_ring_hts_elem.h \
 					rte_ring_hts_generic.h \
+					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
 					rte_ring_rts_generic.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 8e86e037a..f5f84dc6e 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,7 @@ headers = files('rte_ring.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_elem.h',
 		'rte_ring_hts_generic.h',
+		'rte_ring_peek.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 0fb73a337..bb3096721 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -10,6 +10,50 @@
 #ifndef _RTE_RING_C11_MEM_H_
 #define _RTE_RING_C11_MEM_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which will abort any pending changes and
+ *   return the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 010a564c1..5bf7c1c1b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1083,6 +1083,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index 953cdbbd5..9f5fdf13b 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -10,6 +10,54 @@
 #ifndef _RTE_RING_GENERIC_H_
 #define _RTE_RING_GENERIC_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which will abort any pending changes and
+ *   return the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	pos = tail + num;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	ht->head = pos;
+	ht->tail = pos;
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
index 0b3931ffa..7eac761f9 100644
--- a/lib/librte_ring/rte_ring_hts_generic.h
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -18,9 +18,38 @@
  * For more information please refer to <rte_ring_hts.h>.
  */
 
+/**
+ * @internal get current tail value.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which will abort any pending changes and
+ *   return the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_ht_pos p;
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64-bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
 static __rte_always_inline void
-__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
-	uint32_t enqueue)
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
 {
 	union rte_ring_ht_pos p;
 
@@ -29,14 +58,22 @@ __rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
 	else
 		rte_smp_rmb();
 
-	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
-
-	p.pos.head = p.pos.tail + num;
+	p.pos.head = tail + num;
 	p.pos.tail = p.pos.head;
 
 	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
 }
 
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	uint32_t tail;
+
+	num = __rte_ring_hts_get_tail(ht, &tail, num);
+	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue);
+}
+
 /**
  * @internal waits till tail will become equal to head.
  * Means no writer/reader is active for that ring.
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..baefd2f7b
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,379 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * Introduction of rte_ring with serialized producer/consumer (HTS sync mode)
+ * makes it possible to split the public enqueue/dequeue API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows the user to inspect objects in the ring without removing them
+ * from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_hts_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_hts_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ the ring is sort of locked -
+ * no other thread can proceed with an enqueue(/dequeue) operation till
+ * _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal This function moves prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the objects
+ * into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to reserve enqueue space for in the ring.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the objects
+ * into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to reserve enqueue space for in the ring.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the previous
+ * enqueue_start() return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the previous
+ * enqueue_start() return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves cons head value and copies up to *n*
+ * objects from the ring to the user provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish() to
+ * complete the dequeue operation and actually remove the objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void **obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish() to
+ * complete the dequeue operation and actually remove the objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   Actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish() to
+ * complete the dequeue operation and actually remove the objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void **obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish() to
+ * complete the dequeue operation and actually remove the objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete dequeuing several objects from the ring.
+ * Note that the number of objects to dequeue must not exceed the previous
+ * dequeue_start() return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread
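
A minimal usage sketch for the two-phase enqueue introduced here, assuming
a ring created with RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ; fill_objs() is
a hypothetical application helper, not part of the API:

#include <rte_ring_elem.h>

void fill_objs(void **objs, uint32_t n);	/* hypothetical helper */

/* reserve the slots first, fill them, then publish via finish() */
static uint32_t
enqueue_two_phase(struct rte_ring *r, void **objs, uint32_t num)
{
	uint32_t n, free_space;

	n = rte_ring_enqueue_burst_start(r, num, &free_space);
	if (n != 0) {
		/* between start and finish the producer side is
		 * effectively locked, so objs[0..n-1] can be prepared
		 * without becoming visible to consumers yet
		 */
		fill_objs(objs, n);
		rte_ring_enqueue_finish(r, objs, n);
	}
	return n;
}

Note that it is rte_ring_enqueue_finish() that actually copies the objects
into the ring, matching the contract described in the header comments above.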

* [dpdk-dev] [PATCH v1 8/8] test/ring: add stress test for MT peek API
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
                     ` (6 preceding siblings ...)
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 7/8] ring: introduce peek style API Konstantin Ananyev
@ 2020-03-31 16:43   ` Konstantin Ananyev
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
  8 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-03-31 16:43 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, Konstantin Ananyev

Introduce a new test case to exercise the MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 72f64e30e..f4c987a98 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index d6504a08a..cfcf5b81d 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index de5c3750e..8fac7f4d2 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [RFC 0/6] New sync modes for ring
  2020-03-30 23:37       ` Honnappa Nagarahalli
@ 2020-03-31 17:21         ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-03-31 17:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: olivier.matz, nd, nd

> <snip>
> >
> > > >
> > > > > Subject: [dpdk-dev] [RFC 0/6] New sync modes for ring
> > > > >
> > > > > Upfront note - that RFC is not a complete patch.
> > > > > It introduces an ABI breakage, plus it doesn't update ring_elem
> > > > > code properly,
> > > > As per the current rules, these changes (in the current form) will
> > > > be accepted only for 20.11 release. How do we address this for
> > > > immediate
> > > requirements like RCU defer APIs?
> > >
> > > I think I found a way to introduce these new modes without API/ABI
> > breakage.
> > > Working on v1 right now. Plan to submit it by end of that week/start
> > > of next one.
> > ok
> RCU defer APIs require the rte_ring_xxx_elem versions. I guess you are adding those as well.

Yes, I added it into V1, please have a look.
Also I made 'legacy' peek API (enqueue/dequeue_start/finish) to call
'elem' peek API (enqueue/dequeue_elem_start/finish).
About naming: thought about changing start/finish to reserve/commit,
but decided to left as it is for now - in case you would like to go ahead
with SG API and use reserve/commit there.
Konstantin



^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 0/9] New sync modes for ring
  2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
                     ` (7 preceding siblings ...)
  2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 8/8] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-02 22:09   ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 1/9] test/ring: add contention stress test Konstantin Ananyev
                       ` (9 more replies)
  8 siblings, 10 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

V1 - v2 changes:
1. Fix compilation issues
2. Add C11 atomics support
3. Updates devtools/libabigail.abignore (workaround)

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. Rework peek related API a bit
4. Rework test to make it less verbose and unite all test-cases
   in one command
5. Add new test-case for MT peek API

TODO list:
1. Update docs

These days more and more customers use (or try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP is quite a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.).
As now not only the lock-owner's but also the lock-waiters' scheduling
order matters a lot (LWP).
These two problems are well-known for kernel within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
unfair locking, waiting on the tail is very similar to a ticket-lock schema -
the tail has to be updated in a particular order.
So while the head update exhibits only the LHP scenario,
the tail wait and update can cause an LWP.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
While it is probably not possible to completely resolve LHP problem in
userspace only (without some kernel communication/intervention),
removing fairness in tail update can mitigate current LWP significantly.
This RFC proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform given op.
    As another enhancement provide ability to split enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows user to inspect objects in the ring without removing
    them from it (aka MT safe peek).
2) Relaxed Tail Sync (RTS)
The main difference from the original MP/MC algorithm is that
the tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on the ring tail value,
leaving the actual tail value change to the last thread in the update queue
(see the sketch below).
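
A rough sketch of that "last thread moves the tail" scheme, using C11
atomics; the poscnt layout below is an assumption loosely modelled on the
rte_ring_rts_* helpers in this series, not their exact definitions:

#include <stdint.h>
#include <stdatomic.h>

/* head and tail each carry an update counter next to the position */
union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt;	/* number of started updates */
		uint32_t pos;	/* ring position */
	} val;
};

static void
rts_update_tail(_Atomic uint64_t *tail, _Atomic uint64_t *head)
{
	union poscnt h, ot, nt;

	do {
		ot.raw = atomic_load_explicit(tail, memory_order_acquire);
		h.raw = atomic_load_explicit(head, memory_order_relaxed);

		nt.raw = ot.raw;
		/* only the thread whose increment makes the tail counter
		 * catch up with the head counter moves the tail position
		 */
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;
	} while (!atomic_compare_exchange_strong_explicit(tail, &ot.raw,
			nt.raw, memory_order_release, memory_order_relaxed));
}

Every finishing thread still performs one CAS on the tail word, but no
thread has to spin waiting for other threads' updates to land first.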

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
The existing sync modes MP/MC, SP/SC are kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains the default one.
The only thing that changed:
the format of prod/cons may now differ depending on the mode selected at
_init_, so the user has to stick with one sync model through the whole
ring lifetime.
In other words, the user can't create a ring for, let's say, SP mode and then
in the middle of the data-path change his mind and start using MP_RTS mode.
For the existing modes (SP/MP, SC/MC) the format remains the same and
the user can still use them interchangeably, though of course it is an
error-prone practice.
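
Since the sync mode is baked into the ring at init time, selecting one of
the new modes is just a matter of creation flags. A minimal sketch: the HTS
flags below appear in this series, while the RTS flag names are assumed by
analogy and should be checked against the actual patch:

#include <rte_ring.h>
#include <rte_lcore.h>

static struct rte_ring *
make_serialized_ring(void)
{
	/* for RTS the analogous flags would be
	 * RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ (assumed names)
	 */
	return rte_ring_create("hts_ring", 1024, rte_socket_id(),
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
}

After creation, the generic rte_ring_enqueue/dequeue calls dispatch on the
prod/cons sync_type, so existing data-path code does not have to change.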

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87

Konstantin Ananyev (9):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API
  ring: add C11 memory model for new sync modes

 app/test/Makefile                      |   5 +
 app/test/meson.build                   |   5 +
 app/test/test_pdump.c                  |   6 +-
 app/test/test_ring_hts_stress.c        |  32 ++
 app/test/test_ring_mpmc_stress.c       |  31 ++
 app/test/test_ring_peek_stress.c       |  43 +++
 app/test/test_ring_rts_stress.c        |  32 ++
 app/test/test_ring_stress.c            |  57 ++++
 app/test/test_ring_stress.h            |  38 +++
 app/test/test_ring_stress_impl.h       | 444 +++++++++++++++++++++++++
 devtools/libabigail.abignore           |   4 +
 lib/librte_pdump/rte_pdump.c           |   2 +-
 lib/librte_port/rte_port_ring.c        |  12 +-
 lib/librte_ring/Makefile               |  11 +-
 lib/librte_ring/meson.build            |  11 +-
 lib/librte_ring/rte_ring.c             | 114 ++++++-
 lib/librte_ring/rte_ring.h             | 244 ++++++++++++--
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_elem.h        | 105 +++++-
 lib/librte_ring/rte_ring_generic.h     |  48 +++
 lib/librte_ring/rte_ring_hts.h         | 214 ++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 222 +++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    | 209 ++++++++++++
 lib/librte_ring/rte_ring_hts_generic.h | 235 +++++++++++++
 lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++
 lib/librte_ring/rte_ring_rts.h         | 320 ++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 198 +++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    | 209 ++++++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++
 29 files changed, 3428 insertions(+), 56 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 1/9] test/ring: add contention stress test
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test-case to measure ring performance under contention
(multiple producers/consumers).
It starts a dequeue/enqueue loop on all available slave lcores.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  35 +++
 app/test/test_ring_stress_impl.h | 444 +++++++++++++++++++++++++++++++
 6 files changed, 562 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index 1f080d162..4eefaa887 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 351d29cb6..827b04886 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..60eac6216
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+#include <rte_spinlock.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..11476d28c
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,444 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/*
+ * Measures performance of ring enqueue/dequeue under high contention
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats were collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker_prcs(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = rte_rdtsc_precise();
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = rte_rdtsc_precise() - tm0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = rte_rdtsc_precise();
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = rte_rdtsc_precise() - tm1;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
+	return rc;
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, 0, 0);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	/* final stats update */
+	cl = rte_rdtsc_precise() - cl;
+	lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+
+	return rc;
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems data failed",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, __alignof__(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal workers to start the test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop the test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 2/9] ring: prepare ring to allow new sync schemes
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 1/9] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 3/9] ring: introduce RTS ring mode Konstantin Ananyev
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Change from *single* to *sync_type* to allow different
synchronisation schemes to be applied.
Mark *single* as deprecated in comments.
Add new functions to allow the user to query ring sync types.
Replace direct access to *single* with the appropriate function calls.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
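To illustrate the conversion for callers, a minimal before/after sketch
(ring_is_mpmc() is a hypothetical helper, mirroring the rte_pdump.c
hunk below):

#include <rte_ring.h>

static int
ring_is_mpmc(const struct rte_ring *ring)
{
	/* before: if (ring->prod.single || ring->cons.single) return 0; */
	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring))
		return 0;
	return 1;
}
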
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 ++--
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 113 ++++++++++++++++++++++++++------
 lib/librte_ring/rte_ring_elem.h |   8 +--
 6 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..65364f2c5 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..2f6c050fa 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..d4775a063 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -61,11 +61,27 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-/* structure to hold a pair of head/tail values and other metadata */
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type, the format of that structure might differ,
+ * but the offsets of the *sync_type* and *tail* values must remain the same.
+ */
 struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated -  True if single prod/cons */
+		uint32_t single;
+	};
 };
 
 /**
@@ -116,11 +132,10 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#define __IS_SP RTE_RING_SYNC_ST
+#define __IS_MP RTE_RING_SYNC_MT
+#define __IS_SC RTE_RING_SYNC_ST
+#define __IS_MC RTE_RING_SYNC_MT
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +435,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,7 +458,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -470,7 +485,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +569,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +593,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +620,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +792,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single-producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if the ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single-consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if the ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +891,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +914,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +941,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +969,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +994,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +1022,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..28f9836e6 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -570,7 +570,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -734,7 +734,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -902,7 +902,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -995,7 +995,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 3/9] ring: introduce RTS ring mode
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 1/9] test/ring: add contention stress test Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
The aim is to reduce stall times when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
The main difference from the original MP/MC algorithm is that
the tail value is increased not by every thread that finished
enqueue/dequeue, but only by the last one.
That allows threads to avoid spinning on the ring tail value,
leaving the actual tail value change to the last thread in the update queue.
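
To illustrate the idea, here is a minimal sketch of the last-finisher
tail update (stand-in type and function names, not the patch code; the
real implementation lives in rte_ring_rts_generic.h below and differs
in detail; GCC/Clang __atomic builtins and 64-bit atomic loads are
assumed):

#include <stdint.h>

/* stand-ins for the patch's rte_ring_ht_poscnt/rte_ring_rts_headtail */
union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt; /* count of completed enqueue/dequeue ops */
		uint32_t pos; /* head/tail position */
	} val;
};

struct rts_ht {
	volatile union poscnt tail;
	volatile union poscnt head;
};

/* Each finishing op increments tail.cnt; only the op whose increment
 * makes tail.cnt catch up with head.cnt moves tail.pos to head.pos. */
static void
rts_update_tail(struct rts_ht *ht)
{
	union poscnt h, ot, nt;

	ot.raw = ht->tail.raw;
	do {
		h.raw = ht->head.raw;
		nt.val.cnt = ot.val.cnt + 1;
		nt.val.pos = (nt.val.cnt == h.val.cnt) ?
			h.val.pos : ot.val.pos;
	} while (!__atomic_compare_exchange_n(
			(uint64_t *)(uintptr_t)&ht->tail.raw,
			&ot.raw, nt.raw, 0,
			__ATOMIC_RELEASE, __ATOMIC_ACQUIRE));
}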

check-abi.sh reports what I believe is a false-positive about
ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
updated to suppress *struct ring* related errors.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
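Usage note: an RTS ring is requested via the new creation flags; a
hypothetical snippet follows (make_rts_ring() is illustrative only;
rte_ring_create()/rte_socket_id() are existing DPDK API and the HTD
value of 64 is arbitrary):

#include <rte_lcore.h>
#include <rte_ring.h>

static struct rte_ring *
make_rts_ring(void)
{
	struct rte_ring *r;

	/* both enqueue and dequeue use relaxed tail sync */
	r = rte_ring_create("rts_ring", 1024, rte_socket_id(),
		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return NULL;

	/* optional: tighten max head/tail distance (default: capacity/8) */
	rte_ring_set_prod_htd_max(r, 64);
	rte_ring_set_cons_htd_max(r, 64);
	return r;
}
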
 devtools/libabigail.abignore           |   4 +
 lib/librte_ring/Makefile               |   5 +-
 lib/librte_ring/meson.build            |   5 +-
 lib/librte_ring/rte_ring.c             | 100 +++++++-
 lib/librte_ring/rte_ring.h             | 110 ++++++++-
 lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
 lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
 9 files changed, 1012 insertions(+), 29 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..ece014111 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,7 @@
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore updates of ring prod/cons
+[suppress_type]
+        type_kind = struct
+        name = rte_ring
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 917c560ad..8f5c284cc 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_elem.h \
+					rte_ring_rts_generic.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index f2f3ccc88..612936afb 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -5,7 +5,10 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_elem.h',
+		'rte_ring_rts_generic.h')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
 allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero on success or a negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index d4775a063..f6f084d79 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -48,6 +48,7 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_memzone.h>
 #include <rte_pause.h>
+#include <rte_debug.h>
 
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
@@ -65,10 +66,13 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
- * structure to hold a pair of head/tail values and other metadata.
+ * structures to hold a pair of head/tail values and other metadata.
  * Depending on sync_type, the format of that structure might differ,
  * but the offsets of the *sync_type* and *tail* values must remain the same.
  */
@@ -84,6 +88,21 @@ struct rte_ring_headtail {
 	};
 };
 
+union rte_ring_ht_poscnt {
+	uint64_t raw;
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union rte_ring_ht_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union rte_ring_ht_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -111,11 +130,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -132,6 +161,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -461,6 +493,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -619,8 +668,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -940,8 +1001,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -1020,9 +1094,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 28f9836e6..5de0850dc 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
 }
 
+#include <rte_ring_rts_elem.h>
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -571,6 +573,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -733,8 +752,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -901,8 +937,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -993,9 +1046,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..18404fe48
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,316 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that tail value is increased not
+ * by every thread that finished enqueue/dequeue,
+ * but only by the last one doing enqueue/dequeue.
+ * That allows threads to skip spinning on tail value,
+ * leaving actual tail value change to last thread in the update queue.
+ * RTS requires two 64-bit CAS operations for each enqueue/dequeue operation:
+ * one for the head update, a second for the tail update.
+ * In return it allows threads to avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update plus waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce refcnt for both head and tail.
+ *  - increment head.refcnt for each head.value update
+ *  - write head.value and head.refcnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.refcnt + 1 == head.refcnt
+ *  - increment tail.refcnt when each enqueue/dequeue op finishes
+ *    (no matter whether tail.value is going to change or not)
+ *  - write tail.value and tail.refcnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit the max allowed distance between head and tail values (HTD_MAX),
+ *    i.e. a thread is allowed to proceed with changing head.value
+ *    only when: head.value - tail.value <= HTD_MAX
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity - no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if producer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to set.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if consumer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to set.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_elem.h b/lib/librte_ring/rte_ring_rts_elem.h
new file mode 100644
index 000000000..71a331b23
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_elem.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_ELEM_H_
+#define _RTE_RING_RTS_ELEM_H_
+
+/**
+ * @file rte_ring_rts_elem.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more details please refer to <rte_ring_rts.h>.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested number of objects exceeds what is available,
+ * only the available objects are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_generic.h b/lib/librte_ring/rte_ring_rts_generic.h
new file mode 100644
index 000000000..31a37924c
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_generic.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_GENERIC_H_
+#define _RTE_RING_RTS_GENERIC_H_
+
+/**
+ * @file rte_ring_rts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_ht_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with a new value.
+	 */
+
+	do {
+		ot.raw = ht->tail.raw;
+		rte_smp_rmb();
+
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->head.raw);
+
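+		/*
+		 * Each completed enqueue/dequeue bumps the tail counter;
+		 * once it catches up with the head counter, this thread is
+		 * the last one in flight and the tail position can be moved
+		 * forward all the way to the head position.
+		 */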
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance no longer
+ * exceeds the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_ht_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+	h->raw = ht->head.raw;
+	rte_smp_rmb();
+
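+	/*
+	 * Spin while the head has run more than htd_max entries ahead
+	 * of the tail, i.e. while too many updates are still in flight.
+	 */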
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = ht->head.raw;
+		rte_smp_rmb();
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we will want to enqueue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* read prod head (may spin on prod tail) */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_prod.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we will want to dequeue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* read cons head (may spin on cons tail) */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_cons.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_GENERIC_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 4/9] test/ring: add contention stress test for RTS ring
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (2 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 3/9] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 5/9] ring: introduce HTS ring mode Konstantin Ananyev
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test RTS ring mode under contention.
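
For reference, an application opts into RTS mode the same way the test
below does; a minimal sketch (editor's illustration, not part of the
patch; ring name and size are arbitrary):

	/* create a ring with relaxed tail sync on both ends */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024,
		SOCKET_ID_ANY, RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);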

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 4eefaa887..3e15f3791 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 827b04886..bb67a49f0 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 60eac6216..32aae2072 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 5/9] ring: introduce HTS ring mode
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (3 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce head/tail sync mode for MT ring synchronization.
In that mode the enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform a given op.
This is supposed to reduce stall times in cases when the ring is used
on overcommitted CPUs (multiple active threads on the same CPU).
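
As a usage sketch (editor's illustration, not part of the patch; ring
name and size are arbitrary), an application selects HTS mode at ring
creation time and then uses the regular enqueue/dequeue calls, which
dispatch on the ring's sync_type:

	#include <rte_ring.h>

	static int
	hts_example(void **objs, unsigned int num)
	{
		unsigned int n;

		/* serialize both producer and consumer (HTS mode) */
		struct rte_ring *r = rte_ring_create("hts_ring", 1024,
			SOCKET_ID_ANY, RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
		if (r == NULL)
			return -1;

		/* the generic calls pick the HTS path at runtime */
		n = rte_ring_enqueue_burst(r, objs, num, NULL);
		n = rte_ring_dequeue_burst(r, objs, n, NULL);

		rte_ring_free(r);
		return n;
	}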

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   3 +
 lib/librte_ring/meson.build            |   3 +
 lib/librte_ring/rte_ring.c             |  20 ++-
 lib/librte_ring/rte_ring.h             |  31 ++++
 lib/librte_ring/rte_ring_elem.h        |  13 ++
 lib/librte_ring/rte_ring_hts.h         | 210 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    | 205 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_generic.h | 198 +++++++++++++++++++++++
 8 files changed, 681 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 8f5c284cc..6fe500f0d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,9 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_elem.h \
+					rte_ring_hts_generic.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
 					rte_ring_rts_generic.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 612936afb..8e86e037a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,6 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_elem.h',
+		'rte_ring_hts_generic.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f6f084d79..6e4213afa 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -68,6 +68,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -103,6 +104,19 @@ struct rte_ring_rts_headtail {
 	volatile union rte_ring_ht_poscnt head;
 };
 
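+/** Combined head/tail position for HTS mode, updated as one 64-bit value. */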
+union rte_ring_ht_pos {
+	uint64_t raw;
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union rte_ring_ht_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -133,6 +147,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -142,6 +157,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -164,6 +180,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -494,6 +513,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -529,6 +549,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -676,6 +699,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -1010,6 +1035,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -1103,6 +1131,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 5de0850dc..010a564c1 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -542,6 +542,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
 }
 
+#include <rte_ring_hts_elem.h>
 #include <rte_ring_rts_elem.h>
 
 /**
@@ -585,6 +586,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -766,6 +770,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -951,6 +958,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1060,6 +1070,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..062d7be6c
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
+ * In that mode the enqueue/dequeue operation is fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to change head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve that, a 64-bit CAS is used by the head update routine.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue(struct rte_ring *r, void **obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested number of objects exceeds what is available,
+ * only the available objects are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_elem.h b/lib/librte_ring/rte_ring_hts_elem.h
new file mode 100644
index 000000000..34f0d121d
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_elem.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_ELEM_H_
+#define _RTE_RING_HTS_ELEM_H_
+
+/**
+ * @file rte_ring_hts_elem.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Contains *ring_elem* functions for Head-Tail Sync (HTS) ring mode.
+ * For more details please refer to <rte_ring_hts.h>.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested number of objects exceeds what is available,
+ * only the available objects are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
new file mode 100644
index 000000000..0b3931ffa
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_GENERIC_H_
+#define _RTE_RING_HTS_GENERIC_H_
+
+/**
+ * @file rte_ring_hts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	union rte_ring_ht_pos p;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+
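+	/*
+	 * Move both head and tail forward by 'num' and keep them equal:
+	 * equal head/tail means no enqueue/dequeue is in progress, so
+	 * the next thread may acquire the ring.
+	 */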
+	p.pos.head = p.pos.tail + num;
+	p.pos.tail = p.pos.head;
+
+	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
+}
+
+/**
+ * @internal Waits until the tail becomes equal to the head.
+ * That means no writer/reader is active for that ring.
+ * Supposed to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_ht_pos *p)
+{
+	p->raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->ht.raw);
+
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = rte_atomic64_read((rte_atomic64_t *)
+				(uintptr_t)&ht->ht.raw);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we will want to enqueue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we will want to dequeue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_GENERIC_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 6/9] test/ring: add contention stress test for HTS ring
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (4 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 5/9] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 7/9] ring: introduce peek style API Konstantin Ananyev
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 3e15f3791..72f64e30e 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index bb67a49f0..d6504a08a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 32aae2072..9a87c7f7b 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 7/9] ring: introduce peek style API
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (5 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
mode, provide the ability to split the enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows the user to inspect objects in the ring without removing
them from it (aka MT safe peek).
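
The dequeue side then looks as follows (editor's sketch based on the
example in rte_ring_peek.h below; object_examine() and KEEP are
placeholders, and the ring must use ST or HTS sync mode):

	void *obj;
	unsigned int n;

	/* reserve one object for inspection; the ring stays locked */
	n = rte_ring_hts_dequeue_bulk_start(r, &obj, 1, NULL);
	if (n != 0) {
		if (object_examine(obj) == KEEP)
			/* keep it in the ring */
			rte_ring_hts_dequeue_finish(r, 0);
		else
			/* remove it from the ring */
			rte_ring_hts_dequeue_finish(r, n);
	}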

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   1 +
 lib/librte_ring/meson.build            |   1 +
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_elem.h        |   4 +
 lib/librte_ring/rte_ring_generic.h     |  48 ++++
 lib/librte_ring/rte_ring_hts_generic.h |  47 ++-
 lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++++++
 7 files changed, 519 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_peek.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6fe500f0d..5f8662737 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_hts.h \
 					rte_ring_hts_elem.h \
 					rte_ring_hts_generic.h \
+					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
 					rte_ring_rts_generic.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 8e86e037a..f5f84dc6e 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,7 @@ headers = files('rte_ring.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_elem.h',
 		'rte_ring_hts_generic.h',
+		'rte_ring_peek.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 0fb73a337..bb3096721 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -10,6 +10,50 @@
 #ifndef _RTE_RING_C11_MEM_H_
 #define _RTE_RING_C11_MEM_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for a single-thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - throw an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 010a564c1..5bf7c1c1b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1083,6 +1083,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index 953cdbbd5..9f5fdf13b 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -10,6 +10,54 @@
 #ifndef _RTE_RING_GENERIC_H_
 #define _RTE_RING_GENERIC_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for a single-thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - throw an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	pos = tail + num;
+
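+	/*
+	 * On enqueue, make the object stores visible before publishing
+	 * them via the tail; on dequeue, make sure the object loads
+	 * complete before the slots are released for reuse.
+	 */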
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	ht->head = pos;
+	ht->tail = pos;
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
index 0b3931ffa..7eac761f9 100644
--- a/lib/librte_ring/rte_ring_hts_generic.h
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -18,9 +18,38 @@
  * For more information please refer to <rte_ring_hts.h>.
  */
 
+/**
+ * @internal get current tail value.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - throw an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_ht_pos p;
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64-bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
 static __rte_always_inline void
-__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
-	uint32_t enqueue)
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
 {
 	union rte_ring_ht_pos p;
 
@@ -29,14 +58,22 @@ __rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
 	else
 		rte_smp_rmb();
 
-	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
-
-	p.pos.head = p.pos.tail + num;
+	p.pos.head = tail + num;
 	p.pos.tail = p.pos.head;
 
 	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
 }
 
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	uint32_t tail;
+
+	num = __rte_ring_hts_get_tail(ht, &tail, num);
+	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue);
+}
+
 /**
  * @internal Waits until the tail becomes equal to the head.
  * That means no writer/reader is active for that ring.
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..baefd2f7b
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,379 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * Introduction of rte_ring with serialized producer/consumer (HTS sync mode)
+ * makes it possible to split the public enqueue/dequeue API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows user to inspect objects in the ring without removing them
+ * from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_hts_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_hts_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ the ring is sort of locked -
+ * no other thread can proceed with an enqueue(/dequeue) operation until
+ * _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal This function moves prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n =  __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves the space for the user to do so.
+ * The user has to call the appropriate enqueue_finish() to copy objects into
+ * the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to reserve space for in the ring.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves the space for the user to do so.
+ * The user has to call the appropriate enqueue_finish() to copy objects into
+ * the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to reserve space for in the ring.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Complete to enqueue several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the value
+ * previously returned by enqueue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete to enqueue several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the value
+ * previously returned by enqueue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves cons head value and copies up to *n*
+ * objects from the ring to the user provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n =  __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void **obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   Actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void **obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete to dequeue several objects from the ring.
+ * Note that the number of objects to dequeue must not exceed the value
+ * previously returned by dequeue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
-- 
2.17.1
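
For reference, the enqueue side of the peek API mirrors the dequeue example
given in the rte_ring_peek.h header comment above. A minimal usage sketch,
assuming a ring created with RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ and an
obj[] array of pointers prepared by the caller:

	/* phase 1: reserve space in the ring; nothing is copied yet */
	n = rte_ring_enqueue_burst_start(r, num, NULL);
	if (n != 0) {
		/* fill/patch the n objects here, while the ring is held */
		/* phase 2: copy the objects into the ring and release it */
		rte_ring_enqueue_finish(r, obj, n);
	}

Passing 0 to rte_ring_enqueue_finish() instead abandons the reservation and
returns the head to its previous position, mirroring the 'keep it in the
ring' branch of the dequeue example.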


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 8/9] test/ring: add stress test for MT peek API
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (6 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 7/9] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case for the MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 72f64e30e..f4c987a98 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index d6504a08a..cfcf5b81d 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 9a87c7f7b..60953ce47 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -35,3 +35,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1
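
One detail worth noting in the wrappers above: a _start_ call is always paired
with a _finish_ call, even when the bulk operation reserved nothing. A minimal
annotated restatement of the dequeue wrapper:

	m = rte_ring_dequeue_bulk_start(r, obj, num, NULL);
	if (m != num)
		m = 0;	/* bulk semantics: partial results are discarded */
	/*
	 * finish() with 0 after an empty start is a harmless no-op;
	 * after a successful start it would return the head to its
	 * previous position, keeping the objects in the ring.
	 */
	rte_ring_dequeue_finish(r, m);

Since rte_ring_dequeue_bulk_start() uses FIXED behavior, m can only be num or
0 here, so the conditional merely normalizes the two cases.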


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v2 9/9] ring: add C11 memory model for new sync modes
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (7 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-02 22:09     ` Konstantin Ananyev
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-02 22:09 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Add a C11 atomics-based implementation for the RTS and HTS
head/tail update primitives.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   2 +
 lib/librte_ring/rte_ring_hts.h         |   4 +
 lib/librte_ring/rte_ring_hts_c11_mem.h | 222 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    |   4 +
 lib/librte_ring/rte_ring_rts.h         |   4 +
 lib/librte_ring/rte_ring_rts_c11_mem.h | 198 ++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    |   4 +
 8 files changed, 441 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 5f8662737..927d105bf 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,9 +22,11 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_hts.h \
 					rte_ring_hts_elem.h \
 					rte_ring_hts_generic.h \
+					rte_ring_hts_c11_mem.h \
 					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
-					rte_ring_rts_generic.h
+					rte_ring_rts_generic.h \
+					rte_ring_rts_c11_mem.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index f5f84dc6e..f2e37a8e4 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -7,10 +7,12 @@ headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
 		'rte_ring_hts.h',
+		'rte_ring_hts_c11_mem.h',
 		'rte_ring_hts_elem.h',
 		'rte_ring_hts_generic.h',
 		'rte_ring_peek.h',
 		'rte_ring_rts.h',
+		'rte_ring_rts_c11_mem.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
index 062d7be6c..ddaa47ff1 100644
--- a/lib/librte_ring/rte_ring_hts.h
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -29,7 +29,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_hts_c11_mem.h>
+#else
 #include <rte_ring_hts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the HTS ring.
diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h b/lib/librte_ring/rte_ring_hts_c11_mem.h
new file mode 100644
index 000000000..ce0f15f8f
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_C11_MEM_H_
+#define _RTE_RING_HTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_hts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+/**
+ * @internal get current tail value.
+ * Check that user didn't request to move tail above the head.
+ * In that situation:
+ * - return zero, which will cause any pending changes to be aborted and
+ *   the head to be returned to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_ht_pos p;
+
+	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64-bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	union rte_ring_ht_pos p;
+
+	RTE_SET_USED(enqueue);
+
+	p.pos.head = tail + num;
+	p.pos.tail = p.pos.head;
+
+	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+}
+
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	uint32_t tail;
+
+	num = __rte_ring_hts_get_tail(ht, &tail, num);
+	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue);
+}
+
+/**
+ * @internal waits until the tail becomes equal to the head.
+ * Means no writer/reader is active for that ring.
+ * Supposed to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_ht_pos *p)
+{
+	p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param is_sp
+ *   Indicates whether multi-producer path is needed or not
+ * @param n
+ *   The number of elements we will want to enqueue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to a ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param new_head
+ *   Returns the current/new head value i.e. where enqueue finishes
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* wait for tail to be equal to head (acquire point) */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param is_sc
+ *   Indicates whether multi-consumer path is needed or not
+ * @param n
+ *   The number of elements we will want to dequeue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param new_head
+ *   Returns the current/new head value i.e. where dequeue finishes
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_C11_MEM_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_elem.h b/lib/librte_ring/rte_ring_hts_elem.h
index 34f0d121d..1e9a49c7a 100644
--- a/lib/librte_ring/rte_ring_hts_elem.h
+++ b/lib/librte_ring/rte_ring_hts_elem.h
@@ -24,7 +24,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_hts_c11_mem.h>
+#else
 #include <rte_ring_hts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the HTS ring.
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
index 18404fe48..28b2d25f5 100644
--- a/lib/librte_ring/rte_ring_rts.h
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -55,7 +55,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_rts_c11_mem.h>
+#else
 #include <rte_ring_rts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the RTS ring.
diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
new file mode 100644
index 000000000..19d3ea288
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_C11_MEM_H_
+#define _RTE_RING_RTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_rts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_ht_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might have preceded us, then don't update the tail with the new value.
+	 */
+
+	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance does not
+ * exceed the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_ht_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+	h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param is_sp
+ *   Indicates whether multi-producer path is needed or not
+ * @param n
+ *   The number of elements we will want to enqueue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to a ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param new_head
+ *   Returns the current/new head value i.e. where enqueue finishes
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* read prod head (may spin on prod tail, acquire point) */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param is_sc
+ *   Indicates whether multi-consumer path is needed or not
+ * @param n
+ *   The number of elements we will want to dequeue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param new_head
+ *   Returns the current/new head value i.e. where dequeue finishes
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* read cons head (may spin on cons tail, acquire point) */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			return 0;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
+			&oh.raw, nh.raw,
+			1, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_C11_MEM_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_elem.h b/lib/librte_ring/rte_ring_rts_elem.h
index 71a331b23..23d8aeec7 100644
--- a/lib/librte_ring/rte_ring_rts_elem.h
+++ b/lib/librte_ring/rte_ring_rts_elem.h
@@ -24,7 +24,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_rts_c11_mem.h>
+#else
 #include <rte_ring_rts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the RTS ring.
-- 
2.17.1
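
The central trick in the HTS helpers above is that head and tail share one
64-bit word, so a single atomic load observes a consistent head/tail pair and
a single release store publishes both at once. A standalone sketch of that
layout using the same __atomic builtins (field names and ordering here are
illustrative, not the exact DPDK definitions):

	#include <stdint.h>

	union pos64 {
		uint64_t raw;
		struct {
			uint32_t head;
			uint32_t tail;
		} pos;
	};

	/* one relaxed 64-bit load yields a consistent head/tail snapshot */
	static inline union pos64
	pos_load(const union pos64 *p)
	{
		union pos64 v;

		v.raw = __atomic_load_n(&p->raw, __ATOMIC_RELAXED);
		return v;
	}

	/* one release store publishes both fields at once */
	static inline void
	pos_store(union pos64 *p, uint32_t head, uint32_t tail)
	{
		union pos64 v;

		v.pos.head = head;
		v.pos.tail = tail;
		__atomic_store_n(&p->raw, v.raw, __ATOMIC_RELEASE);
	}

This is what lets __rte_ring_hts_head_wait() detect a quiescent ring with one
load, and __rte_ring_hts_set_head_tail() release it with one store, without
any extra locking.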


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v3 0/9] New sync modes for ring
  2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
                       ` (8 preceding siblings ...)
  2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
@ 2020-04-03 17:42     ` Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test Konstantin Ananyev
                         ` (9 more replies)
  9 siblings, 10 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

V2 - V3
1. Few more compilation fixes (for gcc 4.8.X)
2. Extra update to devtools/libabigail.abignore (workaround)

V1 - v2 changes:
1. Fix compilation issues
2. Add C11 atomics support
3. Updates devtools/libabigail.abignore (workaround)

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. Rework peek related API a bit
4. Rework test to make it less verbose and unite all test-cases
   in one command
5. Add new test-case for MT peek API

TODO list:
1. Update docs

These days more and more customers use(/try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP is quite a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.),
as then not only the lock-owner's but also the lock-waiters'
scheduling order matters a lot (LWP).
These two problems are well-known for kernels within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
unfair locking, waiting on the tail is very similar to the ticket-lock schema -
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
It is probably not possible to completely resolve the LHP problem in
userspace only (without some kernel communication/intervention),
but removing fairness in the tail update helps to avoid LWP and
can mitigate the situation significantly.
This patch proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode each enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform a given op.
    As another enhancement, the ability is provided to split an enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows the user to inspect objects in the ring without removing
    them from it (aka MT safe peek).
2) Relaxed Tail Sync (RTS)
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
Existing sync modes MP/MC,SP/SC kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains as default one.
The only thing that changed:
the format of prod/cons now may differ depending on the mode selected at _init_,
so the user has to stick with one sync model through the whole ring lifetime.
In other words, the user can't create a ring for, let's say, SP mode and then
in the middle of the data path change their mind and start using MP_RTS mode.
For the existing modes (SP/MP, SC/MC) the format remains the same and
the user can still use them interchangeably, though of course it is an
error-prone practice.
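
For illustration, selecting a sync mode at creation time might look like the
sketch below. The HTS flag names appear in the patches of this series; the RTS
flag names are assumed here by analogy:

	#include <rte_ring.h>

	/* fully serialized prod/cons (HTS), also enables the peek API */
	struct rte_ring *hts = rte_ring_create("hts_ring", 1024,
			SOCKET_ID_ANY, RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);

	/* relaxed tail sync (RTS) prod/cons */
	struct rte_ring *rts = rte_ring_create("rts_ring", 1024,
			SOCKET_ID_ANY, RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	/* no flags: default MP/MC behaviour, exactly as before */
	struct rte_ring *mpmc = rte_ring_create("mpmc_ring", 1024,
			SOCKET_ID_ANY, 0);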

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj

                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87

Konstantin Ananyev (9):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API
  ring: add C11 memory model for new sync modes

 app/test/Makefile                      |   5 +
 app/test/meson.build                   |   5 +
 app/test/test_pdump.c                  |   6 +-
 app/test/test_ring_hts_stress.c        |  32 ++
 app/test/test_ring_mpmc_stress.c       |  31 ++
 app/test/test_ring_peek_stress.c       |  43 +++
 app/test/test_ring_rts_stress.c        |  32 ++
 app/test/test_ring_stress.c            |  57 ++++
 app/test/test_ring_stress.h            |  38 +++
 app/test/test_ring_stress_impl.h       | 444 +++++++++++++++++++++++++
 devtools/libabigail.abignore           |   7 +
 lib/librte_pdump/rte_pdump.c           |   2 +-
 lib/librte_port/rte_port_ring.c        |  12 +-
 lib/librte_ring/Makefile               |  11 +-
 lib/librte_ring/meson.build            |  11 +-
 lib/librte_ring/rte_ring.c             | 114 ++++++-
 lib/librte_ring/rte_ring.h             | 244 ++++++++++++--
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_elem.h        | 105 +++++-
 lib/librte_ring/rte_ring_generic.h     |  48 +++
 lib/librte_ring/rte_ring_hts.h         | 214 ++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 222 +++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    | 209 ++++++++++++
 lib/librte_ring/rte_ring_hts_generic.h | 235 +++++++++++++
 lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++
 lib/librte_ring/rte_ring_rts.h         | 320 ++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 198 +++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    | 209 ++++++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++
 29 files changed, 3431 insertions(+), 56 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-08  4:59         ` Honnappa Nagarahalli
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                         ` (8 subsequent siblings)
  9 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to measure ring performance under contention
(multiple producers/consumers).
It starts a dequeue/enqueue loop on all available slave lcores.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  35 +++
 app/test/test_ring_stress_impl.h | 444 +++++++++++++++++++++++++++++++
 6 files changed, 562 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index 1f080d162..4eefaa887 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 351d29cb6..827b04886 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..60eac6216
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+#include <rte_spinlock.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..11476d28c
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,444 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/*
+ * Measures performance of ring enqueue/dequeue under high contention
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker_prcs(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = rte_rdtsc_precise();
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = rte_rdtsc_precise() - tm0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = rte_rdtsc_precise();
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = rte_rdtsc_precise() - tm1;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
+	return rc;
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, __func__,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, 0, 0);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	/* final stats update */
+	cl = rte_rdtsc_precise() - cl;
+	lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+
+	return rc;
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems data failed",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, __alignof__(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal workers to start the test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop the test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-08  4:59         ` Honnappa Nagarahalli
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode Konstantin Ananyev
                         ` (7 subsequent siblings)
  9 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Change from *single* to *sync_type* to allow different
synchronisation schemes to be applied.
Mark *single* as deprecated in comments.
Add new functions to allow the user to query ring sync types.
Replace direct access to *single* with the appropriate function call.
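
For illustration, a minimal sketch of the new query helpers (not part
of the patch; ring name and size are arbitrary, error handling omitted):

	struct rte_ring *r;

	r = rte_ring_create("demo", 1024, rte_socket_id(), 0);

	/* query sync types instead of reading the deprecated *single* */
	if (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_MT &&
			rte_ring_cons_single(r) == 0)
		printf("%s: MT producer, MT consumer\n", r->name);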

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 ++--
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 113 ++++++++++++++++++++++++++------
 lib/librte_ring/rte_ring_elem.h |   8 +--
 6 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..65364f2c5 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..2f6c050fa 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..d4775a063 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -61,11 +61,27 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-/* structure to hold a pair of head/tail values and other metadata */
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type, the format of that structure might differ,
+ * but the offsets of the *sync_type* and *tail* values should remain the same.
+ */
 struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated - True if single prod/cons */
+		uint32_t single;
+	};
 };
 
 /**
@@ -116,11 +132,10 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#define __IS_SP RTE_RING_SYNC_ST
+#define __IS_MP RTE_RING_SYNC_MT
+#define __IS_SC RTE_RING_SYNC_ST
+#define __IS_MC RTE_RING_SYNC_MT
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +435,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,7 +458,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -470,7 +485,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +569,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +593,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +620,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +792,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if the ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if the ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +891,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +914,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +941,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +969,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +994,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +1022,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..28f9836e6 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -570,7 +570,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -734,7 +734,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -902,7 +902,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -995,7 +995,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-04 17:27         ` Wang, Haiyue
  2020-04-08  5:00         ` Honnappa Nagarahalli
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                         ` (6 subsequent siblings)
  9 siblings, 2 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
The aim is to reduce stall times when the ring is used on
overcommitted cpus (multiple active threads on the same cpu).
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.
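
For illustration, a minimal sketch (not part of the patch; error
handling omitted) of creating a ring in RTS mode, using the new flags
and the head/tail distance knob introduced below:

	struct rte_ring *r;
	void *objs[16];

	/* relaxed tail sync for both producer and consumer */
	r = rte_ring_create("rts_demo", 1024, rte_socket_id(),
		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	/* optionally tighten max head/tail distance (default: capacity/8) */
	rte_ring_set_prod_htd_max(r, 16);

	/* the generic enqueue/dequeue API picks the RTS path automatically */
	rte_ring_enqueue_bulk(r, objs, RTE_DIM(objs), NULL);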

check-abi.sh reports what I believe is a false-positive about
ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
updated to suppress *struct ring* related errors.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 devtools/libabigail.abignore           |   7 +
 lib/librte_ring/Makefile               |   5 +-
 lib/librte_ring/meson.build            |   5 +-
 lib/librte_ring/rte_ring.c             | 100 +++++++-
 lib/librte_ring/rte_ring.h             | 110 ++++++++-
 lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
 lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
 lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
 9 files changed, 1015 insertions(+), 29 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_generic.h

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..cd86d89ca 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,10 @@
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore updates of ring prod/cons
+[suppress_type]
+        type_kind = struct
+        name = rte_ring
+[suppress_type]
+        type_kind = struct
+        name = rte_event_ring
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 917c560ad..8f5c284cc 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_elem.h \
+					rte_ring_rts_generic.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index f2f3ccc88..612936afb 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -5,7 +5,10 @@ sources = files('rte_ring.c')
 headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_elem.h',
+		'rte_ring_rts_generic.h')
 
 # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
 allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero on success or negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index d4775a063..f6f084d79 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -48,6 +48,7 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_memzone.h>
 #include <rte_pause.h>
+#include <rte_debug.h>
 
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
@@ -65,10 +66,13 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
- * structure to hold a pair of head/tail values and other metadata.
+ * structures to hold a pair of head/tail values and other metadata.
  * Depending on sync_type, the format of that structure might differ,
  * but the offsets of the *sync_type* and *tail* values should remain the same.
  */
@@ -84,6 +88,21 @@ struct rte_ring_headtail {
 	};
 };
 
+union rte_ring_ht_poscnt {
+	uint64_t raw;
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union rte_ring_ht_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union rte_ring_ht_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -111,11 +130,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -132,6 +161,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -461,6 +493,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -619,8 +668,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -940,8 +1001,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -1020,9 +1094,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 28f9836e6..5de0850dc 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
 }
 
+#include <rte_ring_rts_elem.h>
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -571,6 +573,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -733,8 +755,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -901,8 +940,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -993,9 +1049,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..18404fe48
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,316 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that tail value is increased not
+ * by every thread that finished enqueue/dequeue,
+ * but only by the last one doing enqueue/dequeue.
+ * That allows threads to skip spinning on tail value,
+ * leaving actual tail value change to last thread in the update queue.
+ * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
+ * one for head update, second for tail update.
+ * As a gain it allows threads to avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update and waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce refcnt for both head and tail.
+ *  - increment head.refcnt for each head.value update
+ *  - write head.value and head.refcnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.refcnt + 1 == head.refcnt
+ *  - increment tail.refcnt when each enqueue/dequeue op finishes
+ *    (no matter whether tail.value is going to change or not)
+ *  - write tail.value and tail.refcnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit max allowed distance between head and tail value (HTD_MAX).
+ *    i.e. a thread is allowed to proceed with changing head.value,
+ *    only when:  head.value - tail.value <= HTD_MAX
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity - no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if producer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if consumer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_elem.h b/lib/librte_ring/rte_ring_rts_elem.h
new file mode 100644
index 000000000..71a331b23
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_elem.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_ELEM_H_
+#define _RTE_RING_RTS_ELEM_H_
+
+/**
+ * @file rte_ring_rts_elem.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode.
+ * for more details please refer to <rte_ring_rts.h>.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_generic.h b/lib/librte_ring/rte_ring_rts_generic.h
new file mode 100644
index 000000000..f88460d47
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_generic.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_GENERIC_H_
+#define _RTE_RING_RTS_GENERIC_H_
+
+/**
+ * @file rte_ring_rts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_ht_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with the new value.
+	 */
+
+	do {
+		ot.raw = ht->tail.raw;
+		rte_smp_rmb();
+
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->head.raw);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance does not
+ * exceed the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_ht_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+	h->raw = ht->head.raw;
+	rte_smp_rmb();
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = ht->head.raw;
+		rte_smp_rmb();
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param is_sp
+ *   Indicates whether multi-producer path is needed or not
+ * @param n
+ *   The number of elements we will want to enqueue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param new_head
+ *   Returns the current/new head value i.e. where enqueue finishes
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* read prod head (may spin on prod tail) */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 *  The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_prod.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param is_sc
+ *   Indicates whether multi-consumer path is needed or not
+ * @param n
+ *   The number of elements we will want to dequeue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param new_head
+ *   Returns the current/new head value i.e. where dequeue finishes
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* read cons head (may spin on cons tail) */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (rte_atomic64_cmpset(&r->rts_cons.head.raw,
+			oh.raw, nh.raw) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_GENERIC_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v3 4/9] test/ring: add contention stress test for RTS ring
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (2 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode Konstantin Ananyev
                         ` (5 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test RTS ring mode under contention.
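
For reference, the new case runs together with the existing MP/MC case
as part of the ring stress autotest; assuming the usual test binary,
something like this should exercise it (lcore list is just an example):

	./app/test/dpdk-test --lcores=6-10 -n 4
	RTE>> ring_stress_autotest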

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 4eefaa887..3e15f3791 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 827b04886..bb67a49f0 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 60eac6216..32aae2072 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


* [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (3 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-13 23:27         ` Honnappa Nagarahalli
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                         ` (4 subsequent siblings)
  9 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce head/tail sync (HTS) mode for MT ring synchronization.
In that mode each enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform a given op.
This is expected to reduce stall times when the ring is used on
overcommitted cpus (multiple active threads on the same cpu).
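
For illustration, a minimal usage sketch (not part of the patch): it
assumes an application built with the experimental API enabled and uses
only the public rte_ring_create()/rte_ring_enqueue_bulk()/
rte_ring_dequeue_bulk() calls plus the new flags; the function names
here are purely illustrative:

	#include <rte_ring.h>

	static struct rte_ring *
	create_hts_ring(void)
	{
		/* request fully serialized producer and consumer */
		return rte_ring_create("hts_ring", 1024, SOCKET_ID_ANY,
				RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
	}

	static void
	pass_one(struct rte_ring *r, void *obj)
	{
		/* the generic calls below dispatch to the HTS code
		 * paths based on the sync_type stored in the ring
		 */
		if (rte_ring_enqueue_bulk(r, &obj, 1, NULL) == 1)
			rte_ring_dequeue_bulk(r, &obj, 1, NULL);
	}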

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   3 +
 lib/librte_ring/meson.build            |   3 +
 lib/librte_ring/rte_ring.c             |  20 ++-
 lib/librte_ring/rte_ring.h             |  31 ++++
 lib/librte_ring/rte_ring_elem.h        |  13 ++
 lib/librte_ring/rte_ring_hts.h         | 210 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    | 205 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_generic.h | 198 +++++++++++++++++++++++
 8 files changed, 681 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_elem.h
 create mode 100644 lib/librte_ring/rte_ring_hts_generic.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 8f5c284cc..6fe500f0d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,9 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_elem.h \
+					rte_ring_hts_generic.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
 					rte_ring_rts_generic.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 612936afb..8e86e037a 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,6 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_elem.h',
+		'rte_ring_hts_generic.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f6f084d79..6e4213afa 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -68,6 +68,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -103,6 +104,19 @@ struct rte_ring_rts_headtail {
 	volatile union rte_ring_ht_poscnt head;
 };
 
+union rte_ring_ht_pos {
+	uint64_t raw;
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union rte_ring_ht_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -133,6 +147,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -142,6 +157,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -164,6 +180,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #define __IS_SP RTE_RING_SYNC_ST
 #define __IS_MP RTE_RING_SYNC_MT
 #define __IS_SC RTE_RING_SYNC_ST
@@ -494,6 +513,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -529,6 +549,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -676,6 +699,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -1010,6 +1035,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -1103,6 +1131,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 5de0850dc..010a564c1 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -542,6 +542,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
 }
 
+#include <rte_ring_hts_elem.h>
 #include <rte_ring_rts_elem.h>
 
 /**
@@ -585,6 +586,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -766,6 +770,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -951,6 +958,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1060,6 +1070,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..062d7be6c
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
+ * In that mode enqueue/dequeue operation is fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to change head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve that, the head update routine uses a 64-bit CAS.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue(struct rte_ring *r, void * const *obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue(struct rte_ring *r, void **obj_table,
+		uint32_t n, enum rte_ring_queue_behavior behavior,
+		uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_elem.h b/lib/librte_ring/rte_ring_hts_elem.h
new file mode 100644
index 000000000..34f0d121d
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_elem.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_ELEM_H_
+#define _RTE_RING_HTS_ELEM_H_
+
+/**
+ * @file rte_ring_hts_elem.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Contains *ring_elem* functions for Head-Tail Sync (HTS) ring mode.
+ * For more details please refer to <rte_ring_hts.h>.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_generic.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_ELEM_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
new file mode 100644
index 000000000..da08f1d94
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_GENERIC_H_
+#define _RTE_RING_HTS_GENERIC_H_
+
+/**
+ * @file rte_ring_hts_generic.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	union rte_ring_ht_pos p;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+
+	p.pos.head = p.pos.tail + num;
+	p.pos.tail = p.pos.head;
+
+	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
+}
+
+/**
+ * @internal waits until tail becomes equal to head,
+ * i.e. no writer/reader is active for that ring.
+ * It is supposed to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_ht_pos *p)
+{
+	p->raw = rte_atomic64_read((rte_atomic64_t *)
+			(uintptr_t)&ht->ht.raw);
+
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = rte_atomic64_read((rte_atomic64_t *)
+				(uintptr_t)&ht->ht.raw);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we will want to enqueue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where enqueue starts
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE head was moved
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is a no-op on x86
+		 */
+		rte_smp_rmb();
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ *
+ * @param r
+ *   A pointer to the ring structure
+ * @param num
+ *   The number of elements we will want to dequeue, i.e. how far should the
+ *   head be moved
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param old_head
+ *   Returns head value as it was before the move, i.e. where dequeue starts
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE head was moved
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* add rmb barrier to avoid load/load reorder in weak
+		 * memory model. It is a no-op on x86
+		 */
+		rte_smp_rmb();
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
+			op.raw, np.raw) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_GENERIC_H_ */
-- 
2.17.1


* [dpdk-dev] [PATCH v3 6/9] test/ring: add contention stress test for HTS ring
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (4 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API Konstantin Ananyev
                         ` (3 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to exercise HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 3e15f3791..72f64e30e 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index bb67a49f0..d6504a08a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 32aae2072..9a87c7f7b 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


* [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (5 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-14  3:45         ` Honnappa Nagarahalli
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
                         ` (2 subsequent siblings)
  9 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
mode, provide the ability to split an enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows the user to inspect objects in the ring without removing
them from it (aka MT safe peek).
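
As an illustration (this sketch is not part of the patch), the enqueue
side pairs _start_ and _finish_ like this; make_obj() is a hypothetical
helper and the ring is assumed to use ST or HTS sync mode:

	static void
	produce_one(struct rte_ring *ring)
	{
		unsigned int n, free;
		void *obj;

		/* reserve space for one object in the ring */
		n = rte_ring_enqueue_bulk_start(ring, 1, &free);
		if (n != 0) {
			/* between _start_ and _finish_ no other producer
			 * can make progress on this ring
			 */
			obj = make_obj();	/* hypothetical helper */
			/* copy the object in and release the producer side */
			rte_ring_enqueue_finish(ring, &obj, n);
		}
	}

Passing a smaller count (down to zero) to the _finish_ call abandons the
unused part of the reservation, mirroring the dequeue example given in
rte_ring_peek.h below.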

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   1 +
 lib/librte_ring/meson.build            |   1 +
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_elem.h        |   4 +
 lib/librte_ring/rte_ring_generic.h     |  48 ++++
 lib/librte_ring/rte_ring_hts_generic.h |  47 ++-
 lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++++++
 7 files changed, 519 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_peek.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6fe500f0d..5f8662737 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_hts.h \
 					rte_ring_hts_elem.h \
 					rte_ring_hts_generic.h \
+					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
 					rte_ring_rts_generic.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 8e86e037a..f5f84dc6e 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,7 @@ headers = files('rte_ring.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_elem.h',
 		'rte_ring_hts_generic.h',
+		'rte_ring_peek.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 0fb73a337..bb3096721 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -10,6 +10,50 @@
 #ifndef _RTE_RING_C11_MEM_H_
 #define _RTE_RING_C11_MEM_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that user didn't request to move tail above the head.
+ * In that situation:
+ * - return zero, which causes any pending changes to be aborted and
+ *   the head to return to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 010a564c1..5bf7c1c1b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1083,6 +1083,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index 953cdbbd5..9f5fdf13b 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -10,6 +10,54 @@
 #ifndef _RTE_RING_GENERIC_H_
 #define _RTE_RING_GENERIC_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that user didn't request to move tail above the head.
+ * In that situation:
+ * - return zero, which causes any pending changes to be aborted and
+ *   the head to return to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	pos = tail + num;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	ht->head = pos;
+	ht->tail = pos;
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_hts_generic.h b/lib/librte_ring/rte_ring_hts_generic.h
index da08f1d94..8e699c006 100644
--- a/lib/librte_ring/rte_ring_hts_generic.h
+++ b/lib/librte_ring/rte_ring_hts_generic.h
@@ -18,9 +18,38 @@
  * For more information please refer to <rte_ring_hts.h>.
  */
 
+/**
+ * @internal get current tail value.
+ * Check that user didn't request to move tail above the head.
+ * In that situation:
+ * - return zero, which causes any pending changes to be aborted and
+ *   the head to return to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_ht_pos p;
+
+	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64 bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
 static __rte_always_inline void
-__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
-	uint32_t enqueue)
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
 {
 	union rte_ring_ht_pos p;
 
@@ -29,14 +58,22 @@ __rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
 	else
 		rte_smp_rmb();
 
-	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
-
-	p.pos.head = p.pos.tail + num;
+	p.pos.head = tail + num;
 	p.pos.tail = p.pos.head;
 
 	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
 }
 
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	uint32_t tail;
+
+	num = __rte_ring_hts_get_tail(ht, &tail, num);
+	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue);
+}
+
 /**
  * @internal waits till tail will become equal to head.
  * Means no writer/reader is active for that ring.
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..baefd2f7b
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,379 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * Introduction of rte_ring with serialized producer/consumer (HTS sync mode)
+ * makes it possible to split the public enqueue/dequeue API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows the user to inspect objects in the ring without removing them
+ * from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_hts_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_hts_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ the ring is effectively locked -
+ * no other thread can proceed with an enqueue(/dequeue) operation till
+ * _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal This function moves prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy objects
+ * into the queue and complete the given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to reserve enqueue space for in the ring.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy objects
+ * into the queue and complete the given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to reserve enqueue space for in the ring.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue should not exceed the previous
+ * enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue should not exceed the previous
+ * enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves cons head value and copies up to *n*
+ * objects from the ring to the user provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void **obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void **obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   Actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void **obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove objects from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete dequeuing several objects from the ring.
+ * Note that the number of objects to dequeue should not exceed the previous
+ * dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to actually remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		__rte_ring_hts_update_tail(&r->hts_cons, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
-- 
2.17.1


* [dpdk-dev] [PATCH v3 8/9] test/ring: add stress test for MT peek API
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (6 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to exercise the MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 72f64e30e..f4c987a98 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index d6504a08a..cfcf5b81d 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 9a87c7f7b..60953ce47 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -35,3 +35,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (7 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-03 17:42       ` Konstantin Ananyev
  2020-04-04 14:16         ` [dpdk-dev] Re: [PATCH " 周介龙
  2020-04-14  4:28         ` [dpdk-dev] [PATCH " Honnappa Nagarahalli
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
  9 siblings, 2 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-03 17:42 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Add C11 atomics based implementation for RTS and HTS
head/tail update primitives.
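
To illustrate the core idea outside of DPDK (names here are illustrative,
not the actual DPDK ones): because head and tail share a single 64-bit
word, the C11 path can claim ring slots with one compare-and-swap. A
standalone sketch using the GCC/clang __atomic builtins:

    #include <stdint.h>
    #include <stdio.h>

    union ht_pos {
        uint64_t raw;
        struct {
            uint32_t head;
            uint32_t tail;
        } pos;
    };

    static union ht_pos prod; /* producer head/tail packed in one word */

    int
    main(void)
    {
        union ht_pos op, np;

        /* one ACQUIRE load, then retry the CAS until the slots are
         * claimed; on failure the builtin reloads the current value
         * into op.raw.
         */
        op.raw = __atomic_load_n(&prod.raw, __ATOMIC_ACQUIRE);
        do {
            np.pos.tail = op.pos.tail;
            np.pos.head = op.pos.head + 4; /* claim 4 slots */
        } while (__atomic_compare_exchange_n(&prod.raw, &op.raw, np.raw,
                0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);

        printf("head=%u tail=%u\n", prod.pos.head, prod.pos.tail);
        return 0;
    }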

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   2 +
 lib/librte_ring/rte_ring_hts.h         |   4 +
 lib/librte_ring/rte_ring_hts_c11_mem.h | 222 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_elem.h    |   4 +
 lib/librte_ring/rte_ring_rts.h         |   4 +
 lib/librte_ring/rte_ring_rts_c11_mem.h | 198 ++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_elem.h    |   4 +
 8 files changed, 441 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 5f8662737..927d105bf 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,9 +22,11 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_hts.h \
 					rte_ring_hts_elem.h \
 					rte_ring_hts_generic.h \
+					rte_ring_hts_c11_mem.h \
 					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_elem.h \
-					rte_ring_rts_generic.h
+					rte_ring_rts_generic.h \
+					rte_ring_rts_c11_mem.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index f5f84dc6e..f2e37a8e4 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -7,10 +7,12 @@ headers = files('rte_ring.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
 		'rte_ring_hts.h',
+		'rte_ring_hts_c11_mem.h',
 		'rte_ring_hts_elem.h',
 		'rte_ring_hts_generic.h',
 		'rte_ring_peek.h',
 		'rte_ring_rts.h',
+		'rte_ring_rts_c11_mem.h',
 		'rte_ring_rts_elem.h',
 		'rte_ring_rts_generic.h')
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
index 062d7be6c..ddaa47ff1 100644
--- a/lib/librte_ring/rte_ring_hts.h
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -29,7 +29,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_hts_c11_mem.h>
+#else
 #include <rte_ring_hts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the HTS ring.
diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h b/lib/librte_ring/rte_ring_hts_c11_mem.h
new file mode 100644
index 000000000..0218d0e7d
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
@@ -0,0 +1,222 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_C11_MEM_H_
+#define _RTE_RING_HTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_hts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+/**
+ * @internal get current tail value.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which will abort any pending changes and
+ *   return the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_ht_pos p;
+
+	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64 bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	union rte_ring_ht_pos p;
+
+	RTE_SET_USED(enqueue);
+
+	p.pos.head = tail + num;
+	p.pos.tail = p.pos.head;
+
+	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+}
+
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
+	uint32_t enqueue)
+{
+	uint32_t tail;
+
+	num = __rte_ring_hts_get_tail(ht, &tail, num);
+	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue);
+}
+
+/**
+ * @internal waits until the tail becomes equal to the head,
+ * which means no writer/reader is active on that ring.
+ * Supposed to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_ht_pos *p)
+{
+	p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param num
+ *   The number of elements we want to enqueue, i.e. how far the head
+ *   should be moved.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where enqueue
+ *   starts.
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE the head was moved.
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or num only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* wait for tail to be equal to head, acquire point */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param num
+ *   The number of elements we want to dequeue, i.e. how far the head
+ *   should be moved.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where dequeue
+ *   starts.
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE the head was moved.
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or num only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_pos np, op;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* wait for tail to be equal to head */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_C11_MEM_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_elem.h b/lib/librte_ring/rte_ring_hts_elem.h
index 34f0d121d..1e9a49c7a 100644
--- a/lib/librte_ring/rte_ring_hts_elem.h
+++ b/lib/librte_ring/rte_ring_hts_elem.h
@@ -24,7 +24,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_hts_c11_mem.h>
+#else
 #include <rte_ring_hts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the HTS ring.
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
index 18404fe48..28b2d25f5 100644
--- a/lib/librte_ring/rte_ring_rts.h
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -55,7 +55,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_rts_c11_mem.h>
+#else
 #include <rte_ring_rts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the RTS ring.
diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
new file mode 100644
index 000000000..b72901497
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_C11_MEM_H_
+#define _RTE_RING_RTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_rts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_ht_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with the new value.
+	 */
+
+	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance does not
+ * exceed the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_ht_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+	h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param num
+ *   The number of elements we want to enqueue, i.e. how far the head
+ *   should be moved.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where enqueue
+ *   starts.
+ * @param free_entries
+ *   Returns the amount of free space in the ring BEFORE the head was moved.
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or num only.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/* read prod head (may spin on prod tail, acquire point) */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param num
+ *   The number of elements we want to dequeue, i.e. how far the head
+ *   should be moved.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param old_head
+ *   Returns the head value as it was before the move, i.e. where dequeue
+ *   starts.
+ * @param entries
+ *   Returns the number of entries in the ring BEFORE the head was moved.
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or num only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_ht_poscnt nh, oh;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/* read cons head (may spin on cons tail, acquire point) */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
+			&oh.raw, nh.raw,
+			1, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_C11_MEM_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_elem.h b/lib/librte_ring/rte_ring_rts_elem.h
index 71a331b23..23d8aeec7 100644
--- a/lib/librte_ring/rte_ring_rts_elem.h
+++ b/lib/librte_ring/rte_ring_rts_elem.h
@@ -24,7 +24,11 @@
 extern "C" {
 #endif
 
+#ifdef RTE_USE_C11_MEM_MODEL
+#include <rte_ring_rts_c11_mem.h>
+#else
 #include <rte_ring_rts_generic.h>
+#endif
 
 /**
  * @internal Enqueue several objects on the RTS ring.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] Re: [PATCH v3 9/9] ring: add C11 memory model for new sync modes
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
@ 2020-04-04 14:16         ` 周介龙
  2020-04-14  4:28         ` [dpdk-dev] [PATCH " Honnappa Nagarahalli
  1 sibling, 0 replies; 146+ messages in thread
From: 周介龙 @ 2020-04-04 14:16 UTC (permalink / raw)
  To: dev, Konstantin Ananyev
  Cc: honnappa.nagarahalli, david.marchand, Konstantin Ananyev

Hi,
This patchset really helps my case a lot.
We have a case in which several hundred threads
get/put concurrently on a single ring-based mempool without a cache.
Even in a simplified test case in which 32 threads put into the mempool
concurrently, 1024 times each, it runs extremely slowly and
takes tens of seconds to finish. With the new sync modes introduced
by this patchset, the case finishes in 2816979 cycles (RTS) and
2279615 cycles (HTS).
It would be great if we could have simple APIs to enable these sync modes
for structures based on rte_ring, like mempool.
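
In case it helps others reproduce this, a minimal sketch of how we create
rings with the new modes (the HTS flag names appear in this series; I am
assuming the RTS flags follow the analogous
RING_F_MP_RTS_ENQ/RING_F_MC_RTS_DEQ naming):

    #include <rte_lcore.h>
    #include <rte_ring.h>

    static struct rte_ring *
    create_test_ring(int use_hts)
    {
        /* HTS: enqueue/dequeue fully serialized;
         * RTS: tail updated only by the last thread in the batch.
         */
        unsigned int flags = use_hts ?
            (RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ) :
            (RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

        return rte_ring_create(use_hts ? "hts_test" : "rts_test",
            1024, rte_socket_id(), flags);
    }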
------------------------------------------------------------------
From: Konstantin Ananyev <konstantin.ananyev@intel.com>
Sent: Saturday, April 4, 2020 01:43
To: dev <dev@dpdk.org>
Cc: honnappa.nagarahalli <honnappa.nagarahalli@arm.com>; david.marchand <david.marchand@redhat.com>; 周介龙(卓腾) <jielong.zjl@antfin.com>; Konstantin Ananyev <konstantin.ananyev@intel.com>
Subject: [PATCH v3 9/9] ring: add C11 memory model for new sync modes

<snip>

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-04 17:27         ` Wang, Haiyue
  2020-04-08  5:00         ` Honnappa Nagarahalli
  1 sibling, 0 replies; 146+ messages in thread
From: Wang, Haiyue @ 2020-04-04 17:27 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Ananyev, Konstantin

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Konstantin Ananyev
> Sent: Saturday, April 4, 2020 01:42
> To: dev@dpdk.org
> Cc: honnappa.nagarahalli@arm.com; david.marchand@redhat.com; jielong.zjl@antfin.com; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
> 
> Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
> Aim to reduce stall times in case when ring is used on
> overcommited cpus (multiple active threads on the same cpu).
> The main difference from original MP/MC algorithm is that
> tail value is increased not by every thread that finished enqueue/dequeue,
> but only by the last one.
> That allows threads to avoid spinning on ring tail value,
> leaving actual tail value change to the last thread in the update queue.
> 
> check-abi.sh reports what I believe is a false-positive about
> ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
> updated to suppress *struct ring* related errors.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  devtools/libabigail.abignore           |   7 +
>  lib/librte_ring/Makefile               |   5 +-
>  lib/librte_ring/meson.build            |   5 +-
>  lib/librte_ring/rte_ring.c             | 100 +++++++-
>  lib/librte_ring/rte_ring.h             | 110 ++++++++-
>  lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
>  lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
>  lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
>  9 files changed, 1015 insertions(+), 29 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_rts.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> 


>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
> new file mode 100644
> index 000000000..18404fe48
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts.h
> @@ -0,0 +1,316 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h

Found that it is actually buf_ring.h ;-)

> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */

> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-08  4:59         ` Honnappa Nagarahalli
  2020-04-09 12:36           ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-08  4:59 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [PATCH v3 1/9] test/ring: add contention stress test
Minor, would 'add stress test for overcommitted use case' sound better?

> 
> Introduce new test-case to measure ring perfomance under contention
Minor, 'over committed' seems to be the term commonly used in the references you provided. Does it make sense to use that?

> (miltiple producers/consumers).
    ^^^^^^^ multiple

> Starts dequeue/enqueue loop on all available slave lcores.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  app/test/Makefile                |   2 +
>  app/test/meson.build             |   2 +
>  app/test/test_ring_mpmc_stress.c |  31 +++
>  app/test/test_ring_stress.c      |  48 ++++
>  app/test/test_ring_stress.h      |  35 +++
>  app/test/test_ring_stress_impl.h | 444 +++++++++++++++++++++++++++++++
Would be good to change the file names to indicate that these tests are for the over-committed use case/configuration.
These are performance tests, so it would be better to have 'perf' or 'performance' in their names.

>  6 files changed, 562 insertions(+)
>  create mode 100644 app/test/test_ring_mpmc_stress.c  create mode 100644
> app/test/test_ring_stress.c  create mode 100644 app/test/test_ring_stress.h
> create mode 100644 app/test/test_ring_stress_impl.h
> 
> diff --git a/app/test/Makefile b/app/test/Makefile index
> 1f080d162..4eefaa887 100644
> --- a/app/test/Makefile
> +++ b/app/test/Makefile
> @@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c  SRCS-y +=
> test_rand_perf.c
> 
>  SRCS-y += test_ring.c
> +SRCS-y += test_ring_mpmc_stress.c
>  SRCS-y += test_ring_perf.c
> +SRCS-y += test_ring_stress.c
>  SRCS-y += test_pmd_perf.c
> 
>  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> diff --git a/app/test/meson.build b/app/test/meson.build index
> 351d29cb6..827b04886 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -100,7 +100,9 @@ test_sources = files('commands.c',
>  	'test_rib.c',
>  	'test_rib6.c',
>  	'test_ring.c',
> +	'test_ring_mpmc_stress.c',
>  	'test_ring_perf.c',
> +	'test_ring_stress.c',
>  	'test_rwlock.c',
>  	'test_sched.c',
>  	'test_service_cores.c',
> diff --git a/app/test/test_ring_mpmc_stress.c
> b/app/test/test_ring_mpmc_stress.c
> new file mode 100644
> index 000000000..1524b0248
> --- /dev/null
> +++ b/app/test/test_ring_mpmc_stress.c
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "test_ring_stress_impl.h"
> +
> +static inline uint32_t
> +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> +	uint32_t *avail)
> +{
> +	return rte_ring_mc_dequeue_bulk(r, obj, n, avail); }
> +
> +static inline uint32_t
> +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> +	uint32_t *free)
> +{
> +	return rte_ring_mp_enqueue_bulk(r, obj, n, free); }
> +
> +static int
> +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num) {
> +	return rte_ring_init(r, name, num, 0); }
> +
> +const struct test test_ring_mpmc_stress = {
> +	.name = "MP/MC",
> +	.nb_case = RTE_DIM(tests),
> +	.cases = tests,
> +};
> diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c new file
> mode 100644 index 000000000..60706f799
> --- /dev/null
> +++ b/app/test/test_ring_stress.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "test_ring_stress.h"
> +
> +static int
> +run_test(const struct test *test)
> +{
> +	int32_t rc;
> +	uint32_t i, k;
> +
> +	for (i = 0, k = 0; i != test->nb_case; i++) {
> +
> +		printf("TEST-CASE %s %s START\n",
> +			test->name, test->cases[i].name);
> +
> +		rc = test->cases[i].func(test->cases[i].wfunc);
> +		k += (rc == 0);
> +
> +		if (rc != 0)
> +			printf("TEST-CASE %s %s FAILED\n",
> +				test->name, test->cases[i].name);
> +		else
> +			printf("TEST-CASE %s %s OK\n",
> +				test->name, test->cases[i].name);
> +	}
> +
> +	return k;
> +}
> +
> +static int
> +test_ring_stress(void)
> +{
> +	uint32_t n, k;
> +
> +	n = 0;
> +	k = 0;
> +
> +	n += test_ring_mpmc_stress.nb_case;
> +	k += run_test(&test_ring_mpmc_stress);
> +
> +	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
> +		n, k, n - k);
> +	return (k != n);
> +}
> +
> +REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
> diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h new file
> mode 100644 index 000000000..60eac6216
> --- /dev/null
> +++ b/app/test/test_ring_stress.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +
> +#include <inttypes.h>
> +#include <stddef.h>
> +#include <stdalign.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +
> +#include <rte_ring.h>
> +#include <rte_cycles.h>
> +#include <rte_launch.h>
> +#include <rte_pause.h>
> +#include <rte_random.h>
> +#include <rte_malloc.h>
> +#include <rte_spinlock.h>
> +
> +#include "test.h"
> +
> +struct test_case {
> +	const char *name;
> +	int (*func)(int (*)(void *));
> +	int (*wfunc)(void *arg);
> +};
> +
> +struct test {
> +	const char *name;
> +	uint32_t nb_case;
> +	const struct test_case *cases;
> +};
> +
> +extern const struct test test_ring_mpmc_stress;
> diff --git a/app/test/test_ring_stress_impl.h
> b/app/test/test_ring_stress_impl.h
> new file mode 100644
> index 000000000..11476d28c
> --- /dev/null
> +++ b/app/test/test_ring_stress_impl.h
> @@ -0,0 +1,444 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "test_ring_stress.h"
> +
> +/*
> + * Measures performance of ring enqueue/dequeue under high contention
> +*/
> +
> +#define RING_NAME	"RING_STRESS"
> +#define BULK_NUM	32
> +#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
> +
> +enum {
> +	WRK_CMD_STOP,
> +	WRK_CMD_RUN,
> +};
> +
> +static volatile uint32_t wrk_cmd __rte_cache_aligned;
> +
> +/* test run-time in seconds */
> +static const uint32_t run_time = 60;
> +static const uint32_t verbose;
> +
> +struct lcore_stat {
> +	uint64_t nb_cycle;
> +	struct {
> +		uint64_t nb_call;
> +		uint64_t nb_obj;
> +		uint64_t nb_cycle;
> +		uint64_t max_cycle;
> +		uint64_t min_cycle;
> +	} op;
> +};
> +
> +struct lcore_arg {
> +	struct rte_ring *rng;
> +	struct lcore_stat stats;
> +} __rte_cache_aligned;
> +
> +struct ring_elem {
> +	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)]; }
> +__rte_cache_aligned;
> +
> +/*
> + * redefinable functions
> + */
> +static uint32_t
> +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> +	uint32_t *avail);
> +
> +static uint32_t
> +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> +	uint32_t *free);
> +
> +static int
> +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
> +
> +
> +static void
> +lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
> +	uint64_t tm, int32_t prcs)
> +{
> +	ls->op.nb_call += call;
> +	ls->op.nb_obj += obj;
> +	ls->op.nb_cycle += tm;
> +	if (prcs) {
> +		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
> +		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
> +	}
> +}
> +
> +static void
> +lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
> +{
> +
> +	ms->op.nb_call += ls->op.nb_call;
> +	ms->op.nb_obj += ls->op.nb_obj;
> +	ms->op.nb_cycle += ls->op.nb_cycle;
> +	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
> +	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle); }
> +
> +static void
> +lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls) {
> +	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
> +	lcore_op_stat_aggr(ms, ls);
> +}
> +
> +static void
> +lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls) {
> +	long double st;
> +
> +	st = (long double)rte_get_timer_hz() / US_PER_S;
> +
> +	if (lc == UINT32_MAX)
> +		fprintf(f, "%s(AGGREGATE)={\n", __func__);
> +	else
> +		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
> +
> +	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
> +		ls->nb_cycle, (long double)ls->nb_cycle / st);
> +
> +	fprintf(f, "\tDEQ+ENQ={\n");
> +
> +	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
> +	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
> +	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
> +	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
> +		(long double)ls->op.nb_obj / ls->op.nb_call);
> +	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
> +		(long double)ls->op.nb_cycle / ls->op.nb_obj);
> +	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
> +		(long double)ls->op.nb_cycle / ls->op.nb_call);
> +
> +	/* if min/max cycles per call stats was collected */
> +	if (ls->op.min_cycle != UINT64_MAX) {
> +		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> +			ls->op.max_cycle,
> +			(long double)ls->op.max_cycle / st);
> +		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> +			ls->op.min_cycle,
> +			(long double)ls->op.min_cycle / st);
> +	}
> +
> +	fprintf(f, "\t},\n");
> +	fprintf(f, "};\n");
> +}
> +
> +static void
> +fill_ring_elm(struct ring_elem *elm, uint32_t fill) {
> +	uint32_t i;
> +
> +	for (i = 0; i != RTE_DIM(elm->cnt); i++)
> +		elm->cnt[i] = fill;
> +}
> +
> +static int32_t
> +check_updt_elem(struct ring_elem *elm[], uint32_t num,
> +	const struct ring_elem *check, const struct ring_elem *fill) {
> +	uint32_t i;
> +
> +	static rte_spinlock_t dump_lock;
> +
> +	for (i = 0; i != num; i++) {
> +		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
> +			rte_spinlock_lock(&dump_lock);
> +			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
> +				"offending object: %p\n",
> +				__func__, rte_lcore_id(), num, i, elm[i]);
> +			rte_memdump(stdout, "expected", check,
> sizeof(*check));
> +			rte_memdump(stdout, "result", elm[i], sizeof(elm[i]));
> +			rte_spinlock_unlock(&dump_lock);
> +			return -EINVAL;
> +		}
> +		memcpy(elm[i], fill, sizeof(*elm[i]));
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
minor, lcore instead of lc would be better

> +	const char *fname, const char *opname) {
> +	if (exp != res) {
> +		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
Suggest using lcore in the printf

> +			fname, lc, opname, exp, res);
> +		return -ENOSPC;
> +	}
> +	return 0;
> +}
> +
> +static int
> +test_worker_prcs(void *arg)
> +{
> +	int32_t rc;
> +	uint32_t lc, n, num;
minor, lcore instead of lc would be better

> +	uint64_t cl, tm0, tm1;
> +	struct lcore_arg *la;
> +	struct ring_elem def_elm, loc_elm;
> +	struct ring_elem *obj[2 * BULK_NUM];
> +
> +	la = arg;
> +	lc = rte_lcore_id();
> +
> +	fill_ring_elm(&def_elm, UINT32_MAX);
> +	fill_ring_elm(&loc_elm, lc);
> +
> +	while (wrk_cmd != WRK_CMD_RUN) {
> +		rte_smp_rmb();
> +		rte_pause();
> +	}
> +
> +	cl = rte_rdtsc_precise();
> +
> +	do {
> +		/* num in interval [7/8, 11/8] of BULK_NUM */
> +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> +
> +		/* reset all pointer values */
> +		memset(obj, 0, sizeof(obj));
> +
> +		/* dequeue num elems */
> +		tm0 = rte_rdtsc_precise();
> +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> +		tm0 = rte_rdtsc_precise() - tm0;
> +
> +		/* check return value and objects */
> +		rc = check_ring_op(num, n, lc, __func__,
> +			RTE_STR(_st_ring_dequeue_bulk));
> +		if (rc == 0)
> +			rc = check_updt_elem(obj, num, &def_elm,
> &loc_elm);
> +		if (rc != 0)
> +			break;
Since this seems like a performance test, should we skip validating the objects?
Did these tests run on Travis CI? I believe Travis CI has trouble running stress/performance tests if they take too much time.
The RTS and HTS tests should be added to functional tests.

> +
> +		/* enqueue num elems */
> +		rte_compiler_barrier();
> +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> +		if (rc != 0)
> +			break;
> +
> +		tm1 = rte_rdtsc_precise();
> +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> +		tm1 = rte_rdtsc_precise() - tm1;
> +
> +		/* check return value */
> +		rc = check_ring_op(num, n, lc, __func__,
> +			RTE_STR(_st_ring_enqueue_bulk));
> +		if (rc != 0)
> +			break;
> +
> +		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
> +
> +	} while (wrk_cmd == WRK_CMD_RUN);
> +
> +	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
> +	return rc;
> +}
> +
> +static int
> +test_worker_avg(void *arg)
> +{
> +	int32_t rc;
> +	uint32_t lc, n, num;
> +	uint64_t cl;
> +	struct lcore_arg *la;
> +	struct ring_elem def_elm, loc_elm;
> +	struct ring_elem *obj[2 * BULK_NUM];
> +
> +	la = arg;
> +	lc = rte_lcore_id();
> +
> +	fill_ring_elm(&def_elm, UINT32_MAX);
> +	fill_ring_elm(&loc_elm, lc);
> +
> +	while (wrk_cmd != WRK_CMD_RUN) {
> +		rte_smp_rmb();
> +		rte_pause();
> +	}
> +
> +	cl = rte_rdtsc_precise();
> +
> +	do {
> +		/* num in interval [7/8, 11/8] of BULK_NUM */
> +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> +
> +		/* reset all pointer values */
> +		memset(obj, 0, sizeof(obj));
> +
> +		/* dequeue num elems */
> +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> +
> +		/* check return value and objects */
> +		rc = check_ring_op(num, n, lc, __func__,
> +			RTE_STR(_st_ring_dequeue_bulk));
> +		if (rc == 0)
> +			rc = check_updt_elem(obj, num, &def_elm,
> &loc_elm);
> +		if (rc != 0)
> +			break;
> +
> +		/* enqueue num elems */
> +		rte_compiler_barrier();
> +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> +		if (rc != 0)
> +			break;
> +
> +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> +
> +		/* check return value */
> +		rc = check_ring_op(num, n, lc, __func__,
> +			RTE_STR(_st_ring_enqueue_bulk));
> +		if (rc != 0)
> +			break;
> +
> +		lcore_stat_update(&la->stats, 1, num, 0, 0);
> +
> +	} while (wrk_cmd == WRK_CMD_RUN);
> +
> +	/* final stats update */
> +	cl = rte_rdtsc_precise() - cl;
> +	lcore_stat_update(&la->stats, 0, 0, cl, 0);
> +	la->stats.nb_cycle = cl;
> +
> +	return rc;
> +}
Just wondering about the need for 2 tests which run the same functionality. The difference is the way in which numbers are collected.
Does 'test_worker_avg' add any value? IMO, we can remove 'test_worker_avg'.

> +
> +static void
> +mt1_fini(struct rte_ring *rng, void *data) {
> +	rte_free(rng);
> +	rte_free(data);
> +}
> +
> +static int
> +mt1_init(struct rte_ring **rng, void **data, uint32_t num) {
> +	int32_t rc;
> +	size_t sz;
> +	uint32_t i, nr;
> +	struct rte_ring *r;
> +	struct ring_elem *elm;
> +	void *p;
> +
> +	*rng = NULL;
> +	*data = NULL;
> +
> +	sz = num * sizeof(*elm);
> +	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
> +	if (elm == NULL) {
> +		printf("%s: alloc(%zu) for %u elems data failed",
> +			__func__, sz, num);
> +		return -ENOMEM;
> +	}
> +
> +	*data = elm;
> +
> +	/* alloc ring */
> +	nr = 2 * num;
> +	sz = rte_ring_get_memsize(nr);
> +	r = rte_zmalloc(NULL, sz, __alignof__(*r));
> +	if (r == NULL) {
> +		printf("%s: alloc(%zu) for FIFO with %u elems failed",
> +			__func__, sz, nr);
> +		return -ENOMEM;
> +	}
> +
> +	*rng = r;
> +
> +	rc = _st_ring_init(r, RING_NAME, nr);
> +	if (rc != 0) {
> +		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
> +			__func__, r, nr, rc, strerror(-rc));
> +		return rc;
> +	}
> +
> +	for (i = 0; i != num; i++) {
> +		fill_ring_elm(elm + i, UINT32_MAX);
> +		p = elm + i;
> +		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
> +			break;
> +	}
> +
> +	if (i != num) {
> +		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
> +			__func__, r, num, i);
> +		return -ENOSPC;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +test_mt1(int (*test)(void *))
> +{
> +	int32_t rc;
> +	uint32_t lc, mc;
> +	struct rte_ring *r;
> +	void *data;
> +	struct lcore_arg arg[RTE_MAX_LCORE];
> +
> +	static const struct lcore_stat init_stat = {
> +		.op.min_cycle = UINT64_MAX,
> +	};
> +
> +	rc = mt1_init(&r, &data, RING_SIZE);
> +	if (rc != 0) {
> +		mt1_fini(r, data);
> +		return rc;
> +	}
> +
> +	memset(arg, 0, sizeof(arg));
> +
> +	/* launch on all slaves */
> +	RTE_LCORE_FOREACH_SLAVE(lc) {
> +		arg[lc].rng = r;
> +		arg[lc].stats = init_stat;
> +		rte_eal_remote_launch(test, &arg[lc], lc);
> +	}
> +
> +	/* signal worker to start test */
> +	wrk_cmd = WRK_CMD_RUN;
> +	rte_smp_wmb();
> +
> +	usleep(run_time * US_PER_S);
> +
> +	/* signal worker to start test */
> +	wrk_cmd = WRK_CMD_STOP;
> +	rte_smp_wmb();
> +
> +	/* wait for slaves and collect stats. */
> +	mc = rte_lcore_id();
> +	arg[mc].stats = init_stat;
> +
> +	rc = 0;
> +	RTE_LCORE_FOREACH_SLAVE(lc) {
> +		rc |= rte_eal_wait_lcore(lc);
> +		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
> +		if (verbose != 0)
> +			lcore_stat_dump(stdout, lc, &arg[lc].stats);
> +	}
> +
> +	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
> +	mt1_fini(r, data);
> +	return rc;
> +}
> +
> +static const struct test_case tests[] = {
> +	{
> +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
> +		.func = test_mt1,
> +		.wfunc = test_worker_prcs,
> +	},
> +	{
> +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
> +		.func = test_mt1,
> +		.wfunc = test_worker_avg,
> +	},
> +};
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-08  4:59         ` Honnappa Nagarahalli
  2020-04-09 13:39           ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-08  4:59 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
> 
> Change from *single* to *sync_type* to allow different synchronisation
> schemes to be applied.
> Mark *single* as deprecated in comments.
> Add new functions to allow user to query ring sync types.
> Replace direct access to *single* with appopriate function call.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  app/test/test_pdump.c           |   6 +-
>  lib/librte_pdump/rte_pdump.c    |   2 +-
>  lib/librte_port/rte_port_ring.c |  12 ++--
>  lib/librte_ring/rte_ring.c      |   6 +-
>  lib/librte_ring/rte_ring.h      | 113 ++++++++++++++++++++++++++------
>  lib/librte_ring/rte_ring_elem.h |   8 +--
>  6 files changed, 108 insertions(+), 39 deletions(-)
> 
> diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
> index ad183184c..6a1180bcb 100644
> --- a/app/test/test_pdump.c
> +++ b/app/test/test_pdump.c
> @@ -57,8 +57,7 @@ run_pdump_client_tests(void)
>  	if (ret < 0)
>  		return -1;
>  	mp->flags = 0x0000;
> -	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
> -				      RING_F_SP_ENQ | RING_F_SC_DEQ);
> +	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
Are you saying that to get SP and SC behavior we now have to set the flags to 0? Isn't that an ABI break?

>  	if (ring_client == NULL) {
>  		printf("rte_ring_create SR0 failed");
>  		return -1;
> @@ -71,9 +70,6 @@ run_pdump_client_tests(void)
>  	}
>  	rte_eth_dev_probing_finish(eth_dev);
> 
> -	ring_client->prod.single = 0;
> -	ring_client->cons.single = 0;
Just wondering if users outside of DPDK have done the same. I hope not; otherwise we have an API break?
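Side note: since RTE_RING_SYNC_MT is the first enumerator (0) and
RTE_RING_SYNC_ST is 1, matching the old __IS_MP/__IS_SP values, legacy
*reads* of *single* through the union should still give the expected
answer; it is direct *writes* like the two removed above that bypass
the new sync_type semantics. A minimal sketch of the aliasing:

	struct rte_ring *r = rte_ring_create("SR0", RING_SIZE,
			rte_socket_id(), 0);

	/* new style */
	if (r->prod.sync_type == RTE_RING_SYNC_MT)
		printf("multi-producer\n");

	/* old style: 'single' aliases 'sync_type'; for an MP ring
	 * RTE_RING_SYNC_MT == 0, so this still reads as "not single"
	 */
	if (r->prod.single == 0)
		printf("multi-producer (legacy view)\n");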

> -
>  	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
> 
>  	for (itr = 0; itr < NUM_ITR; itr++) {
> diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
> index 8a01ac510..65364f2c5 100644
> --- a/lib/librte_pdump/rte_pdump.c
> +++ b/lib/librte_pdump/rte_pdump.c
> @@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct
> rte_mempool *mp)
>  		rte_errno = EINVAL;
>  		return -1;
>  	}
> -	if (ring->prod.single || ring->cons.single) {
> +	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
>  		PDUMP_LOG(ERR, "ring with either SP or SC settings"
>  		" is not valid for pdump, should have MP and MC settings\n");
>  		rte_errno = EINVAL;
> diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
> index 47fcdd06a..2f6c050fa 100644
> --- a/lib/librte_port/rte_port_ring.c
> +++ b/lib/librte_port/rte_port_ring.c
> @@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int
> socket_id,
>  	/* Check input parameters */
>  	if ((conf == NULL) ||
>  		(conf->ring == NULL) ||
> -		(conf->ring->cons.single && is_multi) ||
> -		(!(conf->ring->cons.single) && !is_multi)) {
> +		(rte_ring_cons_single(conf->ring) && is_multi) ||
> +		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
>  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
>  		return NULL;
>  	}
> @@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params,
> int socket_id,
>  	/* Check input parameters */
>  	if ((conf == NULL) ||
>  		(conf->ring == NULL) ||
> -		(conf->ring->prod.single && is_multi) ||
> -		(!(conf->ring->prod.single) && !is_multi) ||
> +		(rte_ring_prod_single(conf->ring) && is_multi) ||
> +		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
>  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
>  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
>  		return NULL;
> @@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void
> *params, int socket_id,
>  	/* Check input parameters */
>  	if ((conf == NULL) ||
>  		(conf->ring == NULL) ||
> -		(conf->ring->prod.single && is_multi) ||
> -		(!(conf->ring->prod.single) && !is_multi) ||
> +		(rte_ring_prod_single(conf->ring) && is_multi) ||
> +		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
>  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
>  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
>  		return NULL;
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index 77e5de099..fa5733907 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  	if (ret < 0 || ret >= (int)sizeof(r->name))
>  		return -ENAMETOOLONG;
>  	r->flags = flags;
> -	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
> -	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
> +	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> +	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> 
>  	if (flags & RING_F_EXACT_SZ) {
>  		r->size = rte_align32pow2(count + 1);
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 18fc5d845..d4775a063 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -61,11 +61,27 @@ enum rte_ring_queue_behavior {
>  #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
>  			   sizeof(RTE_RING_MZ_PREFIX) + 1)
> 
> -/* structure to hold a pair of head/tail values and other metadata */
> +/** prod/cons sync types */
> +enum rte_ring_sync_type {
> +	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> +	RTE_RING_SYNC_ST,     /**< single thread only */
> +};
> +
> +/**
> + * structure to hold a pair of head/tail values and other metadata.
> + * Depending on sync_type format of that structure might be different,
> + * but offset for *sync_type* and *tail* values should remain the same.
> + */
>  struct rte_ring_headtail {
> -	volatile uint32_t head;  /**< Prod/consumer head. */
> -	volatile uint32_t tail;  /**< Prod/consumer tail. */
> -	uint32_t single;         /**< True if single prod/cons */
> +	volatile uint32_t head;      /**< prod/consumer head. */
> +	volatile uint32_t tail;      /**< prod/consumer tail. */
> +	RTE_STD_C11
> +	union {
> +		/** sync type of prod/cons */
> +		enum rte_ring_sync_type sync_type;
> +		/** deprecated -  True if single prod/cons */
> +		uint32_t single;
> +	};
>  };
> 
>  /**
> @@ -116,11 +132,10 @@ struct rte_ring {
>  #define RING_F_EXACT_SZ 0x0004
>  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> 
> -/* @internal defines for passing to the enqueue dequeue worker functions */
> -#define __IS_SP 1
> -#define __IS_MP 0
> -#define __IS_SC 1
> -#define __IS_MC 0
> +#define __IS_SP RTE_RING_SYNC_ST
> +#define __IS_MP RTE_RING_SYNC_MT
> +#define __IS_SC RTE_RING_SYNC_ST
> +#define __IS_MC RTE_RING_SYNC_MT
I think we can remove these #defines and use the new SYNC types

> 
>  /**
>   * Calculate the memory size needed for a ring
> @@ -420,7 +435,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_MP, free_space);
> +			RTE_RING_SYNC_MT, free_space);
>  }
> 
>  /**
> @@ -443,7 +458,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_SP, free_space);
> +			RTE_RING_SYNC_ST, free_space);
>  }
> 
>  /**
> @@ -470,7 +485,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			r->prod.single, free_space);
> +			r->prod.sync_type, free_space);
>  }
> 
>  /**
> @@ -554,7 +569,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_MC, available);
> +			RTE_RING_SYNC_MT, available);
>  }
> 
>  /**
> @@ -578,7 +593,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_SC, available);
> +			RTE_RING_SYNC_ST, available);
>  }
> 
>  /**
> @@ -605,7 +620,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -				r->cons.single, available);
> +				r->cons.sync_type, available);
>  }
> 
>  /**
> @@ -777,6 +792,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
>  	return r->capacity;
>  }
> 
> +/**
> + * Return sync type used by producer in the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Producer sync type value.
> + */
> +static inline enum rte_ring_sync_type
> +rte_ring_get_prod_sync_type(const struct rte_ring *r) {
> +	return r->prod.sync_type;
> +}
> +
> +/**
> + * Check is the ring for single producer.
                     ^^ if
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   true if ring is SP, zero otherwise.
> + */
> +static inline int
> +rte_ring_prod_single(const struct rte_ring *r) {
would rte_ring_is_prod_single be better?

> +	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST); }
> +
> +/**
> + * Return sync type used by consumer in the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Consumer sync type value.
> + */
> +static inline enum rte_ring_sync_type
> +rte_ring_get_cons_sync_type(const struct rte_ring *r) {
> +	return r->cons.sync_type;
> +}
> +
> +/**
> + * Check is the ring for single consumer.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   true if ring is SC, zero otherwise.
> + */
> +static inline int
> +rte_ring_cons_single(const struct rte_ring *r) {
> +	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST); }
> +
All these new functions are not required to be called in the data path. They can be made non-inline.
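E.g., a sketch of the suggested non-inline variant, prototype in the
header and body in rte_ring.c (it would also need an entry in
rte_ring_version.map):

	/* rte_ring.h */
	enum rte_ring_sync_type
	rte_ring_get_prod_sync_type(const struct rte_ring *r);

	/* rte_ring.c */
	enum rte_ring_sync_type
	rte_ring_get_prod_sync_type(const struct rte_ring *r)
	{
		return r->prod.sync_type;
	}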

>  /**
>   * Dump the status of all rings on the console
>   *
> @@ -820,7 +891,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> free_space);
>  }
> 
>  /**
> @@ -843,7 +914,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> free_space);
>  }
> 
>  /**
> @@ -870,7 +941,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_VARIABLE,
> -			r->prod.single, free_space);
> +			r->prod.sync_type, free_space);
>  }
> 
>  /**
> @@ -898,7 +969,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> available);
>  }
> 
>  /**
> @@ -923,7 +994,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> available);
>  }
> 
>  /**
> @@ -951,7 +1022,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> **obj_table,  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
>  				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.single, available);
> +				r->cons.sync_type, available);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 663addc73..28f9836e6 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -570,7 +570,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
> +			RTE_RING_QUEUE_FIXED, r->prod.sync_type,
> free_space);
>  }
> 
>  /**
> @@ -734,7 +734,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void
> *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, r->cons.single, available);
> +			RTE_RING_QUEUE_FIXED, r->cons.sync_type,
> available);
>  }
> 
>  /**
> @@ -902,7 +902,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r,
> const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, r->prod.single,
> free_space);
> +			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type,
> free_space);
>  }
> 
>  /**
> @@ -995,7 +995,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void
> *obj_table,  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
>  				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.single, available);
> +				r->cons.sync_type, available);
>  }
> 
>  #ifdef __cplusplus
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode Konstantin Ananyev
  2020-04-04 17:27         ` Wang, Haiyue
@ 2020-04-08  5:00         ` Honnappa Nagarahalli
  2020-04-09 14:52           ` Ananyev, Konstantin
  1 sibling, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-08  5:00 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, Honnappa Nagarahalli, nd, nd

<snip>

> 
> Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
> Aim to reduce stall times in case when ring is used on overcommited cpus
> (multiple active threads on the same cpu).
> The main difference from original MP/MC algorithm is that tail value is
> increased not by every thread that finished enqueue/dequeue, but only by the
> last one.
> That allows threads to avoid spinning on ring tail value, leaving actual tail
> value change to the last thread in the update queue.
> 
> check-abi.sh reports what I believe is a false-positive about ring cons/prod
> changes. As a workaround, devtools/libabigail.abignore is updated to suppress
> *struct ring* related errors.
This can be removed from the commit message.

> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  devtools/libabigail.abignore           |   7 +
>  lib/librte_ring/Makefile               |   5 +-
>  lib/librte_ring/meson.build            |   5 +-
>  lib/librte_ring/rte_ring.c             | 100 +++++++-
>  lib/librte_ring/rte_ring.h             | 110 ++++++++-
>  lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
>  lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
>  lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
>  9 files changed, 1015 insertions(+), 29 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_rts.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_elem.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> 
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index a59df8f13..cd86d89ca 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -11,3 +11,10 @@
>          type_kind = enum
>          name = rte_crypto_asym_xform_type
>          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> +; Ignore updates of ring prod/cons
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_ring
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_event_ring
Does this block the reporting of these structures forever?

> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> index 917c560ad..8f5c284cc 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
>  					rte_ring_elem.h \
>  					rte_ring_generic.h \
> -					rte_ring_c11_mem.h
> +					rte_ring_c11_mem.h \
> +					rte_ring_rts.h \
> +					rte_ring_rts_elem.h \
> +					rte_ring_rts_generic.h
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> index f2f3ccc88..612936afb 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -5,7 +5,10 @@ sources = files('rte_ring.c')
>  headers = files('rte_ring.h',
>  		'rte_ring_elem.h',
>  		'rte_ring_c11_mem.h',
> -		'rte_ring_generic.h')
> +		'rte_ring_generic.h',
> +		'rte_ring_rts.h',
> +		'rte_ring_rts_elem.h',
> +		'rte_ring_rts_generic.h')
> 
>  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> allow_experimental_apis = true
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index fa5733907..222eec0fb 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
>  /* true if x is a power of 2 */
>  #define POWEROF2(x) ((((x)-1) & (x)) == 0)
> 
> +/* by default set head/tail distance as 1/8 of ring capacity */
> +#define HTD_MAX_DEF	8
> +
>  /* return the size of memory occupied by a ring */
>  ssize_t
>  rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
> @@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
>  	return rte_ring_get_memsize_elem(sizeof(void *), count);
>  }
> 
> +/*
> + * internal helper function to reset prod/cons head-tail values.
> + */
> +static void
> +reset_headtail(void *p)
> +{
> +	struct rte_ring_headtail *ht;
> +	struct rte_ring_rts_headtail *ht_rts;
> +
> +	ht = p;
> +	ht_rts = p;
> +
> +	switch (ht->sync_type) {
> +	case RTE_RING_SYNC_MT:
> +	case RTE_RING_SYNC_ST:
> +		ht->head = 0;
> +		ht->tail = 0;
> +		break;
> +	case RTE_RING_SYNC_MT_RTS:
> +		ht_rts->head.raw = 0;
> +		ht_rts->tail.raw = 0;
> +		break;
> +	default:
> +		/* unknown sync mode */
> +		RTE_ASSERT(0);
> +	}
> +}
> +
>  void
>  rte_ring_reset(struct rte_ring *r)
>  {
> -	r->prod.head = r->cons.head = 0;
> -	r->prod.tail = r->cons.tail = 0;
> +	reset_headtail(&r->prod);
> +	reset_headtail(&r->cons);
> +}
> +
> +/*
> + * helper function, calculates sync_type values for prod and cons
> + * based on input flags. Returns zero at success or negative
> + * errno value otherwise.
> + */
> +static int
> +get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
> +	enum rte_ring_sync_type *cons_st)
> +{
> +	static const uint32_t prod_st_flags =
> +		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
> +	static const uint32_t cons_st_flags =
> +		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
> +
> +	switch (flags & prod_st_flags) {
> +	case 0:
> +		*prod_st = RTE_RING_SYNC_MT;
> +		break;
> +	case RING_F_SP_ENQ:
> +		*prod_st = RTE_RING_SYNC_ST;
> +		break;
> +	case RING_F_MP_RTS_ENQ:
> +		*prod_st = RTE_RING_SYNC_MT_RTS;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	switch (flags & cons_st_flags) {
> +	case 0:
> +		*cons_st = RTE_RING_SYNC_MT;
> +		break;
> +	case RING_F_SC_DEQ:
> +		*cons_st = RTE_RING_SYNC_ST;
> +		break;
> +	case RING_F_MC_RTS_DEQ:
> +		*cons_st = RTE_RING_SYNC_MT_RTS;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
>  }
> 
>  int
> @@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> 
> +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> +		offsetof(struct rte_ring_rts_headtail, sync_type));
> +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> +		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
> +
>  	/* init the ring structure */
>  	memset(r, 0, sizeof(*r));
>  	ret = strlcpy(r->name, name, sizeof(r->name));
>  	if (ret < 0 || ret >= (int)sizeof(r->name))
>  		return -ENAMETOOLONG;
>  	r->flags = flags;
> -	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> -	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> +	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
> +	if (ret != 0)
> +		return ret;
> 
>  	if (flags & RING_F_EXACT_SZ) {
>  		r->size = rte_align32pow2(count + 1);
> @@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>  		r->mask = count - 1;
>  		r->capacity = r->mask;
>  	}
> -	r->prod.head = r->cons.head = 0;
> -	r->prod.tail = r->cons.tail = 0;
> +
> +	/* set default values for head-tail distance */
> +	if (flags & RING_F_MP_RTS_ENQ)
> +		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
> +	if (flags & RING_F_MC_RTS_DEQ)
> +		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
> 
>  	return 0;
>  }
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> d4775a063..f6f084d79 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -48,6 +48,7 @@ extern "C" {
>  #include <rte_branch_prediction.h>
>  #include <rte_memzone.h>
>  #include <rte_pause.h>
> +#include <rte_debug.h>
> 
>  #define RTE_TAILQ_RING_NAME "RTE_RING"
> 
> @@ -65,10 +66,13 @@ enum rte_ring_queue_behavior {
>  enum rte_ring_sync_type {
>  	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
>  	RTE_RING_SYNC_ST,     /**< single thread only */
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> +#endif
>  };
> 
>  /**
> - * structure to hold a pair of head/tail values and other metadata.
> + * structures to hold a pair of head/tail values and other metadata.
>   * Depending on sync_type format of that structure might be different,
>   * but offset for *sync_type* and *tail* values should remain the same.
>   */
> @@ -84,6 +88,21 @@ struct rte_ring_headtail {
>  	};
>  };
> 
> +union rte_ring_ht_poscnt {
nit, this is specific to RTS; maybe change this to rte_ring_rts_ht_poscnt?

> +	uint64_t raw;
> +	struct {
> +		uint32_t cnt; /**< head/tail reference counter */
> +		uint32_t pos; /**< head/tail position */
> +	} val;
> +};
> +
> +struct rte_ring_rts_headtail {
> +	volatile union rte_ring_ht_poscnt tail;
> +	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
> +	uint32_t htd_max;   /**< max allowed distance between head/tail */
> +	volatile union rte_ring_ht_poscnt head; };
> +
>  /**
>   * An RTE ring structure.
>   *
> @@ -111,11 +130,21 @@ struct rte_ring {
>  	char pad0 __rte_cache_aligned; /**< empty cache line */
> 
>  	/** Ring producer status. */
> -	struct rte_ring_headtail prod __rte_cache_aligned;
> +	RTE_STD_C11
> +	union {
> +		struct rte_ring_headtail prod;
> +		struct rte_ring_rts_headtail rts_prod;
> +	}  __rte_cache_aligned;
> +
>  	char pad1 __rte_cache_aligned; /**< empty cache line */
> 
>  	/** Ring consumer status. */
> -	struct rte_ring_headtail cons __rte_cache_aligned;
> +	RTE_STD_C11
> +	union {
> +		struct rte_ring_headtail cons;
> +		struct rte_ring_rts_headtail rts_cons;
> +	}  __rte_cache_aligned;
> +
>  	char pad2 __rte_cache_aligned; /**< empty cache line */  };
> 
> @@ -132,6 +161,9 @@ struct rte_ring {
>  #define RING_F_EXACT_SZ 0x0004
>  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> 
> +#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
> +#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
> +
>  #define __IS_SP RTE_RING_SYNC_ST
>  #define __IS_MP RTE_RING_SYNC_MT
>  #define __IS_SC RTE_RING_SYNC_ST
> @@ -461,6 +493,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,
>  			RTE_RING_SYNC_ST, free_space);
>  }
> 
> +#ifdef ALLOW_EXPERIMENTAL_API
> +#include <rte_ring_rts.h>
> +#endif
> +
>  /**
>   * Enqueue several objects on a ring.
>   *
> @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
> rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			r->prod.sync_type, free_space);
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
Have you validated whether these affect the performance of the existing APIs?
I am also wondering why we should support these new modes in the legacy APIs.
I think users should move to the rte_ring_xxx_elem APIs. If users want to use RTS/HTS, it will be a good time for them to move to the new APIs. They have to test their code for RTS/HTS anyway, so they might as well make the change to the new APIs and test both.
It will be less code for the community to maintain as well.
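For context, a minimal usage sketch as the patch stands: the
application opts into RTS purely via creation flags and keeps calling
the generic API, which dispatches through the sync_type switch shown
in this hunk.

	unsigned int n;
	void *objs[32];

	/* RTS on both producer and consumer side */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024,
			rte_socket_id(),
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	/* optionally tighten head/tail distance (default: capacity / 8) */
	rte_ring_set_prod_htd_max(r, 32);

	/* dispatches to rte_ring_mp_rts_enqueue_burst() */
	n = rte_ring_enqueue_burst(r, objs, RTE_DIM(objs), NULL);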

> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> +			free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  /**
> @@ -619,8 +668,20 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -				r->cons.sync_type, available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  /**
> @@ -940,8 +1001,21 @@ static __rte_always_inline unsigned
> rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_VARIABLE,
> -			r->prod.sync_type, free_space);
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> +			free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  /**
> @@ -1020,9 +1094,21 @@ static __rte_always_inline unsigned
> rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.sync_type, available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> +			available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 28f9836e6..5de0850dc 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r,
> const void *obj_table,
>  			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);  }
> 
> +#include <rte_ring_rts_elem.h>
> +
>  /**
>   * Enqueue several objects on a ring.
>   *
> @@ -571,6 +573,26 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r,
> const void *obj_table,  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
>  			RTE_RING_QUEUE_FIXED, r->prod.sync_type,
> free_space);
> +
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
> +			free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
> +			free_space);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
> esize, n,
> +			free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (free_space != NULL)
> +		*free_space = 0;
> +	return 0;
>  }
> 
>  /**
> @@ -733,8 +755,25 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, r->cons.sync_type,
> available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
> +			available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
> +			available);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
> esize,
> +			n, available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (available != NULL)
> +		*available = 0;
> +	return 0;
>  }
> 
>  /**
> @@ -901,8 +940,25 @@ static __rte_always_inline unsigned
> rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type,
> free_space);
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
> +			free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
> +			free_space);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
> esize,
> +			n, free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (free_space != NULL)
> +		*free_space = 0;
> +	return 0;
>  }
> 
>  /**
> @@ -993,9 +1049,25 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.sync_type, available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
> +			available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
> +			available);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
> esize,
> +			n, available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (available != NULL)
> +		*available = 0;
> +	return 0;
>  }
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
> new file mode 100644
> index 000000000..18404fe48
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts.h
IMO, we should not provide these APIs.

> @@ -0,0 +1,316 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
nit, shouldn't the year change to 2020? Look at the others too.

> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_RTS_H_
> +#define _RTE_RING_RTS_H_
> +
> +/**
> + * @file rte_ring_rts.h
> + * @b EXPERIMENTAL: this API may change without prior notice
> + * It is not recommended to include this file directly.
> + * Please include <rte_ring.h> instead.
> + *
> + * Contains functions for Relaxed Tail Sync (RTS) ring mode.
> + * The main idea remains the same as for our original MP/MC
                                                                                 ^^^ the
> +synchronization
> + * mechanism.
> + * The main difference is that tail value is increased not
> + * by every thread that finished enqueue/dequeue,
> + * but only by the last one doing enqueue/dequeue.
should we say 'current last' or 'last thread at a given instance'?

> + * That allows threads to skip spinning on tail value,
> + * leaving actual tail value change to last thread in the update queue.
nit, I understand what you mean by 'update queue' here. IMO, we should remove it as it might confuse some.

> + * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> + * one for head update, second for tail update.
> + * As a gain it allows thread to avoid spinning/waiting on tail value.
> + * In comparision original MP/MC algorithm requires one 32-bit CAS
> + * for head update and waiting/spinning on tail value.
> + *
> + * Brief outline:
> + *  - introduce refcnt for both head and tail.
Suggest using the same names as used in the structures.

> + *  - increment head.refcnt for each head.value update
> + *  - write head:value and head:refcnt atomically (64-bit CAS)
> + *  - move tail.value ahead only when tail.refcnt + 1 == head.refcnt
Maybe add '(indicating that this is the last thread updating the tail)'

> + *  - increment tail.refcnt when each enqueue/dequeue op finishes
Maybe add 'otherwise' at the beginning.

> + *    (no matter is tail:value going to change or not)
nit                            ^^ if
> + *  - write tail.value and tail.recnt atomically (64-bit CAS)
> + *
> + * To avoid producer/consumer starvation:
> + *  - limit max allowed distance between head and tail value (HTD_MAX).
> + *    I.E. thread is allowed to proceed with changing head.value,
> + *    only when:  head.value - tail.value <= HTD_MAX
> + * HTD_MAX is an optional parameter.
> + * With HTD_MAX == 0 we'll have fully serialized ring -
> + * i.e. only one thread at a time will be able to enqueue/dequeue
> + * to/from the ring.
> + * With HTD_MAX >= ring.capacity - no limitation.
> + * By default HTD_MAX == ring.capacity / 8.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_ring_rts_generic.h>
> +
> +/**
> + * @internal Enqueue several objects on the RTS ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_rts_enqueue(struct rte_ring *r, void * const *obj_table,
> +		uint32_t n, enum rte_ring_queue_behavior behavior,
> +		uint32_t *free_space)
> +{
> +	uint32_t free, head;
> +
> +	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
> +
> +	if (n != 0) {
> +		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> +		__rte_ring_rts_update_tail(&r->rts_prod);
> +	}
> +
> +	if (free_space != NULL)
> +		*free_space = free - n;
> +	return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the RTS ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has
> finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
> +		uint32_t n, enum rte_ring_queue_behavior behavior,
> +		uint32_t *available)
> +{
> +	uint32_t entries, head;
> +
> +	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
> +
> +	if (n != 0) {
> +		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> +		__rte_ring_rts_update_tail(&r->rts_cons);
> +	}
> +
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_rts_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> +			free_space);
> +}
> +
> +/**
> + * Dequeue several objects from an RTS ring (multi-consumers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_rts_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> +			available);
> +}
> +
> +/**
> + * Return producer max Head-Tail-Distance (HTD).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Producer HTD value, if producer is set in appropriate sync mode,
> + *   or UINT32_MAX otherwise.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_ring_get_prod_htd_max(const struct rte_ring *r) {
> +	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
> +		return r->rts_prod.htd_max;
> +	return UINT32_MAX;
> +}
> +
> +/**
> + * Set producer max Head-Tail-Distance (HTD).
> + * Note that producer has to use appropriate sync mode (RTS).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param v
> + *   new HTD value to setup.
> + * @return
> + *   Zero on success, or negative error code otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v) {
> +	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
> +		return -ENOTSUP;
> +
> +	r->rts_prod.htd_max = v;
> +	return 0;
> +}
> +
> +/**
> + * Return consumer max Head-Tail-Distance (HTD).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Consumer HTD value, if consumer is set in appropriate sync mode,
> + *   or UINT32_MAX otherwise.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_ring_get_cons_htd_max(const struct rte_ring *r) {
> +	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
> +		return r->rts_cons.htd_max;
> +	return UINT32_MAX;
> +}
> +
> +/**
> + * Set consumer max Head-Tail-Distance (HTD).
> + * Note that consumer has to use appropriate sync mode (RTS).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param v
> + *   new HTD value to setup.
> + * @return
> + *   Zero on success, or negative error code otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v) {
> +	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
> +		return -ENOTSUP;
> +
> +	r->rts_cons.htd_max = v;
> +	return 0;
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_rts_enqueue(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, free_space); }
> +
> +/**
> + * Dequeue several objects from an RTS  ring (multi-consumers safe).
> + * When the requested objects are more than the available objects,
> + * only dequeue the actual number of objects.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - n: Actual number of objects dequeued, 0 if ring is empty
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_rts_dequeue(r, obj_table, n,
> +			RTE_RING_QUEUE_VARIABLE, available); }
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_RTS_H_ */
> diff --git a/lib/librte_ring/rte_ring_rts_elem.h
> b/lib/librte_ring/rte_ring_rts_elem.h
> new file mode 100644
> index 000000000..71a331b23
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts_elem.h
> @@ -0,0 +1,205 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_RTS_ELEM_H_
> +#define _RTE_RING_RTS_ELEM_H_
> +
> +/**
> + * @file rte_ring_rts_elem.h
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It is not recommended to include this file directly.
> + * Please include <rte_ring_elem.h> instead.
> + * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode.
> + * for more details please refer to <rte_ring_rts.h>.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_ring_rts_generic.h>
> +
> +/**
> + * @internal Enqueue several objects on the RTS ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
obj_table should be of type 'const void *obj_table' (looks like a copy-paste error). Please check the other APIs below too.

> +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
'esize' is not documented in the comments above the function. You can copy the header from the rte_ring_elem.h file. Please check the other APIs as well.
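I.e., mirroring __rte_ring_do_enqueue_elem, something like:

	static __rte_always_inline unsigned int
	__rte_ring_do_rts_enqueue_elem(struct rte_ring *r,
		const void *obj_table, uint32_t esize, uint32_t n,
		enum rte_ring_queue_behavior behavior, uint32_t *free_space)

with the '@param esize' description ("The size of ring element, in
bytes. It must be a multiple of 4.") copied into the comment block.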

> +	uint32_t *free_space)
> +{
> +	uint32_t free, head;
> +
> +	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
> +
> +	if (n != 0) {
> +		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
> +		__rte_ring_rts_update_tail(&r->rts_prod);
> +	}
> +
> +	if (free_space != NULL)
> +		*free_space = free - n;
> +	return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the RTS ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has
> finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void **obj_table,
> +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> +	uint32_t *available)
> +{
> +	uint32_t entries, head;
> +
> +	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
> +
> +	if (n != 0) {
> +		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
> +		__rte_ring_rts_update_tail(&r->rts_cons);
> +	}
> +
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, void * const
> *obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, free_space);
> +}
> +
> +/**
> + * Dequeue several objects from an RTS ring (multi-consumers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
> +		RTE_RING_QUEUE_FIXED, available);
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, void * const
> *obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, free_space); }
> +
> +/**
> + * Dequeue several objects from an RTS  ring (multi-consumers safe).
> + * When the requested objects are more than the available objects,
> + * only dequeue the actual number of objects.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - n: Actual number of objects dequeued, 0 if ring is empty
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, available); }
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_RTS_ELEM_H_ */
> diff --git a/lib/librte_ring/rte_ring_rts_generic.h
> b/lib/librte_ring/rte_ring_rts_generic.h
> new file mode 100644
> index 000000000..f88460d47
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts_generic.h
I do not know the benefit of providing the generic version. Do you know why this was done in the legacy APIs?
If there is no performance difference between the generic and C11 versions, should we just skip the generic version?
The oldest compilers in CI are GCC 4.8.5 and Clang 3.4.2, and C11 built-ins are supported by even older compiler versions than these.
I feel the code in the rte_ring library is growing exponentially, and we should try to cut non-value-add code/APIs aggressively.
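For comparison, a rough, untested sketch of the
__rte_ring_rts_update_tail() loop below written directly with C11
built-ins; the acquire/release pairing would replace the explicit
rte_smp_rmb() and the rte_atomic64_* wrappers:

	static __rte_always_inline void
	__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
	{
		union rte_ring_ht_poscnt h, ot, nt;

		ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);

		do {
			/* atomic 64-bit load, works on 32-bit targets too */
			h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);

			nt.raw = ot.raw;
			if (++nt.val.cnt == h.val.cnt)
				nt.val.pos = h.val.pos;

			/* on failure ot.raw is refreshed with the current tail */
		} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw,
				nt.raw, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
	}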

> @@ -0,0 +1,210 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_RTS_GENERIC_H_
> +#define _RTE_RING_RTS_GENERIC_H_
> +
> +/**
> + * @file rte_ring_rts_generic.h
> + * It is not recommended to include this file directly,
> + * include <rte_ring.h> instead.
> + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
> + * For more information please refer to <rte_ring_rts.h>.
> + */
> +
> +/**
> + * @internal This function updates tail values.
> + */
> +static __rte_always_inline void
> +__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht) {
> +	union rte_ring_ht_poscnt h, ot, nt;
> +
> +	/*
> +	 * If there are other enqueues/dequeues in progress that
> +	 * might preceded us, then don't update tail with new value.
> +	 */
> +
> +	do {
> +		ot.raw = ht->tail.raw;
> +		rte_smp_rmb();
> +
> +		/* on 32-bit systems we have to do atomic read here */
> +		h.raw = rte_atomic64_read((rte_atomic64_t *)
> +			(uintptr_t)&ht->head.raw);
> +
> +		nt.raw = ot.raw;
> +		if (++nt.val.cnt == h.val.cnt)
> +			nt.val.pos = h.val.pos;
> +
> +	} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0); }
> +
> +/**
> + * @internal This function waits till head/tail distance wouldn't
> + * exceed pre-defined max value.
> + */
> +static __rte_always_inline void
> +__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
> +	union rte_ring_ht_poscnt *h)
> +{
> +	uint32_t max;
> +
> +	max = ht->htd_max;
> +	h->raw = ht->head.raw;
> +	rte_smp_rmb();
> +
> +	while (h->val.pos - ht->tail.val.pos > max) {
> +		rte_pause();
> +		h->raw = ht->head.raw;
> +		rte_smp_rmb();
> +	}
> +}
> +
> +/**
> + * @internal This function updates the producer head for enqueue.
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sp
> + *   Indicates whether multi-producer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where enqueue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where enqueue finishes
> + * @param free_entries
> + *   Returns the amount of free space in the ring BEFORE head was moved
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *free_entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_poscnt nh, oh;
> +
> +	const uint32_t capacity = r->capacity;
> +
> +	do {
> +		/* Reset n to the initial burst count */
> +		n = num;
> +
> +		/* read prod head (may spin on prod tail) */
> +		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86
> +		 */
> +		rte_smp_rmb();
> +
> +		/*
> +		 *  The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * *old_head > cons_tail). So 'free_entries' is always between
> 0
> +		 * and capacity (which is < size).
> +		 */
> +		*free_entries = capacity + r->cons.tail - oh.val.pos;
> +
> +		/* check that we have enough room in ring */
> +		if (unlikely(n > *free_entries))
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> +					0 : *free_entries;
> +
> +		if (n == 0)
> +			break;
> +
> +		nh.val.pos = oh.val.pos + n;
> +		nh.val.cnt = oh.val.cnt + 1;
> +
> +	} while (rte_atomic64_cmpset(&r->rts_prod.head.raw,
> +			oh.raw, nh.raw) == 0);
> +
> +	*old_head = oh.val.pos;
> +	return n;
> +}
> +
> +/**
> + * @internal This function updates the consumer head for dequeue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sc
> + *   Indicates whether multi-consumer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where dequeue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where dequeue finishes
> + * @param entries
> + *   Returns the number of entries in the ring BEFORE head was moved
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_poscnt nh, oh;
> +
> +	/* move cons.head atomically */
> +	do {
> +		/* Restore n as it may change every loop */
> +		n = num;
> +
> +		/* read cons head (may spin on cons tail) */
> +		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
> +
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86
> +		 */
> +		rte_smp_rmb();
> +
> +		/* The subtraction is done between two unsigned 32-bit values
> +		 * (the result is always modulo 32 bits even if we have
> +		 * cons_head > prod_tail). So 'entries' is always between 0
> +		 * and size(ring)-1.
> +		 */
> +		*entries = r->prod.tail - oh.val.pos;
> +
> +		/* Set the actual entries for dequeue */
> +		if (n > *entries)
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
> +
> +		if (unlikely(n == 0))
> +			break;
> +
> +		nh.val.pos = oh.val.pos + n;
> +		nh.val.cnt = oh.val.cnt + 1;
> +
> +	} while (rte_atomic64_cmpset(&r->rts_cons.head.raw,
> +			oh.raw, nh.raw) == 0);
> +
> +	*old_head = oh.val.pos;
> +	return n;
> +}
> +
> +#endif /* _RTE_RING_RTS_GENERIC_H_ */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test
  2020-04-08  4:59         ` Honnappa Nagarahalli
@ 2020-04-09 12:36           ` Ananyev, Konstantin
  2020-04-09 13:00             ` Ananyev, Konstantin
  2020-04-10 16:59             ` Honnappa Nagarahalli
  0 siblings, 2 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-09 12:36 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd


> <snip>
> 
> > Subject: [PATCH v3 1/9] test/ring: add contention stress test
> Minor, would 'add stress test for overcommitted use case' sound better?

I'd like to point out that this test-case can be used as a contention stress-test
(many threads do enqueue/dequeue to/from the same ring) for both
over-committed and non-over-committed scenarios...
Will probably try to add a few extra explanations in v4.
 
> >
> > Introduce new test-case to measure ring performance under contention
> Minor, 'over committed' seems to be the word commonly used in the references you provided. Does it make sense to use that?
> 
> > (miltiple producers/consumers).
>     ^^^^^^^ multiple

ack.

> 
> > Starts dequeue/enqueue loop on all available slave lcores.
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  app/test/Makefile                |   2 +
> >  app/test/meson.build             |   2 +
> >  app/test/test_ring_mpmc_stress.c |  31 +++
> >  app/test/test_ring_stress.c      |  48 ++++
> >  app/test/test_ring_stress.h      |  35 +++
> >  app/test/test_ring_stress_impl.h | 444 +++++++++++++++++++++++++++++++
> Would be good to change the file names to indicate that these tests are for over-committed usecase/configuration.
> These are performance tests, better to have 'perf' or 'performance' in their names.
> 
> >  6 files changed, 562 insertions(+)
> >  create mode 100644 app/test/test_ring_mpmc_stress.c  create mode 100644
> > app/test/test_ring_stress.c  create mode 100644 app/test/test_ring_stress.h
> > create mode 100644 app/test/test_ring_stress_impl.h
> >
> > diff --git a/app/test/Makefile b/app/test/Makefile index
> > 1f080d162..4eefaa887 100644
> > --- a/app/test/Makefile
> > +++ b/app/test/Makefile
> > @@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c  SRCS-y +=
> > test_rand_perf.c
> >
> >  SRCS-y += test_ring.c
> > +SRCS-y += test_ring_mpmc_stress.c
> >  SRCS-y += test_ring_perf.c
> > +SRCS-y += test_ring_stress.c
> >  SRCS-y += test_pmd_perf.c
> >
> >  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> > diff --git a/app/test/meson.build b/app/test/meson.build index
> > 351d29cb6..827b04886 100644
> > --- a/app/test/meson.build
> > +++ b/app/test/meson.build
> > @@ -100,7 +100,9 @@ test_sources = files('commands.c',
> >  	'test_rib.c',
> >  	'test_rib6.c',
> >  	'test_ring.c',
> > +	'test_ring_mpmc_stress.c',
> >  	'test_ring_perf.c',
> > +	'test_ring_stress.c',
> >  	'test_rwlock.c',
> >  	'test_sched.c',
> >  	'test_service_cores.c',
> > diff --git a/app/test/test_ring_mpmc_stress.c
> > b/app/test/test_ring_mpmc_stress.c
> > new file mode 100644
> > index 000000000..1524b0248
> > --- /dev/null
> > +++ b/app/test/test_ring_mpmc_stress.c
> > @@ -0,0 +1,31 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2020 Intel Corporation
> > + */
> > +
> > +#include "test_ring_stress_impl.h"
> > +
> > +static inline uint32_t
> > +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> > +	uint32_t *avail)
> > +{
> > +	return rte_ring_mc_dequeue_bulk(r, obj, n, avail); }
> > +
> > +static inline uint32_t
> > +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> > +	uint32_t *free)
> > +{
> > +	return rte_ring_mp_enqueue_bulk(r, obj, n, free); }
> > +
> > +static int
> > +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num) {
> > +	return rte_ring_init(r, name, num, 0); }
> > +
> > +const struct test test_ring_mpmc_stress = {
> > +	.name = "MP/MC",
> > +	.nb_case = RTE_DIM(tests),
> > +	.cases = tests,
> > +};
> > diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c new file
> > mode 100644 index 000000000..60706f799
> > --- /dev/null
> > +++ b/app/test/test_ring_stress.c
> > @@ -0,0 +1,48 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2020 Intel Corporation
> > + */
> > +
> > +#include "test_ring_stress.h"
> > +
> > +static int
> > +run_test(const struct test *test)
> > +{
> > +	int32_t rc;
> > +	uint32_t i, k;
> > +
> > +	for (i = 0, k = 0; i != test->nb_case; i++) {
> > +
> > +		printf("TEST-CASE %s %s START\n",
> > +			test->name, test->cases[i].name);
> > +
> > +		rc = test->cases[i].func(test->cases[i].wfunc);
> > +		k += (rc == 0);
> > +
> > +		if (rc != 0)
> > +			printf("TEST-CASE %s %s FAILED\n",
> > +				test->name, test->cases[i].name);
> > +		else
> > +			printf("TEST-CASE %s %s OK\n",
> > +				test->name, test->cases[i].name);
> > +	}
> > +
> > +	return k;
> > +}
> > +
> > +static int
> > +test_ring_stress(void)
> > +{
> > +	uint32_t n, k;
> > +
> > +	n = 0;
> > +	k = 0;
> > +
> > +	n += test_ring_mpmc_stress.nb_case;
> > +	k += run_test(&test_ring_mpmc_stress);
> > +
> > +	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
> > +		n, k, n - k);
> > +	return (k != n);
> > +}
> > +
> > +REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
> > diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h new file
> > mode 100644 index 000000000..60eac6216
> > --- /dev/null
> > +++ b/app/test/test_ring_stress.h
> > @@ -0,0 +1,35 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2020 Intel Corporation
> > + */
> > +
> > +
> > +#include <inttypes.h>
> > +#include <stddef.h>
> > +#include <stdalign.h>
> > +#include <string.h>
> > +#include <stdio.h>
> > +#include <unistd.h>
> > +
> > +#include <rte_ring.h>
> > +#include <rte_cycles.h>
> > +#include <rte_launch.h>
> > +#include <rte_pause.h>
> > +#include <rte_random.h>
> > +#include <rte_malloc.h>
> > +#include <rte_spinlock.h>
> > +
> > +#include "test.h"
> > +
> > +struct test_case {
> > +	const char *name;
> > +	int (*func)(int (*)(void *));
> > +	int (*wfunc)(void *arg);
> > +};
> > +
> > +struct test {
> > +	const char *name;
> > +	uint32_t nb_case;
> > +	const struct test_case *cases;
> > +};
> > +
> > +extern const struct test test_ring_mpmc_stress;
> > diff --git a/app/test/test_ring_stress_impl.h
> > b/app/test/test_ring_stress_impl.h
> > new file mode 100644
> > index 000000000..11476d28c
> > --- /dev/null
> > +++ b/app/test/test_ring_stress_impl.h
> > @@ -0,0 +1,444 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2020 Intel Corporation
> > + */
> > +
> > +#include "test_ring_stress.h"
> > +
> > +/*
> > + * Measures performance of ring enqueue/dequeue under high contention
> > +*/
> > +
> > +#define RING_NAME	"RING_STRESS"
> > +#define BULK_NUM	32
> > +#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
> > +
> > +enum {
> > +	WRK_CMD_STOP,
> > +	WRK_CMD_RUN,
> > +};
> > +
> > +static volatile uint32_t wrk_cmd __rte_cache_aligned;
> > +
> > +/* test run-time in seconds */
> > +static const uint32_t run_time = 60;
> > +static const uint32_t verbose;
> > +
> > +struct lcore_stat {
> > +	uint64_t nb_cycle;
> > +	struct {
> > +		uint64_t nb_call;
> > +		uint64_t nb_obj;
> > +		uint64_t nb_cycle;
> > +		uint64_t max_cycle;
> > +		uint64_t min_cycle;
> > +	} op;
> > +};
> > +
> > +struct lcore_arg {
> > +	struct rte_ring *rng;
> > +	struct lcore_stat stats;
> > +} __rte_cache_aligned;
> > +
> > +struct ring_elem {
> > +	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)]; }
> > +__rte_cache_aligned;
> > +
> > +/*
> > + * redefinable functions
> > + */
> > +static uint32_t
> > +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> > +	uint32_t *avail);
> > +
> > +static uint32_t
> > +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> > +	uint32_t *free);
> > +
> > +static int
> > +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
> > +
> > +
> > +static void
> > +lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
> > +	uint64_t tm, int32_t prcs)
> > +{
> > +	ls->op.nb_call += call;
> > +	ls->op.nb_obj += obj;
> > +	ls->op.nb_cycle += tm;
> > +	if (prcs) {
> > +		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
> > +		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
> > +	}
> > +}
> > +
> > +static void
> > +lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
> > +{
> > +
> > +	ms->op.nb_call += ls->op.nb_call;
> > +	ms->op.nb_obj += ls->op.nb_obj;
> > +	ms->op.nb_cycle += ls->op.nb_cycle;
> > +	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
> > +	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle); }
> > +
> > +static void
> > +lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls) {
> > +	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
> > +	lcore_op_stat_aggr(ms, ls);
> > +}
> > +
> > +static void
> > +lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls) {
> > +	long double st;
> > +
> > +	st = (long double)rte_get_timer_hz() / US_PER_S;
> > +
> > +	if (lc == UINT32_MAX)
> > +		fprintf(f, "%s(AGGREGATE)={\n", __func__);
> > +	else
> > +		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
> > +
> > +	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
> > +		ls->nb_cycle, (long double)ls->nb_cycle / st);
> > +
> > +	fprintf(f, "\tDEQ+ENQ={\n");
> > +
> > +	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
> > +	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
> > +	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
> > +	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
> > +		(long double)ls->op.nb_obj / ls->op.nb_call);
> > +	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
> > +		(long double)ls->op.nb_cycle / ls->op.nb_obj);
> > +	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
> > +		(long double)ls->op.nb_cycle / ls->op.nb_call);
> > +
> > +	/* if min/max cycles per call stats was collected */
> > +	if (ls->op.min_cycle != UINT64_MAX) {
> > +		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> > +			ls->op.max_cycle,
> > +			(long double)ls->op.max_cycle / st);
> > +		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> > +			ls->op.min_cycle,
> > +			(long double)ls->op.min_cycle / st);
> > +	}
> > +
> > +	fprintf(f, "\t},\n");
> > +	fprintf(f, "};\n");
> > +}
> > +
> > +static void
> > +fill_ring_elm(struct ring_elem *elm, uint32_t fill) {
> > +	uint32_t i;
> > +
> > +	for (i = 0; i != RTE_DIM(elm->cnt); i++)
> > +		elm->cnt[i] = fill;
> > +}
> > +
> > +static int32_t
> > +check_updt_elem(struct ring_elem *elm[], uint32_t num,
> > +	const struct ring_elem *check, const struct ring_elem *fill) {
> > +	uint32_t i;
> > +
> > +	static rte_spinlock_t dump_lock;
> > +
> > +	for (i = 0; i != num; i++) {
> > +		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
> > +			rte_spinlock_lock(&dump_lock);
> > +			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
> > +				"offending object: %p\n",
> > +				__func__, rte_lcore_id(), num, i, elm[i]);
> > +			rte_memdump(stdout, "expected", check,
> > sizeof(*check));
> > +			rte_memdump(stdout, "result", elm[i], sizeof(elm[i]));
> > +			rte_spinlock_unlock(&dump_lock);
> > +			return -EINVAL;
> > +		}
> > +		memcpy(elm[i], fill, sizeof(*elm[i]));
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
> minor, lcore instead of lc would be better
> 
> > +	const char *fname, const char *opname) {
> > +	if (exp != res) {
> > +		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
> Suggest using lcore in the printf
> 
> > +			fname, lc, opname, exp, res);
> > +		return -ENOSPC;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int
> > +test_worker_prcs(void *arg)
> > +{
> > +	int32_t rc;
> > +	uint32_t lc, n, num;
> minor, lcore instead of lc would be better
> 
> > +	uint64_t cl, tm0, tm1;
> > +	struct lcore_arg *la;
> > +	struct ring_elem def_elm, loc_elm;
> > +	struct ring_elem *obj[2 * BULK_NUM];
> > +
> > +	la = arg;
> > +	lc = rte_lcore_id();
> > +
> > +	fill_ring_elm(&def_elm, UINT32_MAX);
> > +	fill_ring_elm(&loc_elm, lc);
> > +
> > +	while (wrk_cmd != WRK_CMD_RUN) {
> > +		rte_smp_rmb();
> > +		rte_pause();
> > +	}
> > +
> > +	cl = rte_rdtsc_precise();
> > +
> > +	do {
> > +		/* num in interval [7/8, 11/8] of BULK_NUM */
> > +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> > +
> > +		/* reset all pointer values */
> > +		memset(obj, 0, sizeof(obj));
> > +
> > +		/* dequeue num elems */
> > +		tm0 = rte_rdtsc_precise();
> > +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> > +		tm0 = rte_rdtsc_precise() - tm0;
> > +
> > +		/* check return value and objects */
> > +		rc = check_ring_op(num, n, lc, __func__,
> > +			RTE_STR(_st_ring_dequeue_bulk));
> > +		if (rc == 0)
> > +			rc = check_updt_elem(obj, num, &def_elm,
> > &loc_elm);
> > +		if (rc != 0)
> > +			break;
> Since this seems like a performance test, should we skip validating the objects?
> Did these tests run on Travis CI? I believe Travis CI has trouble running stress/performance tests if they take too much time.
> The RTS and HTS tests should be added to functional tests.
> 
> > +
> > +		/* enqueue num elems */
> > +		rte_compiler_barrier();
> > +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> > +		if (rc != 0)
> > +			break;
> > +
> > +		tm1 = rte_rdtsc_precise();
> > +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> > +		tm1 = rte_rdtsc_precise() - tm1;
> > +
> > +		/* check return value */
> > +		rc = check_ring_op(num, n, lc, __func__,
> > +			RTE_STR(_st_ring_enqueue_bulk));
> > +		if (rc != 0)
> > +			break;
> > +
> > +		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
> > +
> > +	} while (wrk_cmd == WRK_CMD_RUN);
> > +
> > +	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
> > +	return rc;
> > +}
> > +
> > +static int
> > +test_worker_avg(void *arg)
> > +{
> > +	int32_t rc;
> > +	uint32_t lc, n, num;
> > +	uint64_t cl;
> > +	struct lcore_arg *la;
> > +	struct ring_elem def_elm, loc_elm;
> > +	struct ring_elem *obj[2 * BULK_NUM];
> > +
> > +	la = arg;
> > +	lc = rte_lcore_id();
> > +
> > +	fill_ring_elm(&def_elm, UINT32_MAX);
> > +	fill_ring_elm(&loc_elm, lc);
> > +
> > +	while (wrk_cmd != WRK_CMD_RUN) {
> > +		rte_smp_rmb();
> > +		rte_pause();
> > +	}
> > +
> > +	cl = rte_rdtsc_precise();
> > +
> > +	do {
> > +		/* num in interval [7/8, 11/8] of BULK_NUM */
> > +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> > +
> > +		/* reset all pointer values */
> > +		memset(obj, 0, sizeof(obj));
> > +
> > +		/* dequeue num elems */
> > +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> > +
> > +		/* check return value and objects */
> > +		rc = check_ring_op(num, n, lc, __func__,
> > +			RTE_STR(_st_ring_dequeue_bulk));
> > +		if (rc == 0)
> > +			rc = check_updt_elem(obj, num, &def_elm,
> > &loc_elm);
> > +		if (rc != 0)
> > +			break;
> > +
> > +		/* enqueue num elems */
> > +		rte_compiler_barrier();
> > +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> > +		if (rc != 0)
> > +			break;
> > +
> > +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> > +
> > +		/* check return value */
> > +		rc = check_ring_op(num, n, lc, __func__,
> > +			RTE_STR(_st_ring_enqueue_bulk));
> > +		if (rc != 0)
> > +			break;
> > +
> > +		lcore_stat_update(&la->stats, 1, num, 0, 0);
> > +
> > +	} while (wrk_cmd == WRK_CMD_RUN);
> > +
> > +	/* final stats update */
> > +	cl = rte_rdtsc_precise() - cl;
> > +	lcore_stat_update(&la->stats, 0, 0, cl, 0);
> > +	la->stats.nb_cycle = cl;
> > +
> > +	return rc;
> > +}
> Just wondering about the need of 2 tests which run the same functionality. The difference is the way in which numbers are collected.
> Does 'test_worker_avg' adding any value? IMO, we can remove 'test_worker_avg'.

Yeah, they are quite similar.
I added the _average_ version for two reasons:
1. In the precise version I call rte_rdtsc_precise() straight before/after
    each enqueue/dequeue op.
    At least on IA, rte_rdtsc_precise() implies an mb().
    This extra sync point might hide some sync problems in the ring
    enqueue/dequeue itself.
    So having a separate test without such extra sync points
    gives extra confidence that these tests would catch ring sync problems, if any.
2. People usually don't do enqueue/dequeue on its own.
    One common pattern is: dequeue, read/write data from the dequeued objects, enqueue.
    So this test measures cycles for dequeue/enqueue plus some reads/writes
    to the objects from the ring.
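
To make the contrast concrete, here is a condensed view of the two timing
patterns (simplified from the patch, error checks dropped):

/* _prcs_ worker: per-op timestamps; on IA rte_rdtsc_precise()
 * implies an mb(), i.e. an extra sync point around every op */
tm0 = rte_rdtsc_precise();
n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
tm0 = rte_rdtsc_precise() - tm0;
lcore_stat_update(&la->stats, 1, num, tm0, 1);

/* _avg_ worker: no per-op timestamps, so the only inter-thread
 * ordering comes from the ring itself; cycles are taken once for
 * the whole run and averaged over all ops afterwards */
cl = rte_rdtsc_precise();
do {
	n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
	lcore_stat_update(&la->stats, 1, num, 0, 0);
} while (wrk_cmd == WRK_CMD_RUN);
lcore_stat_update(&la->stats, 0, 0, rte_rdtsc_precise() - cl, 0);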
 
> > +
> > +static void
> > +mt1_fini(struct rte_ring *rng, void *data) {
> > +	rte_free(rng);
> > +	rte_free(data);
> > +}
> > +
> > +static int
> > +mt1_init(struct rte_ring **rng, void **data, uint32_t num) {
> > +	int32_t rc;
> > +	size_t sz;
> > +	uint32_t i, nr;
> > +	struct rte_ring *r;
> > +	struct ring_elem *elm;
> > +	void *p;
> > +
> > +	*rng = NULL;
> > +	*data = NULL;
> > +
> > +	sz = num * sizeof(*elm);
> > +	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
> > +	if (elm == NULL) {
> > +		printf("%s: alloc(%zu) for %u elems data failed",
> > +			__func__, sz, num);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	*data = elm;
> > +
> > +	/* alloc ring */
> > +	nr = 2 * num;
> > +	sz = rte_ring_get_memsize(nr);
> > +	r = rte_zmalloc(NULL, sz, __alignof__(*r));
> > +	if (r == NULL) {
> > +		printf("%s: alloc(%zu) for FIFO with %u elems failed",
> > +			__func__, sz, nr);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	*rng = r;
> > +
> > +	rc = _st_ring_init(r, RING_NAME, nr);
> > +	if (rc != 0) {
> > +		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
> > +			__func__, r, nr, rc, strerror(-rc));
> > +		return rc;
> > +	}
> > +
> > +	for (i = 0; i != num; i++) {
> > +		fill_ring_elm(elm + i, UINT32_MAX);
> > +		p = elm + i;
> > +		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
> > +			break;
> > +	}
> > +
> > +	if (i != num) {
> > +		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
> > +			__func__, r, num, i);
> > +		return -ENOSPC;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +test_mt1(int (*test)(void *))
> > +{
> > +	int32_t rc;
> > +	uint32_t lc, mc;
> > +	struct rte_ring *r;
> > +	void *data;
> > +	struct lcore_arg arg[RTE_MAX_LCORE];
> > +
> > +	static const struct lcore_stat init_stat = {
> > +		.op.min_cycle = UINT64_MAX,
> > +	};
> > +
> > +	rc = mt1_init(&r, &data, RING_SIZE);
> > +	if (rc != 0) {
> > +		mt1_fini(r, data);
> > +		return rc;
> > +	}
> > +
> > +	memset(arg, 0, sizeof(arg));
> > +
> > +	/* launch on all slaves */
> > +	RTE_LCORE_FOREACH_SLAVE(lc) {
> > +		arg[lc].rng = r;
> > +		arg[lc].stats = init_stat;
> > +		rte_eal_remote_launch(test, &arg[lc], lc);
> > +	}
> > +
> > +	/* signal worker to start test */
> > +	wrk_cmd = WRK_CMD_RUN;
> > +	rte_smp_wmb();
> > +
> > +	usleep(run_time * US_PER_S);
> > +
> > +	/* signal worker to start test */
> > +	wrk_cmd = WRK_CMD_STOP;
> > +	rte_smp_wmb();
> > +
> > +	/* wait for slaves and collect stats. */
> > +	mc = rte_lcore_id();
> > +	arg[mc].stats = init_stat;
> > +
> > +	rc = 0;
> > +	RTE_LCORE_FOREACH_SLAVE(lc) {
> > +		rc |= rte_eal_wait_lcore(lc);
> > +		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
> > +		if (verbose != 0)
> > +			lcore_stat_dump(stdout, lc, &arg[lc].stats);
> > +	}
> > +
> > +	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
> > +	mt1_fini(r, data);
> > +	return rc;
> > +}
> > +
> > +static const struct test_case tests[] = {
> > +	{
> > +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
> > +		.func = test_mt1,
> > +		.wfunc = test_worker_prcs,
> > +	},
> > +	{
> > +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
> > +		.func = test_mt1,
> > +		.wfunc = test_worker_avg,
> > +	},
> > +};
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test
  2020-04-09 12:36           ` Ananyev, Konstantin
@ 2020-04-09 13:00             ` Ananyev, Konstantin
  2020-04-10 18:01               ` Honnappa Nagarahalli
  2020-04-10 16:59             ` Honnappa Nagarahalli
  1 sibling, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-09 13:00 UTC (permalink / raw)
  To: Ananyev, Konstantin, Honnappa Nagarahalli, dev
  Cc: david.marchand, jielong.zjl, nd, nd



> > > +static int
> > > +test_worker_prcs(void *arg)
> > > +{
> > > +	int32_t rc;
> > > +	uint32_t lc, n, num;
> > minor, lcore instead of lc would be better
> >
> > > +	uint64_t cl, tm0, tm1;
> > > +	struct lcore_arg *la;
> > > +	struct ring_elem def_elm, loc_elm;
> > > +	struct ring_elem *obj[2 * BULK_NUM];
> > > +
> > > +	la = arg;
> > > +	lc = rte_lcore_id();
> > > +
> > > +	fill_ring_elm(&def_elm, UINT32_MAX);
> > > +	fill_ring_elm(&loc_elm, lc);
> > > +
> > > +	while (wrk_cmd != WRK_CMD_RUN) {
> > > +		rte_smp_rmb();
> > > +		rte_pause();
> > > +	}
> > > +
> > > +	cl = rte_rdtsc_precise();
> > > +
> > > +	do {
> > > +		/* num in interval [7/8, 11/8] of BULK_NUM */
> > > +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> > > +
> > > +		/* reset all pointer values */
> > > +		memset(obj, 0, sizeof(obj));
> > > +
> > > +		/* dequeue num elems */
> > > +		tm0 = rte_rdtsc_precise();
> > > +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> > > +		tm0 = rte_rdtsc_precise() - tm0;
> > > +
> > > +		/* check return value and objects */
> > > +		rc = check_ring_op(num, n, lc, __func__,
> > > +			RTE_STR(_st_ring_dequeue_bulk));
> > > +		if (rc == 0)
> > > +			rc = check_updt_elem(obj, num, &def_elm,
> > > &loc_elm);
> > > +		if (rc != 0)
> > > +			break;
> > Since this seems like a performance test, should we skip validating the objects?

I think it is good to have the test do validation too.
It shouldn't affect measurements, but it brings extra confidence
that our ring implementation works properly and doesn't introduce
any races.

> > Did these tests run on Travis CI?

AFAIK, no, but people can still run it manually.

> > I believe Travis CI has trouble running stress/performance tests if they take too much time.
> > The RTS and HTS tests should be added to functional tests.

Ok, I'll try to add some extra functional tests in v4. 


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
  2020-04-08  4:59         ` Honnappa Nagarahalli
@ 2020-04-09 13:39           ` Ananyev, Konstantin
  2020-04-10 20:15             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-09 13:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

> > Subject: [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
> >
> > Change from *single* to *sync_type* to allow different synchronisation
> > schemes to be applied.
> > Mark *single* as deprecated in comments.
> > Add new functions to allow user to query ring sync types.
> > Replace direct access to *single* with appropriate function call.
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  app/test/test_pdump.c           |   6 +-
> >  lib/librte_pdump/rte_pdump.c    |   2 +-
> >  lib/librte_port/rte_port_ring.c |  12 ++--
> >  lib/librte_ring/rte_ring.c      |   6 +-
> >  lib/librte_ring/rte_ring.h      | 113 ++++++++++++++++++++++++++------
> >  lib/librte_ring/rte_ring_elem.h |   8 +--
> >  6 files changed, 108 insertions(+), 39 deletions(-)
> >
> > diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c index
> > ad183184c..6a1180bcb 100644
> > --- a/app/test/test_pdump.c
> > +++ b/app/test/test_pdump.c
> > @@ -57,8 +57,7 @@ run_pdump_client_tests(void)
> >  	if (ret < 0)
> >  		return -1;
> >  	mp->flags = 0x0000;
> > -	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
> > -				      RING_F_SP_ENQ | RING_F_SC_DEQ);
> > +	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
> Are you saying to get SP and SC behavior we now have to set the flags to 0?

No.
What the original code does:
creates SP/SC ring:
ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
				      RING_F_SP_ENQ | RING_F_SC_DEQ);
Then manually makes it MP/MC by:
ring_client->prod.single = 0;
ring_client->cons.single = 0;

Instead it should just create MP/MC ring straightway, as the patch does:
ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);

> Isn't that an ABI break?
I don't see any.

> 
> >  	if (ring_client == NULL) {
> >  		printf("rte_ring_create SR0 failed");
> >  		return -1;
> > @@ -71,9 +70,6 @@ run_pdump_client_tests(void)
> >  	}
> >  	rte_eth_dev_probing_finish(eth_dev);
> >
> > -	ring_client->prod.single = 0;
> > -	ring_client->cons.single = 0;
> Just wondering if users outside of DPDK have done the same. I hope not, otherwise, we have an API break?

I think no. While it is completely wrong practice, it would keep working
even with these changes. 

> 
> > -
> >  	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
> >
> >  	for (itr = 0; itr < NUM_ITR; itr++) {
> > diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
> > index 8a01ac510..65364f2c5 100644
> > --- a/lib/librte_pdump/rte_pdump.c
> > +++ b/lib/librte_pdump/rte_pdump.c
> > @@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct
> > rte_mempool *mp)
> >  		rte_errno = EINVAL;
> >  		return -1;
> >  	}
> > -	if (ring->prod.single || ring->cons.single) {
> > +	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
> >  		PDUMP_LOG(ERR, "ring with either SP or SC settings"
> >  		" is not valid for pdump, should have MP and MC settings\n");
> >  		rte_errno = EINVAL;
> > diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
> > index 47fcdd06a..2f6c050fa 100644
> > --- a/lib/librte_port/rte_port_ring.c
> > +++ b/lib/librte_port/rte_port_ring.c
> > @@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int
> > socket_id,
> >  	/* Check input parameters */
> >  	if ((conf == NULL) ||
> >  		(conf->ring == NULL) ||
> > -		(conf->ring->cons.single && is_multi) ||
> > -		(!(conf->ring->cons.single) && !is_multi)) {
> > +		(rte_ring_cons_single(conf->ring) && is_multi) ||
> > +		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
> >  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
> >  		return NULL;
> >  	}
> > @@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params,
> > int socket_id,
> >  	/* Check input parameters */
> >  	if ((conf == NULL) ||
> >  		(conf->ring == NULL) ||
> > -		(conf->ring->prod.single && is_multi) ||
> > -		(!(conf->ring->prod.single) && !is_multi) ||
> > +		(rte_ring_prod_single(conf->ring) && is_multi) ||
> > +		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
> >  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
> >  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
> >  		return NULL;
> > @@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void
> > *params, int socket_id,
> >  	/* Check input parameters */
> >  	if ((conf == NULL) ||
> >  		(conf->ring == NULL) ||
> > -		(conf->ring->prod.single && is_multi) ||
> > -		(!(conf->ring->prod.single) && !is_multi) ||
> > +		(rte_ring_prod_single(conf->ring) && is_multi) ||
> > +		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
> >  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
> >  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
> >  		return NULL;
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c index
> > 77e5de099..fa5733907 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name,
> > unsigned count,
> >  	if (ret < 0 || ret >= (int)sizeof(r->name))
> >  		return -ENAMETOOLONG;
> >  	r->flags = flags;
> > -	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
> > -	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
> > +	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> > +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > +	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> > +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> >
> >  	if (flags & RING_F_EXACT_SZ) {
> >  		r->size = rte_align32pow2(count + 1); diff --git
> > a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> > 18fc5d845..d4775a063 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -61,11 +61,27 @@ enum rte_ring_queue_behavior {  #define
> > RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> >  			   sizeof(RTE_RING_MZ_PREFIX) + 1)
> >
> > -/* structure to hold a pair of head/tail values and other metadata */
> > +/** prod/cons sync types */
> > +enum rte_ring_sync_type {
> > +	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> > +	RTE_RING_SYNC_ST,     /**< single thread only */
> > +};
> > +
> > +/**
> > + * structure to hold a pair of head/tail values and other metadata.
> > + * Depending on sync_type format of that structure might be different,
> > + * but offset for *sync_type* and *tail* values should remain the same.
> > + */
> >  struct rte_ring_headtail {
> > -	volatile uint32_t head;  /**< Prod/consumer head. */
> > -	volatile uint32_t tail;  /**< Prod/consumer tail. */
> > -	uint32_t single;         /**< True if single prod/cons */
> > +	volatile uint32_t head;      /**< prod/consumer head. */
> > +	volatile uint32_t tail;      /**< prod/consumer tail. */
> > +	RTE_STD_C11
> > +	union {
> > +		/** sync type of prod/cons */
> > +		enum rte_ring_sync_type sync_type;
> > +		/** deprecated -  True if single prod/cons */
> > +		uint32_t single;
> > +	};
> >  };
> >
> >  /**
> > @@ -116,11 +132,10 @@ struct rte_ring {
> >  #define RING_F_EXACT_SZ 0x0004
> >  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> >
> > -/* @internal defines for passing to the enqueue dequeue worker functions
> > */ -#define __IS_SP 1 -#define __IS_MP 0 -#define __IS_SC 1 -#define
> > __IS_MC 0
> > +#define __IS_SP RTE_RING_SYNC_ST
> > +#define __IS_MP RTE_RING_SYNC_MT
> > +#define __IS_SC RTE_RING_SYNC_ST
> > +#define __IS_MC RTE_RING_SYNC_MT
> I think we can remove these #defines and use the new SYNC types

Wouldn't that introduce an API breakage?
Or are we ok here, as they are marked as internal?
I think I can for sure mark them as deprecated.
 
> >
> >  /**
> >   * Calculate the memory size needed for a ring @@ -420,7 +435,7 @@
> > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >  			 unsigned int n, unsigned int *free_space)  {
> >  	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			__IS_MP, free_space);
> > +			RTE_RING_SYNC_MT, free_space);
> >  }
> >
> >  /**
> > @@ -443,7 +458,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void *
> > const *obj_table,
> >  			 unsigned int n, unsigned int *free_space)  {
> >  	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			__IS_SP, free_space);
> > +			RTE_RING_SYNC_ST, free_space);
> >  }
> >
> >  /**
> > @@ -470,7 +485,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> > const *obj_table,
> >  		      unsigned int n, unsigned int *free_space)  {
> >  	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			r->prod.single, free_space);
> > +			r->prod.sync_type, free_space);
> >  }
> >
> >  /**
> > @@ -554,7 +569,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void
> > **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> >  	return __rte_ring_do_dequeue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			__IS_MC, available);
> > +			RTE_RING_SYNC_MT, available);
> >  }
> >
> >  /**
> > @@ -578,7 +593,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void
> > **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> >  	return __rte_ring_do_dequeue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			__IS_SC, available);
> > +			RTE_RING_SYNC_ST, available);
> >  }
> >
> >  /**
> > @@ -605,7 +620,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > **obj_table, unsigned int n,
> >  		unsigned int *available)
> >  {
> >  	return __rte_ring_do_dequeue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -				r->cons.single, available);
> > +				r->cons.sync_type, available);
> >  }
> >
> >  /**
> > @@ -777,6 +792,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
> >  	return r->capacity;
> >  }
> >
> > +/**
> > + * Return sync type used by producer in the ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   Producer sync type value.
> > + */
> > +static inline enum rte_ring_sync_type
> > +rte_ring_get_prod_sync_type(const struct rte_ring *r) {
> > +	return r->prod.sync_type;
> > +}
> > +
> > +/**
> > + * Check is the ring for single producer.
>                      ^^ if
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   true if ring is SP, zero otherwise.
> > + */
> > +static inline int
> > +rte_ring_prod_single(const struct rte_ring *r) {
> would rte_ring_is_prod_single better?

Ok, can rename.

> 
> > +	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST); }
> > +
> > +/**
> > + * Return sync type used by consumer in the ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   Consumer sync type value.
> > + */
> > +static inline enum rte_ring_sync_type
> > +rte_ring_get_cons_sync_type(const struct rte_ring *r) {
> > +	return r->cons.sync_type;
> > +}
> > +
> > +/**
> > + * Check if the ring is for single consumer.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   true if ring is SC, zero otherwise.
> > + */
> > +static inline int
> > +rte_ring_cons_single(const struct rte_ring *r) {
> > +	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST); }
> > +
> All these new functions are  not required to be called in the data path. They can be made non-inline.

Well, all these functions are introduced to encourage people not to
access the ring fields sync_type/single directly, but to use functions instead.
I don't know whether people access ring.single directly in the data-path or not,
but assuming that they do, making these functions non-inline would
force them to ignore these functions and keep accessing the field directly.
That was my thinking behind making them inline.
I think we have the same for get_size/get_capacity().
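
For instance, a hypothetical application fragment that today branches on
r->prod.single directly could switch to the accessor at no extra cost,
since it stays inline:

/* before: direct access to ring internals */
if (r->prod.single)
	n = rte_ring_sp_enqueue_bulk(r, objs, num, NULL);
else
	n = rte_ring_mp_enqueue_bulk(r, objs, num, NULL);

/* after: same branch, via the accessor introduced in this patch */
if (rte_ring_prod_single(r))
	n = rte_ring_sp_enqueue_bulk(r, objs, num, NULL);
else
	n = rte_ring_mp_enqueue_bulk(r, objs, num, NULL);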


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-08  5:00         ` Honnappa Nagarahalli
@ 2020-04-09 14:52           ` Ananyev, Konstantin
  2020-04-10 23:10             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-09 14:52 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

> > Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
> > Aim to reduce stall times in case when ring is used on overcommitted cpus
> > (multiple active threads on the same cpu).
> > The main difference from original MP/MC algorithm is that tail value is
> > increased not by every thread that finished enqueue/dequeue, but only by the
> > last one.
> > That allows threads to avoid spinning on ring tail value, leaving actual tail
> > value change to the last thread in the update queue.
> >
> > check-abi.sh reports what I believe is a false-positive about ring cons/prod
> > changes. As a workaround, devtools/libabigail.abignore is updated to suppress
> > *struct ring* related errors.
> This can be removed from the commit message.
> 
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  devtools/libabigail.abignore           |   7 +
> >  lib/librte_ring/Makefile               |   5 +-
> >  lib/librte_ring/meson.build            |   5 +-
> >  lib/librte_ring/rte_ring.c             | 100 +++++++-
> >  lib/librte_ring/rte_ring.h             | 110 ++++++++-
> >  lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
> >  lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
> >  lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
> >  lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
> >  9 files changed, 1015 insertions(+), 29 deletions(-)  create mode 100644
> > lib/librte_ring/rte_ring_rts.h  create mode 100644
> > lib/librte_ring/rte_ring_rts_elem.h
> >  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> >
> > diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore index
> > a59df8f13..cd86d89ca 100644
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -11,3 +11,10 @@
> >          type_kind = enum
> >          name = rte_crypto_asym_xform_type
> >          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> > +; Ignore updates of ring prod/cons
> > +[suppress_type]
> > +        type_kind = struct
> > +        name = rte_ring
> > +[suppress_type]
> > +        type_kind = struct
> > +        name = rte_event_ring
> Does this block the reporting of these structures forever?

Once we have a fix in libabigail, we can remove these lines.
I don't know of any better alternative.

> 
> > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index
> > 917c560ad..8f5c284cc 100644
> > --- a/lib/librte_ring/Makefile
> > +++ b/lib/librte_ring/Makefile
> > @@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> > SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> >  					rte_ring_elem.h \
> >  					rte_ring_generic.h \
> > -					rte_ring_c11_mem.h
> > +					rte_ring_c11_mem.h \
> > +					rte_ring_rts.h \
> > +					rte_ring_rts_elem.h \
> > +					rte_ring_rts_generic.h
> >
> >  include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build index
> > f2f3ccc88..612936afb 100644
> > --- a/lib/librte_ring/meson.build
> > +++ b/lib/librte_ring/meson.build
> > @@ -5,7 +5,10 @@ sources = files('rte_ring.c')  headers = files('rte_ring.h',
> >  		'rte_ring_elem.h',
> >  		'rte_ring_c11_mem.h',
> > -		'rte_ring_generic.h')
> > +		'rte_ring_generic.h',
> > +		'rte_ring_rts.h',
> > +		'rte_ring_rts_elem.h',
> > +		'rte_ring_rts_generic.h')
> >
> >  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > allow_experimental_apis = true diff --git a/lib/librte_ring/rte_ring.c
> > b/lib/librte_ring/rte_ring.c index fa5733907..222eec0fb 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
> >  /* true if x is a power of 2 */
> >  #define POWEROF2(x) ((((x)-1) & (x)) == 0)
> >
> > +/* by default set head/tail distance as 1/8 of ring capacity */
> > +#define HTD_MAX_DEF	8
> > +
> >  /* return the size of memory occupied by a ring */  ssize_t
> > rte_ring_get_memsize_elem(unsigned int esize, unsigned int count) @@ -
> > 79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
> >  	return rte_ring_get_memsize_elem(sizeof(void *), count);  }
> >
> > +/*
> > + * internal helper function to reset prod/cons head-tail values.
> > + */
> > +static void
> > +reset_headtail(void *p)
> > +{
> > +	struct rte_ring_headtail *ht;
> > +	struct rte_ring_rts_headtail *ht_rts;
> > +
> > +	ht = p;
> > +	ht_rts = p;
> > +
> > +	switch (ht->sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +	case RTE_RING_SYNC_ST:
> > +		ht->head = 0;
> > +		ht->tail = 0;
> > +		break;
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		ht_rts->head.raw = 0;
> > +		ht_rts->tail.raw = 0;
> > +		break;
> > +	default:
> > +		/* unknown sync mode */
> > +		RTE_ASSERT(0);
> > +	}
> > +}
> > +
> >  void
> >  rte_ring_reset(struct rte_ring *r)
> >  {
> > -	r->prod.head = r->cons.head = 0;
> > -	r->prod.tail = r->cons.tail = 0;
> > +	reset_headtail(&r->prod);
> > +	reset_headtail(&r->cons);
> > +}
> > +
> > +/*
> > + * helper function, calculates sync_type values for prod and cons
> > + * based on input flags. Returns zero at success or negative
> > + * errno value otherwise.
> > + */
> > +static int
> > +get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
> > +	enum rte_ring_sync_type *cons_st)
> > +{
> > +	static const uint32_t prod_st_flags =
> > +		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
> > +	static const uint32_t cons_st_flags =
> > +		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
> > +
> > +	switch (flags & prod_st_flags) {
> > +	case 0:
> > +		*prod_st = RTE_RING_SYNC_MT;
> > +		break;
> > +	case RING_F_SP_ENQ:
> > +		*prod_st = RTE_RING_SYNC_ST;
> > +		break;
> > +	case RING_F_MP_RTS_ENQ:
> > +		*prod_st = RTE_RING_SYNC_MT_RTS;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	switch (flags & cons_st_flags) {
> > +	case 0:
> > +		*cons_st = RTE_RING_SYNC_MT;
> > +		break;
> > +	case RING_F_SC_DEQ:
> > +		*cons_st = RTE_RING_SYNC_ST;
> > +		break;
> > +	case RING_F_MC_RTS_DEQ:
> > +		*cons_st = RTE_RING_SYNC_MT_RTS;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> >  }
> >
> >  int
> > @@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name,
> > unsigned count,
> >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> >
> > +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> > +		offsetof(struct rte_ring_rts_headtail, sync_type));
> > +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> > +		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
> > +
> >  	/* init the ring structure */
> >  	memset(r, 0, sizeof(*r));
> >  	ret = strlcpy(r->name, name, sizeof(r->name));
> >  	if (ret < 0 || ret >= (int)sizeof(r->name))
> >  		return -ENAMETOOLONG;
> >  	r->flags = flags;
> > -	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> > -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > -	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> > -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > +	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
> > +	if (ret != 0)
> > +		return ret;
> >
> >  	if (flags & RING_F_EXACT_SZ) {
> >  		r->size = rte_align32pow2(count + 1); @@ -126,8 +206,12
> > @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> >  		r->mask = count - 1;
> >  		r->capacity = r->mask;
> >  	}
> > -	r->prod.head = r->cons.head = 0;
> > -	r->prod.tail = r->cons.tail = 0;
> > +
> > +	/* set default values for head-tail distance */
> > +	if (flags & RING_F_MP_RTS_ENQ)
> > +		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
> > +	if (flags & RING_F_MC_RTS_DEQ)
> > +		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
> >
> >  	return 0;
> >  }
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> > d4775a063..f6f084d79 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -48,6 +48,7 @@ extern "C" {
> >  #include <rte_branch_prediction.h>
> >  #include <rte_memzone.h>
> >  #include <rte_pause.h>
> > +#include <rte_debug.h>
> >
> >  #define RTE_TAILQ_RING_NAME "RTE_RING"
> >
> > @@ -65,10 +66,13 @@ enum rte_ring_queue_behavior {  enum
> > rte_ring_sync_type {
> >  	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> >  	RTE_RING_SYNC_ST,     /**< single thread only */
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> > #endif
> >  };
> >
> >  /**
> > - * structure to hold a pair of head/tail values and other metadata.
> > + * structures to hold a pair of head/tail values and other metadata.
> >   * Depending on sync_type format of that structure might be different,
> >   * but offset for *sync_type* and *tail* values should remain the same.
> >   */
> > @@ -84,6 +88,21 @@ struct rte_ring_headtail {
> >  	};
> >  };
> >
> > +union rte_ring_ht_poscnt {
> nit, this is specific to RTS, may be change this to rte_ring_rts_ht_poscnt?

Ok.

> 
> > +	uint64_t raw;
> > +	struct {
> > +		uint32_t cnt; /**< head/tail reference counter */
> > +		uint32_t pos; /**< head/tail position */
> > +	} val;
> > +};
> > +
> > +struct rte_ring_rts_headtail {
> > +	volatile union rte_ring_ht_poscnt tail;
> > +	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
> > +	uint32_t htd_max;   /**< max allowed distance between head/tail */
> > +	volatile union rte_ring_ht_poscnt head; };
> > +
> >  /**
> >   * An RTE ring structure.
> >   *
> > @@ -111,11 +130,21 @@ struct rte_ring {
> >  	char pad0 __rte_cache_aligned; /**< empty cache line */
> >
> >  	/** Ring producer status. */
> > -	struct rte_ring_headtail prod __rte_cache_aligned;
> > +	RTE_STD_C11
> > +	union {
> > +		struct rte_ring_headtail prod;
> > +		struct rte_ring_rts_headtail rts_prod;
> > +	}  __rte_cache_aligned;
> > +
> >  	char pad1 __rte_cache_aligned; /**< empty cache line */
> >
> >  	/** Ring consumer status. */
> > -	struct rte_ring_headtail cons __rte_cache_aligned;
> > +	RTE_STD_C11
> > +	union {
> > +		struct rte_ring_headtail cons;
> > +		struct rte_ring_rts_headtail rts_cons;
> > +	}  __rte_cache_aligned;
> > +
> >  	char pad2 __rte_cache_aligned; /**< empty cache line */  };
> >
> > @@ -132,6 +161,9 @@ struct rte_ring {
> >  #define RING_F_EXACT_SZ 0x0004
> >  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> >
> > +#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS".
> > +*/ #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC
> > +RTS". */
> > +
> >  #define __IS_SP RTE_RING_SYNC_ST
> >  #define __IS_MP RTE_RING_SYNC_MT
> >  #define __IS_SC RTE_RING_SYNC_ST
> > @@ -461,6 +493,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void *
> > const *obj_table,
> >  			RTE_RING_SYNC_ST, free_space);
> >  }
> >
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +#include <rte_ring_rts.h>
> > +#endif
> > +
> >  /**
> >   * Enqueue several objects on a ring.
> >   *
> > @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
> > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >  		      unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -			r->prod.sync_type, free_space);
> > +	switch (r->prod.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> Have you validated if these affect the performance for the existing APIs?

I ran ring_pmd_perf_autotest
(AFAIK, that's the only one of our perf tests that calls rte_ring_enqueue/dequeue),
and didn't see any real difference in perf numbers. 

> I am also wondering why should we support these new modes in the legacy APIs?

The majority of DPDK users still use the legacy API,
and I am not sure all of them will be happy to switch to the _elem_ one manually.
Plus I can't see how we could justify that, after, let's say:
rte_ring_init(ring, ..., RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
returns with success,
a valid call to rte_ring_enqueue(ring, ...) should fail.

> I think users should move to use rte_ring_xxx_elem APIs. If users want to use RTS/HTS it will be a good time for them to move to new APIs.

If they use rte_ring_enqueue/dequeue all they have to do - just change flags in ring_create/ring_init call.
With what you suggest - they have to change every rte_ring_enqueue/dequeue
to rte_ring_elem_enqueue/dequeue.
That's much bigger code churn.

> They anyway have to test their code for RTS/HTS, might as well make the change to new APIs and test both.
> It will be less code to maintain for the community as well.

That's true, right now there is a lot of duplication between
the _elem_ and legacy code.
Actually the only real difference between them is the actual copying of the objects.
But I thought we were going to deal with that, just by
changing all the legacy API one day into wrappers around _elem_ calls,
i.e. something like:

static __rte_always_inline unsigned int
rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
                      unsigned int n, unsigned int *free_space)
{
	return  rte_ring_enqueue_elem_bulk(r, obj_table, sizeof(uintptr_t), n, free_space);
}
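
For completeness, the dequeue side would presumably be wrapped the same way:

static __rte_always_inline unsigned int
rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table,
		      unsigned int n, unsigned int *available)
{
	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(uintptr_t),
			n, available);
}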

That way users will switch to the new API automatically,
without any extra effort on their part, and we will be able to remove the legacy code.
Do you have some other thoughts here on how to deal with this legacy/elem conversion?

> 
> > #ifdef
> > +ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > +			free_space);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -619,8 +668,20 @@ static __rte_always_inline unsigned int
> > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
> >  		unsigned int *available)
> >  {
> > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > RTE_RING_QUEUE_FIXED,
> > -				r->cons.sync_type, available);
> > +	switch (r->cons.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > #ifdef
> > +ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> > available);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -940,8 +1001,21 @@ static __rte_always_inline unsigned
> > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> >  		      unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > RTE_RING_QUEUE_VARIABLE,
> > -			r->prod.sync_type, free_space);
> > +	switch (r->prod.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> > free_space);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
> > #ifdef
> > +ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > +			free_space);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -1020,9 +1094,21 @@ static __rte_always_inline unsigned
> > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> >  		unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > -				RTE_RING_QUEUE_VARIABLE,
> > -				r->cons.sync_type, available);
> > +	switch (r->cons.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > #ifdef
> > +ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > +			available);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	return 0;
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > index 28f9836e6..5de0850dc 100644
> > --- a/lib/librte_ring/rte_ring_elem.h
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r,
> > const void *obj_table,
> >  			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);  }
> >
> > +#include <rte_ring_rts_elem.h>
> > +
> >  /**
> >   * Enqueue several objects on a ring.
> >   *
> > @@ -571,6 +573,26 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r,
> > const void *obj_table,  {
> >  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> >  			RTE_RING_QUEUE_FIXED, r->prod.sync_type,
> > free_space);
> > +
> > +	switch (r->prod.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
> > +			free_space);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
> > +			free_space);
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
> > esize, n,
> > +			free_space);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	if (free_space != NULL)
> > +		*free_space = 0;
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -733,8 +755,25 @@ static __rte_always_inline unsigned int
> > rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
> >  		unsigned int esize, unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> > -			RTE_RING_QUEUE_FIXED, r->cons.sync_type,
> > available);
> > +	switch (r->cons.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
> > +			available);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
> > +			available);
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
> > esize,
> > +			n, available);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	if (available != NULL)
> > +		*available = 0;
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -901,8 +940,25 @@ static __rte_always_inline unsigned
> > rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
> >  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
> > -	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> > -			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type,
> > free_space);
> > +	switch (r->prod.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
> > +			free_space);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
> > +			free_space);
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
> > esize,
> > +			n, free_space);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	if (free_space != NULL)
> > +		*free_space = 0;
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -993,9 +1049,25 @@ static __rte_always_inline unsigned int
> > rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
> >  		unsigned int esize, unsigned int n, unsigned int *available)  {
> > -	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> > -				RTE_RING_QUEUE_VARIABLE,
> > -				r->cons.sync_type, available);
> > +	switch (r->cons.sync_type) {
> > +	case RTE_RING_SYNC_MT:
> > +		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
> > +			available);
> > +	case RTE_RING_SYNC_ST:
> > +		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
> > +			available);
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +	case RTE_RING_SYNC_MT_RTS:
> > +		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
> > +			n, available);
> > +#endif
> > +	}
> > +
> > +	/* valid ring should never reach this point */
> > +	RTE_ASSERT(0);
> > +	if (available != NULL)
> > +		*available = 0;
> > +	return 0;
> >  }
> >
> >  #ifdef __cplusplus
> > diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
> > new file mode 100644
> > index 000000000..18404fe48
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_rts.h
> IMO, we should not provide these APIs.

You mean only _elem_ ones, as discussed above?

> 
> > @@ -0,0 +1,316 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2017 Intel Corporation
> nit, the year should change to 2020? Look at others too.

ack, will do. 

> 
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_RTS_H_
> > +#define _RTE_RING_RTS_H_
> > +
> > +/**
> > + * @file rte_ring_rts.h
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + * It is not recommended to include this file directly.
> > + * Please include <rte_ring.h> instead.
> > + *
> > + * Contains functions for Relaxed Tail Sync (RTS) ring mode.
> > + * The main idea remains the same as for our original MP/MC
>                                                                                  ^^^ the
> > +synchronization
> > + * mechanism.
> > + * The main difference is that tail value is increased not
> > + * by every thread that finished enqueue/dequeue,
> > + * but only by the last one doing enqueue/dequeue.
> should we say 'current last' or 'last thread at a given instance'?
> 
> > + * That allows threads to skip spinning on tail value,
> > + * leaving actual tail value change to last thread in the update queue.
> nit, I understand what you mean by 'update queue' here. IMO, we should remove it as it might confuse some.
> 
> > + * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> > + * one for head update, second for tail update.
> > + * As a gain it allows thread to avoid spinning/waiting on tail value.
> > + * In comparison original MP/MC algorithm requires one 32-bit CAS
> > + * for head update and waiting/spinning on tail value.
> > + *
> > + * Brief outline:
> > + *  - introduce refcnt for both head and tail.
> Suggesting using the same names as used in the structures.
> 
> > + *  - increment head.refcnt for each head.value update
> > + *  - write head:value and head:refcnt atomically (64-bit CAS)
> > + *  - move tail.value ahead only when tail.refcnt + 1 == head.refcnt
> May be add '(indicating that this is the last thread updating the tail)'
> 
> > + *  - increment tail.refcnt when each enqueue/dequeue op finishes
> May be add 'otherwise' at the beginning.
> 
> > + *    (no matter is tail:value going to change or not)
> nit                            ^^ if
> > + *  - write tail.value and tail.recnt atomically (64-bit CAS)
> > + *
> > + * To avoid producer/consumer starvation:
> > + *  - limit max allowed distance between head and tail value (HTD_MAX).
> > + *    I.E. thread is allowed to proceed with changing head.value,
> > + *    only when:  head.value - tail.value <= HTD_MAX
> > + * HTD_MAX is an optional parameter.
> > + * With HTD_MAX == 0 we'll have fully serialized ring -
> > + * i.e. only one thread at a time will be able to enqueue/dequeue
> > + * to/from the ring.
> > + * With HTD_MAX >= ring.capacity - no limitation.
> > + * By default HTD_MAX == ring.capacity / 8.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_ring_rts_generic.h>
> > +
> > +/**
> > + * @internal Enqueue several objects on the RTS ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_rts_enqueue(struct rte_ring *r, void * const *obj_table,
> > +		uint32_t n, enum rte_ring_queue_behavior behavior,
> > +		uint32_t *free_space)
> > +{
> > +	uint32_t free, head;
> > +
> > +	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
> > +
> > +	if (n != 0) {
> > +		ENQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> > +		__rte_ring_rts_update_tail(&r->rts_prod);
> > +	}
> > +
> > +	if (free_space != NULL)
> > +		*free_space = free - n;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal Dequeue several objects from the RTS ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to pull from the ring.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > + * @param available
> > + *   returns the number of remaining ring entries after the dequeue has finished
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_rts_dequeue(struct rte_ring *r, void **obj_table,
> > +		uint32_t n, enum rte_ring_queue_behavior behavior,
> > +		uint32_t *available)
> > +{
> > +	uint32_t entries, head;
> > +
> > +	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
> > +
> > +	if (n != 0) {
> > +		DEQUEUE_PTRS(r, &r[1], head, obj_table, n, void *);
> > +		__rte_ring_rts_update_tail(&r->rts_cons);
> > +	}
> > +
> > +	if (available != NULL)
> > +		*available = entries - n;
> > +	return n;
> > +}
> > +
> > +/**
> > + * Enqueue several objects on the RTS ring (multi-producers safe).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned int
> > +rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, unsigned int *free_space) {
> > +	return __rte_ring_do_rts_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > +			free_space);
> > +}
> > +
> > +/**
> > + * Dequeue several objects from an RTS ring (multi-consumers safe).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned int
> > +rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return __rte_ring_do_rts_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > +			available);
> > +}
> > +
> > +/**
> > + * Return producer max Head-Tail-Distance (HTD).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   Producer HTD value, if producer is set in appropriate sync mode,
> > + *   or UINT32_MAX otherwise.
> > + */
> > +__rte_experimental
> > +static inline uint32_t
> > +rte_ring_get_prod_htd_max(const struct rte_ring *r) {
> > +	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
> > +		return r->rts_prod.htd_max;
> > +	return UINT32_MAX;
> > +}
> > +
> > +/**
> > + * Set producer max Head-Tail-Distance (HTD).
> > + * Note that producer has to use appropriate sync mode (RTS).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param v
> > + *   new HTD value to setup.
> > + * @return
> > + *   Zero on success, or negative error code otherwise.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v) {
> > +	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
> > +		return -ENOTSUP;
> > +
> > +	r->rts_prod.htd_max = v;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * Return consumer max Head-Tail-Distance (HTD).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @return
> > + *   Consumer HTD value, if consumer is set in appropriate sync mode,
> > + *   or UINT32_MAX otherwise.
> > + */
> > +__rte_experimental
> > +static inline uint32_t
> > +rte_ring_get_cons_htd_max(const struct rte_ring *r) {
> > +	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
> > +		return r->rts_cons.htd_max;
> > +	return UINT32_MAX;
> > +}
> > +
> > +/**
> > + * Set consumer max Head-Tail-Distance (HTD).
> > + * Note that consumer has to use appropriate sync mode (RTS).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param v
> > + *   new HTD value to setup.
> > + * @return
> > + *   Zero on success, or negative error code otherwise.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v) {
> > +	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
> > +		return -ENOTSUP;
> > +
> > +	r->rts_cons.htd_max = v;
> > +	return 0;
> > +}
> > +
> > +/**
> > + * Enqueue several objects on the RTS ring (multi-producers safe).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   - n: Actual number of objects enqueued.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned
> > +rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, unsigned int *free_space) {
> > +	return __rte_ring_do_rts_enqueue(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, free_space); }
> > +
> > +/**
> > + * Dequeue several objects from an RTS  ring (multi-consumers safe).
> > + * When the requested objects are more than the available objects,
> > + * only dequeue the actual number of objects.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   - n: Actual number of objects dequeued, 0 if ring is empty
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned
> > +rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
> > +		unsigned int n, unsigned int *available) {
> > +	return __rte_ring_do_rts_dequeue(r, obj_table, n,
> > +			RTE_RING_QUEUE_VARIABLE, available); }
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RING_RTS_H_ */
> > diff --git a/lib/librte_ring/rte_ring_rts_elem.h b/lib/librte_ring/rte_ring_rts_elem.h
> > new file mode 100644
> > index 000000000..71a331b23
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_rts_elem.h
> > @@ -0,0 +1,205 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2017 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_RTS_ELEM_H_
> > +#define _RTE_RING_RTS_ELEM_H_
> > +
> > +/**
> > + * @file rte_ring_rts_elem.h
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * It is not recommended to include this file directly.
> > + * Please include <rte_ring_elem.h> instead.
> > + * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode.
> > + * for more details please refer to <rte_ring_rts.h>.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_ring_rts_generic.h>
> > +
> > +/**
> > + * @internal Enqueue several objects on the RTS ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const *obj_table,
> obj_table should be of type 'const void * obj_table' (looks like copy paste error). Please check the other APIs below too.
> 
> > +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> 'esize' is not documented in the comments above the function. You can copy the header from rte_ring_elem.h file. Please check other APIs
> as well.

Ack to both, will fix.

> 
> > +	uint32_t *free_space)
> > +{
> > +	uint32_t free, head;
> > +
> > +	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
> > +
> > +	if (n != 0) {
> > +		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
> > +		__rte_ring_rts_update_tail(&r->rts_prod);
> > +	}
> > +
> > +	if (free_space != NULL)
> > +		*free_space = free - n;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal Dequeue several objects from the RTS ring.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to pull from the ring.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > + * @param available
> > + *   returns the number of remaining ring entries after the dequeue has finished
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void **obj_table,
> > +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> > +	uint32_t *available)
> > +{
> > +	uint32_t entries, head;
> > +
> > +	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
> > +
> > +	if (n != 0) {
> > +		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
> > +		__rte_ring_rts_update_tail(&r->rts_cons);
> > +	}
> > +
> > +	if (available != NULL)
> > +		*available = entries - n;
> > +	return n;
> > +}
> > +
> > +/**
> > + * Enqueue several objects on the RTS ring (multi-producers safe).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects enqueued, either 0 or n
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned int
> > +rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, void * const *obj_table,
> > +	unsigned int esize, unsigned int n, unsigned int *free_space) {
> > +	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
> > +			RTE_RING_QUEUE_FIXED, free_space);
> > +}
> > +
> > +/**
> > + * Dequeue several objects from an RTS ring (multi-consumers safe).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   The number of objects dequeued, either 0 or n
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned int
> > +rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void **obj_table,
> > +	unsigned int esize, unsigned int n, unsigned int *available) {
> > +	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
> > +		RTE_RING_QUEUE_FIXED, available);
> > +}
> > +
> > +/**
> > + * Enqueue several objects on the RTS ring (multi-producers safe).
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   - n: Actual number of objects enqueued.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned
> > +rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, void * const *obj_table,
> > +	unsigned int esize, unsigned int n, unsigned int *free_space) {
> > +	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
> > +			RTE_RING_QUEUE_VARIABLE, free_space); }
> > +
> > +/**
> > + * Dequeue several objects from an RTS  ring (multi-consumers safe).
> > + * When the requested objects are more than the available objects,
> > + * only dequeue the actual number of objects.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects) that will be filled.
> > + * @param n
> > + *   The number of objects to dequeue from the ring to the obj_table.
> > + * @param available
> > + *   If non-NULL, returns the number of remaining ring entries after the
> > + *   dequeue has finished.
> > + * @return
> > + *   - n: Actual number of objects dequeued, 0 if ring is empty
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned
> > +rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void **obj_table,
> > +	unsigned int esize, unsigned int n, unsigned int *available) {
> > +	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
> > +			RTE_RING_QUEUE_VARIABLE, available); }
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_RING_RTS_ELEM_H_ */
> > diff --git a/lib/librte_ring/rte_ring_rts_generic.h b/lib/librte_ring/rte_ring_rts_generic.h
> > new file mode 100644
> > index 000000000..f88460d47
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_rts_generic.h
> I do not know the benefit of providing the generic version. Do you know why this was done in the legacy APIs?

I think at first we had the generic API only; C11 was added later.
As I remember, the C11 one was measured on IA as a bit slower than the generic one,
so it was decided to keep both.

> If there is no performance difference between generic and C11 versions, should we just skip the generic version?
> The oldest compilers in CI are GCC 4.8.5 and Clang 3.4.2, and C11 built-ins are supported even earlier than these compiler versions.
> I feel the code is growing exponentially in the rte_ring library and we should try to cut non-value-add code/APIs aggressively.

I'll check whether there is a perf difference for RTS and HTS between the generic and C11 versions on IA.
Meanwhile, please have a proper look at the C11 implementation, as I am not that familiar with C11 atomics yet.
If there are no problems with it and no noticeable diff in performance -
I am ok to have the C11 version only for RTS/HTS modes.
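
To make the comparison concrete, here is a rough, untested sketch of what
the tail update from this patch could look like with C11 built-ins
(the rte_ring_ht_poscnt layout below is my assumption of what the series
introduces, not final code):

union rte_ring_ht_poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt; /* head/tail reference counter */
		uint32_t pos; /* head/tail position */
	} val;
};

static __rte_always_inline void
__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
{
	union rte_ring_ht_poscnt h, ot, nt;

	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);

	do {
		/* 64-bit atomic load, safe on 32-bit systems too */
		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);

		nt.raw = ot.raw;
		/* only the last thread in flight moves tail.pos to head.pos */
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;

	/* on failure the CAS refreshes ot.raw with the current tail value */
	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
}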

> 
> > @@ -0,0 +1,210 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2017 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_RTS_GENERIC_H_
> > +#define _RTE_RING_RTS_GENERIC_H_
> > +
> > +/**
> > + * @file rte_ring_rts_generic.h
> > + * It is not recommended to include this file directly,
> > + * include <rte_ring.h> instead.
> > + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
> > + * For more information please refer to <rte_ring_rts.h>.
> > + */
> > +
> > +/**
> > + * @internal This function updates tail values.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht) {
> > +	union rte_ring_ht_poscnt h, ot, nt;
> > +
> > +	/*
> > +	 * If there are other enqueues/dequeues in progress that
> > +	 * might have preceded us, then don't update tail with new value.
> > +	 */
> > +
> > +	do {
> > +		ot.raw = ht->tail.raw;
> > +		rte_smp_rmb();
> > +
> > +		/* on 32-bit systems we have to do atomic read here */
> > +		h.raw = rte_atomic64_read((rte_atomic64_t *)
> > +			(uintptr_t)&ht->head.raw);
> > +
> > +		nt.raw = ot.raw;
> > +		if (++nt.val.cnt == h.val.cnt)
> > +			nt.val.pos = h.val.pos;
> > +
> > +	} while (rte_atomic64_cmpset(&ht->tail.raw, ot.raw, nt.raw) == 0); }
> > +
> > +/**
> > + * @internal This function waits till head/tail distance wouldn't
> > + * exceed pre-defined max value.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
> > +	union rte_ring_ht_poscnt *h)
> > +{
> > +	uint32_t max;
> > +
> > +	max = ht->htd_max;
> > +	h->raw = ht->head.raw;
> > +	rte_smp_rmb();
> > +
> > +	while (h->val.pos - ht->tail.val.pos > max) {
> > +		rte_pause();
> > +		h->raw = ht->head.raw;
> > +		rte_smp_rmb();
> > +	}
> > +}
> > +
> > +/**
> > + * @internal This function updates the producer head for enqueue.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sp
> > + *   Indicates whether multi-producer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to enqueue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where enqueue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where enqueue finishes
> > + * @param free_entries
> > + *   Returns the amount of free space in the ring BEFORE head was moved
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline uint32_t
> > +__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *free_entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_poscnt nh, oh;
> > +
> > +	const uint32_t capacity = r->capacity;
> > +
> > +	do {
> > +		/* Reset n to the initial burst count */
> > +		n = num;
> > +
> > +		/* read prod head (may spin on prod tail) */
> > +		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
> > +
> > +		/* add rmb barrier to avoid load/load reorder in weak
> > +		 * memory model. It is noop on x86
> > +		 */
> > +		rte_smp_rmb();
> > +
> > +		/*
> > +		 *  The subtraction is done between two unsigned 32bits value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * *old_head > cons_tail). So 'free_entries' is always between 0
> > +		 * and capacity (which is < size).
> > +		 */
> > +		*free_entries = capacity + r->cons.tail - oh.val.pos;
> > +
> > +		/* check that we have enough room in ring */
> > +		if (unlikely(n > *free_entries))
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > +					0 : *free_entries;
> > +
> > +		if (n == 0)
> > +			break;
> > +
> > +		nh.val.pos = oh.val.pos + n;
> > +		nh.val.cnt = oh.val.cnt + 1;
> > +
> > +	} while (rte_atomic64_cmpset(&r->rts_prod.head.raw,
> > +			oh.raw, nh.raw) == 0);
> > +
> > +	*old_head = oh.val.pos;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal This function updates the consumer head for dequeue
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sc
> > + *   Indicates whether multi-consumer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to enqueue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where dequeue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where dequeue finishes
> > + * @param entries
> > + *   Returns the number of entries in the ring BEFORE head was moved
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_poscnt nh, oh;
> > +
> > +	/* move cons.head atomically */
> > +	do {
> > +		/* Restore n as it may change every loop */
> > +		n = num;
> > +
> > +		/* read cons head (may spin on cons tail) */
> > +		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
> > +
> > +
> > +		/* add rmb barrier to avoid load/load reorder in weak
> > +		 * memory model. It is noop on x86
> > +		 */
> > +		rte_smp_rmb();
> > +
> > +		/* The subtraction is done between two unsigned 32bits value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * cons_head > prod_tail). So 'entries' is always between 0
> > +		 * and size(ring)-1.
> > +		 */
> > +		*entries = r->prod.tail - oh.val.pos;
> > +
> > +		/* Set the actual entries for dequeue */
> > +		if (n > *entries)
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
> > +
> > +		if (unlikely(n == 0))
> > +			break;
> > +
> > +		nh.val.pos = oh.val.pos + n;
> > +		nh.val.cnt = oh.val.cnt + 1;
> > +
> > +	} while (rte_atomic64_cmpset(&r->rts_cons.head.raw,
> > +			oh.raw, nh.raw) == 0);
> > +
> > +	*old_head = oh.val.pos;
> > +	return n;
> > +}
> > +
> > +#endif /* _RTE_RING_RTS_GENERIC_H_ */
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test
  2020-04-09 12:36           ` Ananyev, Konstantin
  2020-04-09 13:00             ` Ananyev, Konstantin
@ 2020-04-10 16:59             ` Honnappa Nagarahalli
  1 sibling, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-10 16:59 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> >
> > > Subject: [PATCH v3 1/9] test/ring: add contention stress test
> > Minor, would 'add stress test for overcommitted use case' sound better?
> 
> I'd like to point out that this test-case can be used as a contention stress-test
> (many threads do enqueue/dequeue to/from the same ring) for both over-
> committed and non-over-committed scenarios...
> Will probably try to add a few extra explanations in v4.
The test cases for the non-over-committed case are already available in the function 'run_on_all_cores'. There we are running enqueue/dequeue on all the available cores.

> 
> > >
> > > Introduce new test-case to measure ring performance under contention
> > Minor, 'over committed' seems to be the term commonly used in the
> references you provided. Does it make sense to use that?
> >
> > > (miltiple producers/consumers).
> >     ^^^^^^^ multiple
> 
> ack.
> 
> >
> > > Starts dequeue/enqueue loop on all available slave lcores.
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > ---
> > >  app/test/Makefile                |   2 +
> > >  app/test/meson.build             |   2 +
> > >  app/test/test_ring_mpmc_stress.c |  31 +++
> > >  app/test/test_ring_stress.c      |  48 ++++
> > >  app/test/test_ring_stress.h      |  35 +++
> > >  app/test/test_ring_stress_impl.h | 444 ++++++++++++++++++++++++++++++++
> > Would be good to change the file names to indicate that these tests are for
> > an over-committed use case/configuration.
> > These are performance tests, better to have 'perf' or 'performance' in
> > their names.
> >
> > >  6 files changed, 562 insertions(+)
> > >  create mode 100644 app/test/test_ring_mpmc_stress.c
> > >  create mode 100644 app/test/test_ring_stress.c
> > >  create mode 100644 app/test/test_ring_stress.h
> > >  create mode 100644 app/test/test_ring_stress_impl.h
> > >
> > > diff --git a/app/test/Makefile b/app/test/Makefile
> > > index 1f080d162..4eefaa887 100644
> > > --- a/app/test/Makefile
> > > +++ b/app/test/Makefile
> > > @@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
> > >  SRCS-y += test_rand_perf.c
> > >
> > >  SRCS-y += test_ring.c
> > > +SRCS-y += test_ring_mpmc_stress.c
> > >  SRCS-y += test_ring_perf.c
> > > +SRCS-y += test_ring_stress.c
> > >  SRCS-y += test_pmd_perf.c
> > >
> > >  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> > > diff --git a/app/test/meson.build b/app/test/meson.build
> > > index 351d29cb6..827b04886 100644
> > > --- a/app/test/meson.build
> > > +++ b/app/test/meson.build
> > > @@ -100,7 +100,9 @@ test_sources = files('commands.c',
> > >  	'test_rib.c',
> > >  	'test_rib6.c',
> > >  	'test_ring.c',
> > > +	'test_ring_mpmc_stress.c',
> > >  	'test_ring_perf.c',
> > > +	'test_ring_stress.c',
> > >  	'test_rwlock.c',
> > >  	'test_sched.c',
> > >  	'test_service_cores.c',
> > > diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
> > > new file mode 100644
> > > index 000000000..1524b0248
> > > --- /dev/null
> > > +++ b/app/test/test_ring_mpmc_stress.c
> > > @@ -0,0 +1,31 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2020 Intel Corporation  */
> > > +
> > > +#include "test_ring_stress_impl.h"
> > > +
> > > +static inline uint32_t
> > > +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> > > +	uint32_t *avail)
> > > +{
> > > +	return rte_ring_mc_dequeue_bulk(r, obj, n, avail); }
> > > +
> > > +static inline uint32_t
> > > +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> > > +	uint32_t *free)
> > > +{
> > > +	return rte_ring_mp_enqueue_bulk(r, obj, n, free); }
> > > +
> > > +static int
> > > +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num) {
> > > +	return rte_ring_init(r, name, num, 0); }
> > > +
> > > +const struct test test_ring_mpmc_stress = {
> > > +	.name = "MP/MC",
> > > +	.nb_case = RTE_DIM(tests),
> > > +	.cases = tests,
> > > +};
> > > diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
> > > new file mode 100644
> > > index 000000000..60706f799
> > > --- /dev/null
> > > +++ b/app/test/test_ring_stress.c
> > > @@ -0,0 +1,48 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2020 Intel Corporation  */
> > > +
> > > +#include "test_ring_stress.h"
> > > +
> > > +static int
> > > +run_test(const struct test *test)
> > > +{
> > > +	int32_t rc;
> > > +	uint32_t i, k;
> > > +
> > > +	for (i = 0, k = 0; i != test->nb_case; i++) {
> > > +
> > > +		printf("TEST-CASE %s %s START\n",
> > > +			test->name, test->cases[i].name);
> > > +
> > > +		rc = test->cases[i].func(test->cases[i].wfunc);
> > > +		k += (rc == 0);
> > > +
> > > +		if (rc != 0)
> > > +			printf("TEST-CASE %s %s FAILED\n",
> > > +				test->name, test->cases[i].name);
> > > +		else
> > > +			printf("TEST-CASE %s %s OK\n",
> > > +				test->name, test->cases[i].name);
> > > +	}
> > > +
> > > +	return k;
> > > +}
> > > +
> > > +static int
> > > +test_ring_stress(void)
> > > +{
> > > +	uint32_t n, k;
> > > +
> > > +	n = 0;
> > > +	k = 0;
> > > +
> > > +	n += test_ring_mpmc_stress.nb_case;
> > > +	k += run_test(&test_ring_mpmc_stress);
> > > +
> > > +	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
> > > +		n, k, n - k);
> > > +	return (k != n);
> > > +}
> > > +
> > > +REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
> > > diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
> > > new file mode 100644
> > > index 000000000..60eac6216
> > > --- /dev/null
> > > +++ b/app/test/test_ring_stress.h
> > > @@ -0,0 +1,35 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2020 Intel Corporation  */
> > > +
> > > +
> > > +#include <inttypes.h>
> > > +#include <stddef.h>
> > > +#include <stdalign.h>
> > > +#include <string.h>
> > > +#include <stdio.h>
> > > +#include <unistd.h>
> > > +
> > > +#include <rte_ring.h>
> > > +#include <rte_cycles.h>
> > > +#include <rte_launch.h>
> > > +#include <rte_pause.h>
> > > +#include <rte_random.h>
> > > +#include <rte_malloc.h>
> > > +#include <rte_spinlock.h>
> > > +
> > > +#include "test.h"
> > > +
> > > +struct test_case {
> > > +	const char *name;
> > > +	int (*func)(int (*)(void *));
> > > +	int (*wfunc)(void *arg);
> > > +};
> > > +
> > > +struct test {
> > > +	const char *name;
> > > +	uint32_t nb_case;
> > > +	const struct test_case *cases;
> > > +};
> > > +
> > > +extern const struct test test_ring_mpmc_stress;
> > > diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
> > > new file mode 100644
> > > index 000000000..11476d28c
> > > --- /dev/null
> > > +++ b/app/test/test_ring_stress_impl.h
> > > @@ -0,0 +1,444 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2020 Intel Corporation  */
> > > +
> > > +#include "test_ring_stress.h"
> > > +
> > > +/*
> > > + * Measures performance of ring enqueue/dequeue under high contention
> > > + */
> > > +
> > > +#define RING_NAME	"RING_STRESS"
> > > +#define BULK_NUM	32
> > > +#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
> > > +
> > > +enum {
> > > +	WRK_CMD_STOP,
> > > +	WRK_CMD_RUN,
> > > +};
> > > +
> > > +static volatile uint32_t wrk_cmd __rte_cache_aligned;
> > > +
> > > +/* test run-time in seconds */
> > > +static const uint32_t run_time = 60;
> > > +
> > > +static const uint32_t verbose;
> > > +
> > > +struct lcore_stat {
> > > +	uint64_t nb_cycle;
> > > +	struct {
> > > +		uint64_t nb_call;
> > > +		uint64_t nb_obj;
> > > +		uint64_t nb_cycle;
> > > +		uint64_t max_cycle;
> > > +		uint64_t min_cycle;
> > > +	} op;
> > > +};
> > > +
> > > +struct lcore_arg {
> > > +	struct rte_ring *rng;
> > > +	struct lcore_stat stats;
> > > +} __rte_cache_aligned;
> > > +
> > > +struct ring_elem {
> > > +	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
> > > +} __rte_cache_aligned;
> > > +
> > > +/*
> > > + * redefinable functions
> > > + */
> > > +static uint32_t
> > > +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> > > +	uint32_t *avail);
> > > +
> > > +static uint32_t
> > > +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> > > +	uint32_t *free);
> > > +
> > > +static int
> > > +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
> > > +
> > > +
> > > +static void
> > > +lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
> > > +	uint64_t tm, int32_t prcs)
> > > +{
> > > +	ls->op.nb_call += call;
> > > +	ls->op.nb_obj += obj;
> > > +	ls->op.nb_cycle += tm;
> > > +	if (prcs) {
> > > +		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
> > > +		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
> > > +	}
> > > +}
> > > +
> > > +static void
> > > +lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
> > > +{
> > > +
> > > +	ms->op.nb_call += ls->op.nb_call;
> > > +	ms->op.nb_obj += ls->op.nb_obj;
> > > +	ms->op.nb_cycle += ls->op.nb_cycle;
> > > +	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
> > > +	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle); }
> > > +
> > > +static void
> > > +lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls) {
> > > +	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
> > > +	lcore_op_stat_aggr(ms, ls);
> > > +}
> > > +
> > > +static void
> > > +lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls) {
> > > +	long double st;
> > > +
> > > +	st = (long double)rte_get_timer_hz() / US_PER_S;
> > > +
> > > +	if (lc == UINT32_MAX)
> > > +		fprintf(f, "%s(AGGREGATE)={\n", __func__);
> > > +	else
> > > +		fprintf(f, "%s(lc=%u)={\n", __func__, lc);
> > > +
> > > +	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
> > > +		ls->nb_cycle, (long double)ls->nb_cycle / st);
> > > +
> > > +	fprintf(f, "\tDEQ+ENQ={\n");
> > > +
> > > +	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
> > > +	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
> > > +	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
> > > +	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
> > > +		(long double)ls->op.nb_obj / ls->op.nb_call);
> > > +	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
> > > +		(long double)ls->op.nb_cycle / ls->op.nb_obj);
> > > +	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
> > > +		(long double)ls->op.nb_cycle / ls->op.nb_call);
> > > +
> > > +	/* if min/max cycles per call stats was collected */
> > > +	if (ls->op.min_cycle != UINT64_MAX) {
> > > +		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> > > +			ls->op.max_cycle,
> > > +			(long double)ls->op.max_cycle / st);
> > > +		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> > > +			ls->op.min_cycle,
> > > +			(long double)ls->op.min_cycle / st);
> > > +	}
> > > +
> > > +	fprintf(f, "\t},\n");
> > > +	fprintf(f, "};\n");
> > > +}
> > > +
> > > +static void
> > > +fill_ring_elm(struct ring_elem *elm, uint32_t fill) {
> > > +	uint32_t i;
> > > +
> > > +	for (i = 0; i != RTE_DIM(elm->cnt); i++)
> > > +		elm->cnt[i] = fill;
> > > +}
> > > +
> > > +static int32_t
> > > +check_updt_elem(struct ring_elem *elm[], uint32_t num,
> > > +	const struct ring_elem *check, const struct ring_elem *fill) {
> > > +	uint32_t i;
> > > +
> > > +	static rte_spinlock_t dump_lock;
> > > +
> > > +	for (i = 0; i != num; i++) {
> > > +		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
> > > +			rte_spinlock_lock(&dump_lock);
> > > +			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
> > > +				"offending object: %p\n",
> > > +				__func__, rte_lcore_id(), num, i, elm[i]);
> > > +			rte_memdump(stdout, "expected", check, sizeof(*check));
> > > +			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
> > > +			rte_spinlock_unlock(&dump_lock);
> > > +			return -EINVAL;
> > > +		}
> > > +		memcpy(elm[i], fill, sizeof(*elm[i]));
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int
> > > +check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
> > minor, lcore instead of lc would be better
> >
> > > +	const char *fname, const char *opname) {
> > > +	if (exp != res) {
> > > +		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
> > Suggest using lcore in the printf
> >
> > > +			fname, lc, opname, exp, res);
> > > +		return -ENOSPC;
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +static int
> > > +test_worker_prcs(void *arg)
> > > +{
> > > +	int32_t rc;
> > > +	uint32_t lc, n, num;
> > minor, lcore instead of lc would be better
> >
> > > +	uint64_t cl, tm0, tm1;
> > > +	struct lcore_arg *la;
> > > +	struct ring_elem def_elm, loc_elm;
> > > +	struct ring_elem *obj[2 * BULK_NUM];
> > > +
> > > +	la = arg;
> > > +	lc = rte_lcore_id();
> > > +
> > > +	fill_ring_elm(&def_elm, UINT32_MAX);
> > > +	fill_ring_elm(&loc_elm, lc);
> > > +
> > > +	while (wrk_cmd != WRK_CMD_RUN) {
> > > +		rte_smp_rmb();
> > > +		rte_pause();
> > > +	}
> > > +
> > > +	cl = rte_rdtsc_precise();
> > > +
> > > +	do {
> > > +		/* num in interval [7/8, 11/8] of BULK_NUM */
> > > +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> > > +
> > > +		/* reset all pointer values */
> > > +		memset(obj, 0, sizeof(obj));
> > > +
> > > +		/* dequeue num elems */
> > > +		tm0 = rte_rdtsc_precise();
> > > +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> > > +		tm0 = rte_rdtsc_precise() - tm0;
> > > +
> > > +		/* check return value and objects */
> > > +		rc = check_ring_op(num, n, lc, __func__,
> > > +			RTE_STR(_st_ring_dequeue_bulk));
> > > +		if (rc == 0)
> > > +			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
> > > +		if (rc != 0)
> > > +			break;
> > Since this seems like a performance test, should we skip validating the objects?
> > Did these tests run on Travis CI? I believe Travis CI has trouble running
> > stress/performance tests if they take too much time.
> > The RTS and HTS tests should be added to functional tests.
> >
> > > +
> > > +		/* enqueue num elems */
> > > +		rte_compiler_barrier();
> > > +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> > > +		if (rc != 0)
> > > +			break;
> > > +
> > > +		tm1 = rte_rdtsc_precise();
> > > +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> > > +		tm1 = rte_rdtsc_precise() - tm1;
> > > +
> > > +		/* check return value */
> > > +		rc = check_ring_op(num, n, lc, __func__,
> > > +			RTE_STR(_st_ring_enqueue_bulk));
> > > +		if (rc != 0)
> > > +			break;
> > > +
> > > +		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, 1);
> > > +
> > > +	} while (wrk_cmd == WRK_CMD_RUN);
> > > +
> > > +	la->stats.nb_cycle = rte_rdtsc_precise() - cl;
> > > +	return rc;
> > > +}
> > > +
> > > +static int
> > > +test_worker_avg(void *arg)
> > > +{
> > > +	int32_t rc;
> > > +	uint32_t lc, n, num;
> > > +	uint64_t cl;
> > > +	struct lcore_arg *la;
> > > +	struct ring_elem def_elm, loc_elm;
> > > +	struct ring_elem *obj[2 * BULK_NUM];
> > > +
> > > +	la = arg;
> > > +	lc = rte_lcore_id();
> > > +
> > > +	fill_ring_elm(&def_elm, UINT32_MAX);
> > > +	fill_ring_elm(&loc_elm, lc);
> > > +
> > > +	while (wrk_cmd != WRK_CMD_RUN) {
> > > +		rte_smp_rmb();
> > > +		rte_pause();
> > > +	}
> > > +
> > > +	cl = rte_rdtsc_precise();
> > > +
> > > +	do {
> > > +		/* num in interval [7/8, 11/8] of BULK_NUM */
> > > +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> > > +
> > > +		/* reset all pointer values */
> > > +		memset(obj, 0, sizeof(obj));
> > > +
> > > +		/* dequeue num elems */
> > > +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> > > +
> > > +		/* check return value and objects */
> > > +		rc = check_ring_op(num, n, lc, __func__,
> > > +			RTE_STR(_st_ring_dequeue_bulk));
> > > +		if (rc == 0)
> > > +			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
> > > +		if (rc != 0)
> > > +			break;
> > > +
> > > +		/* enqueue num elems */
> > > +		rte_compiler_barrier();
> > > +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> > > +		if (rc != 0)
> > > +			break;
> > > +
> > > +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> > > +
> > > +		/* check return value */
> > > +		rc = check_ring_op(num, n, lc, __func__,
> > > +			RTE_STR(_st_ring_enqueue_bulk));
> > > +		if (rc != 0)
> > > +			break;
> > > +
> > > +		lcore_stat_update(&la->stats, 1, num, 0, 0);
> > > +
> > > +	} while (wrk_cmd == WRK_CMD_RUN);
> > > +
> > > +	/* final stats update */
> > > +	cl = rte_rdtsc_precise() - cl;
> > > +	lcore_stat_update(&la->stats, 0, 0, cl, 0);
> > > +	la->stats.nb_cycle = cl;
> > > +
> > > +	return rc;
> > > +}
> > Just wondering about the need for 2 tests which run the same functionality.
> > The difference is the way in which numbers are collected.
> > Does 'test_worker_avg' add any value? IMO, we can remove 'test_worker_avg'.
> 
> Yeh, they are quite similar.
> I added the _average_ version for two reasons:
> 1. In the precise version I call rte_rdtsc_precise() straight before/after
>     each enqueue/dequeue op.
>     At least on IA, rte_rdtsc_precise() implies an mb().
>     This extra sync point might hide some sync problems in the ring
>     enqueue/dequeue itself.
>     So having a separate test without such extra sync points
>     gives extra confidence that these tests would catch ring sync problems, if
> any.
The functional tests should be captured separately, as we need to run them on Travis CI (please see the comment above).
Do we need the precise version? I think the average version captures the data well already.
Consider merging the two functions into a single one with a parameter to select precise or average cycles; see the sketch below. Since such a parameter would be a constant, the compiler will remove the unwanted code.
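
Something along these lines (an untested sketch assembled from the two
workers quoted above; 'prcs' is a compile-time constant in each wrapper,
so the dead branches should be eliminated):

static __rte_always_inline int
test_worker(void *arg, const uint32_t prcs)
{
	int32_t rc;
	uint32_t lc, n, num;
	uint64_t cl, tm0 = 0, tm1 = 0;
	struct lcore_arg *la = arg;
	struct ring_elem def_elm, loc_elm;
	struct ring_elem *obj[2 * BULK_NUM];

	lc = rte_lcore_id();
	fill_ring_elm(&def_elm, UINT32_MAX);
	fill_ring_elm(&loc_elm, lc);

	while (wrk_cmd != WRK_CMD_RUN) {
		rte_smp_rmb();
		rte_pause();
	}

	cl = rte_rdtsc_precise();

	do {
		/* num in interval [7/8, 11/8] of BULK_NUM */
		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);

		/* reset all pointer values */
		memset(obj, 0, sizeof(obj));

		/* dequeue num elems, timed only in the precise variant */
		if (prcs)
			tm0 = rte_rdtsc_precise();
		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
		if (prcs)
			tm0 = rte_rdtsc_precise() - tm0;

		/* check return value and objects */
		rc = check_ring_op(num, n, lc, __func__,
			RTE_STR(_st_ring_dequeue_bulk));
		if (rc == 0)
			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
		if (rc != 0)
			break;

		/* enqueue num elems */
		rte_compiler_barrier();
		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
		if (rc != 0)
			break;

		if (prcs)
			tm1 = rte_rdtsc_precise();
		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
		if (prcs)
			tm1 = rte_rdtsc_precise() - tm1;

		/* check return value */
		rc = check_ring_op(num, n, lc, __func__,
			RTE_STR(_st_ring_enqueue_bulk));
		if (rc != 0)
			break;

		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, prcs);

	} while (wrk_cmd == WRK_CMD_RUN);

	/* in the average variant only the total elapsed time is recorded */
	cl = rte_rdtsc_precise() - cl;
	if (prcs == 0)
		lcore_stat_update(&la->stats, 0, 0, cl, 0);
	la->stats.nb_cycle = cl;

	return rc;
}

static int
test_worker_prcs(void *arg)
{
	return test_worker(arg, 1);
}

static int
test_worker_avg(void *arg)
{
	return test_worker(arg, 0);
}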

> 2. People usually don't do enqueue/dequeue on their own.
>     One common pattern is: dequeue, read/write data in the dequeued
> objects, enqueue.
>     So this test measures cycles for dequeue/enqueue plus some reads/writes
>     to the objects from the ring.
Users could be doing a lot of things after the ring operations; we do not know what all can happen in the application.
My concern here is that the numbers include cycles spent on things other than ring operations. For example, we cannot compare the average numbers with the precise numbers.
If we move the validation of the results to functional tests, we should be good.

> 
> > > +
> > > +static void
> > > +mt1_fini(struct rte_ring *rng, void *data) {
> > > +	rte_free(rng);
> > > +	rte_free(data);
> > > +}
> > > +
> > > +static int
> > > +mt1_init(struct rte_ring **rng, void **data, uint32_t num) {
> > > +	int32_t rc;
> > > +	size_t sz;
> > > +	uint32_t i, nr;
> > > +	struct rte_ring *r;
> > > +	struct ring_elem *elm;
> > > +	void *p;
> > > +
> > > +	*rng = NULL;
> > > +	*data = NULL;
> > > +
> > > +	sz = num * sizeof(*elm);
> > > +	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
> > > +	if (elm == NULL) {
> > > +		printf("%s: alloc(%zu) for %u elems data failed",
> > > +			__func__, sz, num);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	*data = elm;
> > > +
> > > +	/* alloc ring */
> > > +	nr = 2 * num;
> > > +	sz = rte_ring_get_memsize(nr);
> > > +	r = rte_zmalloc(NULL, sz, __alignof__(*r));
> > > +	if (r == NULL) {
> > > +		printf("%s: alloc(%zu) for FIFO with %u elems failed",
> > > +			__func__, sz, nr);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	*rng = r;
> > > +
> > > +	rc = _st_ring_init(r, RING_NAME, nr);
> > > +	if (rc != 0) {
> > > +		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
> > > +			__func__, r, nr, rc, strerror(-rc));
> > > +		return rc;
> > > +	}
> > > +
> > > +	for (i = 0; i != num; i++) {
> > > +		fill_ring_elm(elm + i, UINT32_MAX);
> > > +		p = elm + i;
> > > +		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
> > > +			break;
> > > +	}
> > > +
> > > +	if (i != num) {
> > > +		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
> > > +			__func__, r, num, i);
> > > +		return -ENOSPC;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int
> > > +test_mt1(int (*test)(void *))
> > > +{
> > > +	int32_t rc;
> > > +	uint32_t lc, mc;
> > > +	struct rte_ring *r;
> > > +	void *data;
> > > +	struct lcore_arg arg[RTE_MAX_LCORE];
> > > +
> > > +	static const struct lcore_stat init_stat = {
> > > +		.op.min_cycle = UINT64_MAX,
> > > +	};
> > > +
> > > +	rc = mt1_init(&r, &data, RING_SIZE);
> > > +	if (rc != 0) {
> > > +		mt1_fini(r, data);
> > > +		return rc;
> > > +	}
> > > +
> > > +	memset(arg, 0, sizeof(arg));
> > > +
> > > +	/* launch on all slaves */
> > > +	RTE_LCORE_FOREACH_SLAVE(lc) {
> > > +		arg[lc].rng = r;
> > > +		arg[lc].stats = init_stat;
> > > +		rte_eal_remote_launch(test, &arg[lc], lc);
> > > +	}
> > > +
> > > +	/* signal worker to start test */
> > > +	wrk_cmd = WRK_CMD_RUN;
> > > +	rte_smp_wmb();
> > > +
> > > +	usleep(run_time * US_PER_S);
> > > +
> > > +	/* signal workers to stop the test */
> > > +	wrk_cmd = WRK_CMD_STOP;
> > > +	rte_smp_wmb();
> > > +
> > > +	/* wait for slaves and collect stats. */
> > > +	mc = rte_lcore_id();
> > > +	arg[mc].stats = init_stat;
> > > +
> > > +	rc = 0;
> > > +	RTE_LCORE_FOREACH_SLAVE(lc) {
> > > +		rc |= rte_eal_wait_lcore(lc);
> > > +		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
> > > +		if (verbose != 0)
> > > +			lcore_stat_dump(stdout, lc, &arg[lc].stats);
> > > +	}
> > > +
> > > +	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
> > > +	mt1_fini(r, data);
> > > +	return rc;
> > > +}
> > > +
> > > +static const struct test_case tests[] = {
> > > +	{
> > > +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
> > > +		.func = test_mt1,
> > > +		.wfunc = test_worker_prcs,
> > > +	},
> > > +	{
> > > +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
> > > +		.func = test_mt1,
> > > +		.wfunc = test_worker_avg,
> > > +	},
> > > +};
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test
  2020-04-09 13:00             ` Ananyev, Konstantin
@ 2020-04-10 18:01               ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-10 18:01 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> 
> > > > +static int
> > > > +test_worker_prcs(void *arg)
> > > > +{
> > > > +	int32_t rc;
> > > > +	uint32_t lc, n, num;
> > > minor, lcore instead of lc would be better
> > >
> > > > +	uint64_t cl, tm0, tm1;
> > > > +	struct lcore_arg *la;
> > > > +	struct ring_elem def_elm, loc_elm;
> > > > +	struct ring_elem *obj[2 * BULK_NUM];
> > > > +
> > > > +	la = arg;
> > > > +	lc = rte_lcore_id();
> > > > +
> > > > +	fill_ring_elm(&def_elm, UINT32_MAX);
> > > > +	fill_ring_elm(&loc_elm, lc);
> > > > +
> > > > +	while (wrk_cmd != WRK_CMD_RUN) {
> > > > +		rte_smp_rmb();
> > > > +		rte_pause();
> > > > +	}
> > > > +
> > > > +	cl = rte_rdtsc_precise();
> > > > +
> > > > +	do {
> > > > +		/* num in interval [7/8, 11/8] of BULK_NUM */
> > > > +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> > > > +
> > > > +		/* reset all pointer values */
> > > > +		memset(obj, 0, sizeof(obj));
> > > > +
> > > > +		/* dequeue num elems */
> > > > +		tm0 = rte_rdtsc_precise();
> > > > +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> > > > +		tm0 = rte_rdtsc_precise() - tm0;
> > > > +
> > > > +		/* check return value and objects */
> > > > +		rc = check_ring_op(num, n, lc, __func__,
> > > > +			RTE_STR(_st_ring_dequeue_bulk));
> > > > +		if (rc == 0)
> > > > +			rc = check_updt_elem(obj, num, &def_elm,
> > > > &loc_elm);
> > > > +		if (rc != 0)
> > > > +			break;
> > > Since this seems like a performance test, should we skip validating the
> objects?
> 
> I think it is good to have the test do validation too.
> It shouldn't affect measurements, but it brings extra confidence that our
> ring implementation works properly and doesn't introduce any races.
Ok, I am fine here, as the cycles for validation are not counted in the cycles for the ring APIs.
IMO, this test is enough and we do not need the average cycles test.

> 
> > > Did these tests run on Travis CI?
> 
> AFAIK no, but people can still run them manually.
> 
> >> I believe Travis CI has trouble running stress/performance tests if they take too much time.
> > > The RTS and HTS tests should be added to functional tests.
> 
> Ok, I'll try to add some extra functional tests in v4.
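
Something minimal along these lines would already help as a functional
check (an untested sketch; it assumes the RING_F_MP_RTS_ENQ and
RING_F_MC_RTS_DEQ flags proposed in this series, and the test name
is a hypothetical placeholder):

#include <string.h>
#include <rte_ring.h>

static int
test_ring_rts_basic(void)
{
	unsigned int i, n;
	void *objs[8], *deq[8];
	struct rte_ring *r;

	for (i = 0; i != RTE_DIM(objs); i++)
		objs[i] = (void *)(uintptr_t)(i + 1);

	/* create a ring with RTS producer/consumer sync modes */
	r = rte_ring_create("rts_basic", 64, SOCKET_ID_ANY,
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return -1;

	/* enqueue, then dequeue back the same objects and compare */
	n = rte_ring_enqueue_bulk(r, objs, RTE_DIM(objs), NULL);
	if (n != RTE_DIM(objs))
		goto fail;

	n = rte_ring_dequeue_bulk(r, deq, RTE_DIM(deq), NULL);
	if (n != RTE_DIM(deq) || memcmp(objs, deq, sizeof(objs)) != 0)
		goto fail;

	rte_ring_free(r);
	return 0;

fail:
	rte_ring_free(r);
	return -1;
}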


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
  2020-04-09 13:39           ` Ananyev, Konstantin
@ 2020-04-10 20:15             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-10 20:15 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> 
> > > Subject: [PATCH v3 2/9] ring: prepare ring to allow new sync schemes
> > >
> > > Change from *single* to *sync_type* to allow different
> > > synchronisation schemes to be applied.
> > > Mark *single* as deprecated in comments.
> > > Add new functions to allow user to query ring sync types.
> > > Replace direct access to *single* with appopriate function call.
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > ---
> > >  app/test/test_pdump.c           |   6 +-
> > >  lib/librte_pdump/rte_pdump.c    |   2 +-
> > >  lib/librte_port/rte_port_ring.c |  12 ++--
> > >  lib/librte_ring/rte_ring.c      |   6 +-
> > >  lib/librte_ring/rte_ring.h      | 113 ++++++++++++++++++++++++++------
> > >  lib/librte_ring/rte_ring_elem.h |   8 +--
> > >  6 files changed, 108 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
> > > index ad183184c..6a1180bcb 100644
> > > --- a/app/test/test_pdump.c
> > > +++ b/app/test/test_pdump.c
> > > @@ -57,8 +57,7 @@ run_pdump_client_tests(void)
> > >  	if (ret < 0)
> > >  		return -1;
> > >  	mp->flags = 0x0000;
> > > -	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
> > > -				      RING_F_SP_ENQ | RING_F_SC_DEQ);
> > > +	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
> > Are you saying to get SP and SC behavior we now have to set the flags to 0?
> 
> No.
> What the original code does:
> creates an SP/SC ring:
> ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
> 				      RING_F_SP_ENQ | RING_F_SC_DEQ);
> Then it manually makes it MP/MC by:
> ring_client->prod.single = 0;
> ring_client->cons.single = 0;
> 
> Instead it should just create an MP/MC ring straight away, as the patch does:
> ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
> 
> > Isn't that an ABI break?
> I don't see any.
Ack

> 
> >
> > >  	if (ring_client == NULL) {
> > >  		printf("rte_ring_create SR0 failed");
> > >  		return -1;
> > > @@ -71,9 +70,6 @@ run_pdump_client_tests(void)
> > >  	}
> > >  	rte_eth_dev_probing_finish(eth_dev);
> > >
> > > -	ring_client->prod.single = 0;
> > > -	ring_client->cons.single = 0;
> > Just wondering if users outside of DPDK have done the same. I hope not;
> > otherwise, we have an API break?
> 
> I think not. While it is completely wrong practice, it would keep working even
> with these changes.
Ack

> 
> >
> > > -
> > >  	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
> > >
> > >  	for (itr = 0; itr < NUM_ITR; itr++) {
> > > diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
> > > index 8a01ac510..65364f2c5 100644
> > > --- a/lib/librte_pdump/rte_pdump.c
> > > +++ b/lib/librte_pdump/rte_pdump.c
> > > @@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring,
> > > struct rte_mempool *mp)
> > >  		rte_errno = EINVAL;
> > >  		return -1;
> > >  	}
> > > -	if (ring->prod.single || ring->cons.single) {
> > > +	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring)) {
> > >  		PDUMP_LOG(ERR, "ring with either SP or SC settings"
> > >  		" is not valid for pdump, should have MP and MC settings\n");
> > >  		rte_errno = EINVAL;
> > > diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
> > > index 47fcdd06a..2f6c050fa 100644
> > > --- a/lib/librte_port/rte_port_ring.c
> > > +++ b/lib/librte_port/rte_port_ring.c
> > > @@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params,
> > > int socket_id,
> > >  	/* Check input parameters */
> > >  	if ((conf == NULL) ||
> > >  		(conf->ring == NULL) ||
> > > -		(conf->ring->cons.single && is_multi) ||
> > > -		(!(conf->ring->cons.single) && !is_multi)) {
> > > +		(rte_ring_cons_single(conf->ring) && is_multi) ||
> > > +		(!rte_ring_cons_single(conf->ring) && !is_multi)) {
> > >  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
> > >  		return NULL;
> > >  	}
> > > @@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void
> > > *params, int socket_id,
> > >  	/* Check input parameters */
> > >  	if ((conf == NULL) ||
> > >  		(conf->ring == NULL) ||
> > > -		(conf->ring->prod.single && is_multi) ||
> > > -		(!(conf->ring->prod.single) && !is_multi) ||
> > > +		(rte_ring_prod_single(conf->ring) && is_multi) ||
> > > +		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
> > >  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
> > >  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
> > >  		return NULL;
> > > @@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void
> > > *params, int socket_id,
> > >  	/* Check input parameters */
> > >  	if ((conf == NULL) ||
> > >  		(conf->ring == NULL) ||
> > > -		(conf->ring->prod.single && is_multi) ||
> > > -		(!(conf->ring->prod.single) && !is_multi) ||
> > > +		(rte_ring_prod_single(conf->ring) && is_multi) ||
> > > +		(!rte_ring_prod_single(conf->ring) && !is_multi) ||
> > >  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
> > >  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
> > >  		return NULL;
> > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > index 77e5de099..fa5733907 100644
> > > --- a/lib/librte_ring/rte_ring.c
> > > +++ b/lib/librte_ring/rte_ring.c
> > > @@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char
> > > *name, unsigned count,
> > >  	if (ret < 0 || ret >= (int)sizeof(r->name))
> > >  		return -ENAMETOOLONG;
> > >  	r->flags = flags;
> > > -	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
> > > -	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
> > > +	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> > > +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > > +	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> > > +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > >
> > >  	if (flags & RING_F_EXACT_SZ) {
> > >  		r->size = rte_align32pow2(count + 1);
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 18fc5d845..d4775a063 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -61,11 +61,27 @@ enum rte_ring_queue_behavior {  #define
> > > RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> > >  			   sizeof(RTE_RING_MZ_PREFIX) + 1)
> > >
> > > -/* structure to hold a pair of head/tail values and other metadata
> > > */
> > > +/** prod/cons sync types */
> > > +enum rte_ring_sync_type {
> > > +	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> > > +	RTE_RING_SYNC_ST,     /**< single thread only */
> > > +};
> > > +
> > > +/**
> > > + * structure to hold a pair of head/tail values and other metadata.
> > > + * Depending on sync_type format of that structure might be
> > > +different,
> > > + * but offset for *sync_type* and *tail* values should remain the same.
> > > + */
> > >  struct rte_ring_headtail {
> > > -	volatile uint32_t head;  /**< Prod/consumer head. */
> > > -	volatile uint32_t tail;  /**< Prod/consumer tail. */
> > > -	uint32_t single;         /**< True if single prod/cons */
> > > +	volatile uint32_t head;      /**< prod/consumer head. */
> > > +	volatile uint32_t tail;      /**< prod/consumer tail. */
> > > +	RTE_STD_C11
> > > +	union {
> > > +		/** sync type of prod/cons */
> > > +		enum rte_ring_sync_type sync_type;
> > > +		/** deprecated -  True if single prod/cons */
> > > +		uint32_t single;
> > > +	};
> > >  };
> > >
> > >  /**
> > > @@ -116,11 +132,10 @@ struct rte_ring {  #define RING_F_EXACT_SZ
> > > 0x0004  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask
> > > */
> > >
> > > -/* @internal defines for passing to the enqueue dequeue worker functions */
> > > -#define __IS_SP 1
> > > -#define __IS_MP 0
> > > -#define __IS_SC 1
> > > -#define __IS_MC 0
> > > +#define __IS_SP RTE_RING_SYNC_ST
> > > +#define __IS_MP RTE_RING_SYNC_MT
> > > +#define __IS_SC RTE_RING_SYNC_ST
> > > +#define __IS_MC RTE_RING_SYNC_MT
> > I think we can remove these #defines and use the new SYNC types
> 
> Wouldn't that introduce an API breakage?
> Or are we ok here, as they are marked as internal?
> I think I can for sure mark them as deprecated.
I think they are internal.
Although it does not apply to your patch, rte_ring_queue_behavior should also be internal.

> 
> > >
> > >  /**
> > >   * Calculate the memory size needed for a ring @@ -420,7 +435,7 @@
> > > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > >  	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			__IS_MP, free_space);
> > > +			RTE_RING_SYNC_MT, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -443,7 +458,7 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r,
> > > void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > >  	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			__IS_SP, free_space);
> > > +			RTE_RING_SYNC_ST, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -470,7 +485,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> > > const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > >  	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			r->prod.single, free_space);
> > > +			r->prod.sync_type, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -554,7 +569,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r,
> > > void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > >  	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			__IS_MC, available);
> > > +			RTE_RING_SYNC_MT, available);
> > >  }
> > >
> > >  /**
> > > @@ -578,7 +593,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r,
> > > void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > >  	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			__IS_SC, available);
> > > +			RTE_RING_SYNC_ST, available);
> > >  }
> > >
> > >  /**
> > > @@ -605,7 +620,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > > **obj_table, unsigned int n,
> > >  		unsigned int *available)
> > >  {
> > >  	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -				r->cons.single, available);
> > > +				r->cons.sync_type, available);
> > >  }
> > >
> > >  /**
> > > @@ -777,6 +792,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
> > >  	return r->capacity;
> > >  }
> > >
> > > +/**
> > > + * Return sync type used by producer in the ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   Producer sync type value.
> > > + */
> > > +static inline enum rte_ring_sync_type
> > > +rte_ring_get_prod_sync_type(const struct rte_ring *r) {
> > > +	return r->prod.sync_type;
> > > +}
> > > +
> > > +/**
> > > + * Check is the ring for single producer.
> >                      ^^ if
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   true if ring is SP, zero otherwise.
> > > + */
> > > +static inline int
> > > +rte_ring_prod_single(const struct rte_ring *r) {
> > would rte_ring_is_prod_single better?
> 
> Ok, can rename.
> 
> >
> > > +	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST); }
> > > +
> > > +/**
> > > + * Return sync type used by consumer in the ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   Consumer sync type value.
> > > + */
> > > +static inline enum rte_ring_sync_type
> > > +rte_ring_get_cons_sync_type(const struct rte_ring *r) {
> > > +	return r->cons.sync_type;
> > > +}
> > > +
> > > +/**
> > > + * Check is the ring for single consumer.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @return
> > > + *   true if ring is SC, zero otherwise.
> > > + */
> > > +static inline int
> > > +rte_ring_cons_single(const struct rte_ring *r) {
> > > +	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST); }
> > > +
> > All these new functions are  not required to be called in the data path. They
> can be made non-inline.
> 
> Well, all these functions are introduced to encourage people not to access
> ring fields sync_type/single directly, but to use functions instead.
> I don't know whether people access ring.single directly on the data path or not,
> but assuming that they do - making these functions non-inline would force them
> to ignore these functions and keep accessing the field directly.
> That was my thinking behind making them inline.
> I think we have the same for get_size/get_capacity().
Ack 
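
The pdump hunk above is a good template for any external code that does the
same today. Roughly (sketch, with the helper names as they are in this version
of the patch, before the _is_ rename discussed above):

	/* before: poking at a field whose layout may change */
	if (ring->prod.single || ring->cons.single)
		return -1;

	/* after: same cost (still inline), but insulated from layout
	 * and sync-mode changes
	 */
	if (rte_ring_prod_single(ring) || rte_ring_cons_single(ring))
		return -1;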

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-09 14:52           ` Ananyev, Konstantin
@ 2020-04-10 23:10             ` Honnappa Nagarahalli
  2020-04-13 14:29               ` David Marchand
  2020-04-14 13:18               ` Ananyev, Konstantin
  0 siblings, 2 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-10 23:10 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: RE: [PATCH v3 3/9] ring: introduce RTS ring mode
> 
> > > Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
> > > Aim to reduce stall times in case when ring is used on overcommitted
> > > cpus (multiple active threads on the same cpu).
> > > The main difference from original MP/MC algorithm is that tail value
> > > is increased not by every thread that finished enqueue/dequeue, but
> > > only by the last one.
> > > That allows threads to avoid spinning on ring tail value, leaving
> > > actual tail value change to the last thread in the update queue.
> > >
> > > check-abi.sh reports what I believe is a false-positive about ring
> > > cons/prod changes. As a workaround, devtools/libabigail.abignore is
> > > updated to suppress *struct ring* related errors.
> > This can be removed from the commit message.
> >
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > ---
> > >  devtools/libabigail.abignore           |   7 +
> > >  lib/librte_ring/Makefile               |   5 +-
> > >  lib/librte_ring/meson.build            |   5 +-
> > >  lib/librte_ring/rte_ring.c             | 100 +++++++-
> > >  lib/librte_ring/rte_ring.h             | 110 ++++++++-
> > >  lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
> > >  lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
> > >  lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
> > >  lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
> > >  9 files changed, 1015 insertions(+), 29 deletions(-)  create mode
> > > 100644 lib/librte_ring/rte_ring_rts.h  create mode 100644
> > > lib/librte_ring/rte_ring_rts_elem.h
> > >  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> > >
> > > diff --git a/devtools/libabigail.abignore
> > > b/devtools/libabigail.abignore index a59df8f13..cd86d89ca 100644
> > > --- a/devtools/libabigail.abignore
> > > +++ b/devtools/libabigail.abignore
> > > @@ -11,3 +11,10 @@
> > >          type_kind = enum
> > >          name = rte_crypto_asym_xform_type
> > >          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> > > +; Ignore updates of ring prod/cons
> > > +[suppress_type]
> > > +        type_kind = struct
> > > +        name = rte_ring
> > > +[suppress_type]
> > > +        type_kind = struct
> > > +        name = rte_event_ring
> > Does this block the reporting of these structures forever?
> 
> Till we have a fix in libabigail; then we can remove these lines.
> I don't know of any better alternative.
David, does this block all issues in the future for rte_ring library?

> 
> >
> > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > index 917c560ad..8f5c284cc 100644
> > > --- a/lib/librte_ring/Makefile
> > > +++ b/lib/librte_ring/Makefile
> > > @@ -18,6 +18,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> > > SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> > >  					rte_ring_elem.h \
> > >  					rte_ring_generic.h \
> > > -					rte_ring_c11_mem.h
> > > +					rte_ring_c11_mem.h \
> > > +					rte_ring_rts.h \
> > > +					rte_ring_rts_elem.h \
> > > +					rte_ring_rts_generic.h
> > >
> > >  include $(RTE_SDK)/mk/rte.lib.mk
> > > diff --git a/lib/librte_ring/meson.build
> > > b/lib/librte_ring/meson.build index f2f3ccc88..612936afb 100644
> > > --- a/lib/librte_ring/meson.build
> > > +++ b/lib/librte_ring/meson.build
> > > @@ -5,7 +5,10 @@ sources = files('rte_ring.c')  headers = files('rte_ring.h',
> > >  		'rte_ring_elem.h',
> > >  		'rte_ring_c11_mem.h',
> > > -		'rte_ring_generic.h')
> > > +		'rte_ring_generic.h',
> > > +		'rte_ring_rts.h',
> > > +		'rte_ring_rts_elem.h',
> > > +		'rte_ring_rts_generic.h')
> > >
> > >  # rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > > allow_experimental_apis = true
> > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > index fa5733907..222eec0fb 100644
> > > --- a/lib/librte_ring/rte_ring.c
> > > +++ b/lib/librte_ring/rte_ring.c
> > > @@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
> > >  /* true if x is a power of 2 */
> > >  #define POWEROF2(x) ((((x)-1) & (x)) == 0)
> > >
> > > +/* by default set head/tail distance as 1/8 of ring capacity */
> > > +#define HTD_MAX_DEF	8
> > > +
> > >  /* return the size of memory occupied by a ring */  ssize_t
> > > rte_ring_get_memsize_elem(unsigned int esize, unsigned int count) @@
> > > -
> > > 79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
> > >  	return rte_ring_get_memsize_elem(sizeof(void *), count);  }
> > >
> > > +/*
> > > + * internal helper function to reset prod/cons head-tail values.
> > > + */
> > > +static void
> > > +reset_headtail(void *p)
> > > +{
> > > +	struct rte_ring_headtail *ht;
> > > +	struct rte_ring_rts_headtail *ht_rts;
> > > +
> > > +	ht = p;
> > > +	ht_rts = p;
> > > +
> > > +	switch (ht->sync_type) {
> > > +	case RTE_RING_SYNC_MT:
> > > +	case RTE_RING_SYNC_ST:
> > > +		ht->head = 0;
> > > +		ht->tail = 0;
> > > +		break;
> > > +	case RTE_RING_SYNC_MT_RTS:
> > > +		ht_rts->head.raw = 0;
> > > +		ht_rts->tail.raw = 0;
> > > +		break;
> > > +	default:
> > > +		/* unknown sync mode */
> > > +		RTE_ASSERT(0);
> > > +	}
> > > +}
> > > +
> > >  void
> > >  rte_ring_reset(struct rte_ring *r)
> > >  {
> > > -	r->prod.head = r->cons.head = 0;
> > > -	r->prod.tail = r->cons.tail = 0;
> > > +	reset_headtail(&r->prod);
> > > +	reset_headtail(&r->cons);
> > > +}
> > > +
> > > +/*
> > > + * helper function, calculates sync_type values for prod and cons
> > > + * based on input flags. Returns zero at success or negative
> > > + * errno value otherwise.
> > > + */
> > > +static int
> > > +get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
> > > +	enum rte_ring_sync_type *cons_st)
> > > +{
> > > +	static const uint32_t prod_st_flags =
> > > +		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
> > > +	static const uint32_t cons_st_flags =
> > > +		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
> > > +
> > > +	switch (flags & prod_st_flags) {
> > > +	case 0:
> > > +		*prod_st = RTE_RING_SYNC_MT;
> > > +		break;
> > > +	case RING_F_SP_ENQ:
> > > +		*prod_st = RTE_RING_SYNC_ST;
> > > +		break;
> > > +	case RING_F_MP_RTS_ENQ:
> > > +		*prod_st = RTE_RING_SYNC_MT_RTS;
> > > +		break;
> > > +	default:
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	switch (flags & cons_st_flags) {
> > > +	case 0:
> > > +		*cons_st = RTE_RING_SYNC_MT;
> > > +		break;
> > > +	case RING_F_SC_DEQ:
> > > +		*cons_st = RTE_RING_SYNC_ST;
> > > +		break;
> > > +	case RING_F_MC_RTS_DEQ:
> > > +		*cons_st = RTE_RING_SYNC_MT_RTS;
> > > +		break;
> > > +	default:
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	return 0;
> > >  }
> > >
> > >  int
> > > @@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char
> > > *name, unsigned count,
> > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > >
> > > +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> > > +		offsetof(struct rte_ring_rts_headtail, sync_type));
> > > +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> > > +		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
> > > +
> > >  	/* init the ring structure */
> > >  	memset(r, 0, sizeof(*r));
> > >  	ret = strlcpy(r->name, name, sizeof(r->name));
> > >  	if (ret < 0 || ret >= (int)sizeof(r->name))
> > >  		return -ENAMETOOLONG;
> > >  	r->flags = flags;
> > > -	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> > > -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > > -	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> > > -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> > > +	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
> > > +	if (ret != 0)
> > > +		return ret;
> > >
> > >  	if (flags & RING_F_EXACT_SZ) {
> > >  		r->size = rte_align32pow2(count + 1); @@ -126,8 +206,12
> @@
> > > rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > >  		r->mask = count - 1;
> > >  		r->capacity = r->mask;
> > >  	}
> > > -	r->prod.head = r->cons.head = 0;
> > > -	r->prod.tail = r->cons.tail = 0;
> > > +
> > > +	/* set default values for head-tail distance */
> > > +	if (flags & RING_F_MP_RTS_ENQ)
> > > +		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
> > > +	if (flags & RING_F_MC_RTS_DEQ)
> > > +		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
> > >
> > >  	return 0;
> > >  }
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index
> > > d4775a063..f6f084d79 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -48,6 +48,7 @@ extern "C" {
> > >  #include <rte_branch_prediction.h>
> > >  #include <rte_memzone.h>
> > >  #include <rte_pause.h>
> > > +#include <rte_debug.h>
> > >
> > >  #define RTE_TAILQ_RING_NAME "RTE_RING"
> > >
> > > @@ -65,10 +66,13 @@ enum rte_ring_queue_behavior {  enum
> > > rte_ring_sync_type {
> > >  	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> > >  	RTE_RING_SYNC_ST,     /**< single thread only */
> > > +#ifdef ALLOW_EXPERIMENTAL_API
> > > +	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
> > > #endif
> > >  };
> > >
> > >  /**
> > > - * structure to hold a pair of head/tail values and other metadata.
> > > + * structures to hold a pair of head/tail values and other metadata.
> > >   * Depending on sync_type format of that structure might be different,
> > >   * but offset for *sync_type* and *tail* values should remain the same.
> > >   */
> > > @@ -84,6 +88,21 @@ struct rte_ring_headtail {
> > >  	};
> > >  };
> > >
> > > +union rte_ring_ht_poscnt {
> > nit, this is specific to RTS, maybe change this to rte_ring_rts_ht_poscnt?
> 
> Ok.
> 
> >
> > > +	uint64_t raw;
> > > +	struct {
> > > +		uint32_t cnt; /**< head/tail reference counter */
> > > +		uint32_t pos; /**< head/tail position */
> > > +	} val;
> > > +};
> > > +
> > > +struct rte_ring_rts_headtail {
> > > +	volatile union rte_ring_ht_poscnt tail;
> > > +	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
> > > +	uint32_t htd_max;   /**< max allowed distance between head/tail */
> > > +	volatile union rte_ring_ht_poscnt head; };
> > > +
> > >  /**
> > >   * An RTE ring structure.
> > >   *
> > > @@ -111,11 +130,21 @@ struct rte_ring {
> > >  	char pad0 __rte_cache_aligned; /**< empty cache line */
> > >
> > >  	/** Ring producer status. */
> > > -	struct rte_ring_headtail prod __rte_cache_aligned;
> > > +	RTE_STD_C11
> > > +	union {
> > > +		struct rte_ring_headtail prod;
> > > +		struct rte_ring_rts_headtail rts_prod;
> > > +	}  __rte_cache_aligned;
> > > +
> > >  	char pad1 __rte_cache_aligned; /**< empty cache line */
> > >
> > >  	/** Ring consumer status. */
> > > -	struct rte_ring_headtail cons __rte_cache_aligned;
> > > +	RTE_STD_C11
> > > +	union {
> > > +		struct rte_ring_headtail cons;
> > > +		struct rte_ring_rts_headtail rts_cons;
> > > +	}  __rte_cache_aligned;
> > > +
> > >  	char pad2 __rte_cache_aligned; /**< empty cache line */  };
> > >
> > > @@ -132,6 +161,9 @@ struct rte_ring {  #define RING_F_EXACT_SZ
> > > 0x0004  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask
> > > */
> > >
> > > +#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
> > > +#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
> > > +
> > >  #define __IS_SP RTE_RING_SYNC_ST
> > >  #define __IS_MP RTE_RING_SYNC_MT
> > >  #define __IS_SC RTE_RING_SYNC_ST
> > > @@ -461,6 +493,10 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r,
> > > void * const *obj_table,
> > >  			RTE_RING_SYNC_ST, free_space);
> > >  }
> > >
> > > +#ifdef ALLOW_EXPERIMENTAL_API
> > > +#include <rte_ring_rts.h>
> > > +#endif
> > > +
> > >  /**
> > >   * Enqueue several objects on a ring.
> > >   *
> > > @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			r->prod.sync_type, free_space);
> > > +	switch (r->prod.sync_type) {
> > > +	case RTE_RING_SYNC_MT:
> > > +		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> > > +	case RTE_RING_SYNC_ST:
> > > +		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> > Have you validated whether these affect the performance of the existing APIs?
> 
> I run ring_pmd_perf_autotest
> (AFAIK, that's the only one of our perf tests that calls
> rte_ring_enqueue/dequeue), and didn't see any real difference in perf
> numbers.
> 
> > I am also wondering why we should support these new modes in the legacy
> > APIs?
> 
> The majority of DPDK users still use the legacy API, and I am not sure all of
> them will be happy to switch to the _elem_ one manually.
> Plus I can't see how we can justify that after, let's say,
> rte_ring_init(ring, ..., RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
> returns with success, a valid call to rte_ring_enqueue(ring,...) should fail.
Agree, I think the only way right now is through documentation.
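For the documentation, maybe an example along these lines would make the
expectation explicit (sketch; 'obj' stands in for whatever payload the
application queues):

	struct rte_ring *r;
	void *obj = NULL;	/* placeholder payload */
	int rc;

	/* opt in to RTS on both sides at creation time... */
	r = rte_ring_create("rts_ring", 1024, rte_socket_id(),
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return -rte_errno;

	/* ...and the legacy calls keep working unmodified: they now
	 * dispatch on prod/cons sync_type at run time, as in the
	 * switch above
	 */
	rc = rte_ring_enqueue(r, obj);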

> 
> > I think users should move to the rte_ring_xxx_elem APIs. If users want to
> > use RTS/HTS, it will be a good time for them to move to the new APIs.
> 
> If they use rte_ring_enqueue/dequeue, all they have to do is change the flags
> in the ring_create/ring_init call.
> With what you suggest, they have to change every
> rte_ring_enqueue/dequeue to rte_ring_elem_enqueue/dequeue.
> That's a much bigger code churn.
But these are just mapped 1 to 1. I would think there are not a whole lot of them in the application code, maybe ~10 lines?
I think the bigger factor for the user here is the algorithm changes in the rte_ring library. The bigger effort for users would be testing, rather than code changes in the applications.

> 
> > They have to test their code for RTS/HTS anyway; they might as well make the
> > change to the new APIs and test both.
> > It will be less code to maintain for the community as well.
> 
> That's true, right now there is a lot of duplication between _elem_ and legacy
> code.
>  Actually the only real diff between them is the actual copying of the objects.
>  But I thought we were going to deal with that by one day changing all
> legacy APIs to wrappers around _elem_ calls, i.e. something like:
> 
> static __rte_always_inline unsigned int
> rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>                       unsigned int n, unsigned int *free_space) {
> 	return  rte_ring_enqueue_elem_bulk(r, obj_table, sizeof(uintptr_t), n,
> free_space); }
> 
> That way users will switch to the new API automatically, without any extra
> effort on their part, and we will be able to remove the legacy code.
> Do you have some other thoughts here on how to deal with this legacy/elem
> conversion?
Yes, that is what I was thinking, but I had not considered adding any new APIs.
But I am wondering if we should look at deprecation? If we decide to deprecate, it would be good to avoid making the users of RTS/HTS do the work twice (once to switch to RTS/HTS and again to move to the _elem_ APIs).

One thing we can do is implement the wrappers you mentioned above for RTS/HTS now. I also think it is worth considering switching to these wrappers in 20.05 so that, come 20.11, we have a code base that has gone through a couple of releases' testing.
 
> 
> >
> > > #ifdef
> > > +ALLOW_EXPERIMENTAL_API
> > > +	case RTE_RING_SYNC_MT_RTS:
> > > +		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > > +			free_space);
> > > +#endif
> > > +	}
> > > +
> > > +	/* valid ring should never reach this point */
> > > +	RTE_ASSERT(0);
> > > +	return 0;
> > >  }
> > >

<snip>

> > >
> > >  #ifdef __cplusplus
> > > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > > index 28f9836e6..5de0850dc 100644
> > > --- a/lib/librte_ring/rte_ring_elem.h
> > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > @@ -542,6 +542,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring
> > > *r, const void *obj_table,
> > >  			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);  }
> > >
> > > +#include <rte_ring_rts_elem.h>

<snip>

> > >
> > >  #ifdef __cplusplus
> > > diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
> > > new file mode 100644
> > > index 000000000..18404fe48
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_rts.h
> > IMO, we should not provide these APIs.
> 
> You mean only _elem_ ones, as discussed above?
Yes

> 
> >
> > > @@ -0,0 +1,316 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + *
> > > + * Copyright (c) 2010-2017 Intel Corporation
> > nit, the year should change to 2020? Look at others too.
> 
> ack, will do.
> 
> >
> > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > + * All rights reserved.
> > > + * Derived from FreeBSD's bufring.h
> > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > + */
> > > +
> > > +#ifndef _RTE_RING_RTS_H_
> > > +#define _RTE_RING_RTS_H_
> > > +
> > > +/**
> > > + * @file rte_ring_rts.h
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + * It is not recommended to include this file directly.
> > > + * Please include <rte_ring.h> instead.
> > > + *
> > > + * Contains functions for Relaxed Tail Sync (RTS) ring mode.
> > > + * The main idea remains the same as for our original MP/MC
> >
> > ^^^ the
> > > +synchronization
> > > + * mechanism.
> > > + * The main difference is that tail value is increased not
> > > + * by every thread that finished enqueue/dequeue,
> > > + * but only by the last one doing enqueue/dequeue.
> > should we say 'current last' or 'last thread at a given instance'?
> >
> > > + * That allows threads to skip spinning on tail value,
> > > + * leaving actual tail value change to last thread in the update queue.
> > nit, I understand what you mean by 'update queue' here. IMO, we should
> remove it as it might confuse some.
> >
> > > + * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> > > + * one for head update, second for tail update.
> > > + * As a gain it allows thread to avoid spinning/waiting on tail value.
> > > + * In comparison original MP/MC algorithm requires one 32-bit CAS
> > > + * for head update and waiting/spinning on tail value.
> > > + *
> > > + * Brief outline:
> > > + *  - introduce refcnt for both head and tail.
> > Suggest using the same names as used in the structures.
> >
> > > + *  - increment head.refcnt for each head.value update
> > > + *  - write head:value and head:refcnt atomically (64-bit CAS)
> > > + *  - move tail.value ahead only when tail.refcnt + 1 ==
> > > + head.refcnt
> > May be add '(indicating that this is the last thread updating the tail)'
> >
> > > + *  - increment tail.refcnt when each enqueue/dequeue op finishes
> > May be add 'otherwise' at the beginning.
> >
> > > + *    (no matter is tail:value going to change or not)
> > nit                            ^^ if
> > > + *  - write tail.value and tail.recnt atomically (64-bit CAS)
> > > + *
> > > + * To avoid producer/consumer starvation:
> > > + *  - limit max allowed distance between head and tail value (HTD_MAX).
> > > + *    I.E. thread is allowed to proceed with changing head.value,
> > > + *    only when:  head.value - tail.value <= HTD_MAX
> > > + * HTD_MAX is an optional parameter.
> > > + * With HTD_MAX == 0 we'll have fully serialized ring -
> > > + * i.e. only one thread at a time will be able to enqueue/dequeue
> > > + * to/from the ring.
> > > + * With HTD_MAX >= ring.capacity - no limitation.
> > > + * By default HTD_MAX == ring.capacity / 8.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include <rte_ring_rts_generic.h>
> > > +

<snip>

> > > +
> > > +#endif /* _RTE_RING_RTS_H_ */
> > > diff --git a/lib/librte_ring/rte_ring_rts_elem.h
> > > b/lib/librte_ring/rte_ring_rts_elem.h
> > > new file mode 100644
> > > index 000000000..71a331b23
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_rts_elem.h
> > > @@ -0,0 +1,205 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + *
> > > + * Copyright (c) 2010-2017 Intel Corporation
> > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > + * All rights reserved.
> > > + * Derived from FreeBSD's bufring.h
> > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > + */
> > > +
> > > +#ifndef _RTE_RING_RTS_ELEM_H_
> > > +#define _RTE_RING_RTS_ELEM_H_
> > > +
> > > +/**
> > > + * @file rte_ring_rts_elem.h
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * It is not recommended to include this file directly.
> > > + * Please include <rte_ring_elem.h> instead.
> > > + * Contains *ring_elem* functions for Relaxed Tail Sync (RTS) ring mode.
> > > + * for more details please refer to <rte_ring_rts.h>.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include <rte_ring_rts_generic.h>
> > > +
> > > +/**
> > > + * @internal Enqueue several objects on the RTS ring.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > + * @param free_space
> > > + *   returns the amount of space after the enqueue operation has finished
> > > + * @return
> > > + *   Actual number of objects enqueued.
> > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, void * const
> > > +*obj_table,
> > obj_table should be of type 'const void * obj_table' (looks like copy paste
> error). Please check the other APIs below too.
> >
> > > +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> > 'esize' is not documented in the comments above the function. You can
> > copy the header from rte_ring_elem.h file. Please check other APIs as well.
> 
> Ack to both, will fix.
> 
> >
> > > +	uint32_t *free_space)
> > > +{
> > > +	uint32_t free, head;
> > > +
> > > +	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
> > > +
> > > +	if (n != 0) {
> > > +		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
> > > +		__rte_ring_rts_update_tail(&r->rts_prod);
> > > +	}
> > > +
> > > +	if (free_space != NULL)
> > > +		*free_space = free - n;
> > > +	return n;
> > > +}
> > > +

<snip>

> > > +
> > > +#endif /* _RTE_RING_RTS_ELEM_H_ */
> > > diff --git a/lib/librte_ring/rte_ring_rts_generic.h
> > > b/lib/librte_ring/rte_ring_rts_generic.h
> > > new file mode 100644
> > > index 000000000..f88460d47
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_rts_generic.h
> > I do not know the benefit of providing the generic version. Do you know
> > why this was done in the legacy APIs?
> 
> I think at first we had generic API only, then later C11 was added.
> As I remember, the C11 one on IA was measured as a bit slower than generic,
> so it was decided to keep both.
> 
> > If there is no performance difference between generic and C11 versions,
> should we just skip the generic version?
> > The oldest compiler in CI are GCC 4.8.5 and Clang 3.4.2 and C11 built-ins
> are supported earlier than these compiler versions.
> > I feel the code is growing exponentially in the rte_ring library and we should
> > try to cut non-value-add code/APIs aggressively.
> 
> I'll check whether there is a perf difference for RTS and HTS between generic
> and C11 versions on IA.
> Meanwhile please have a proper look at the C11 implementation, as I am not
> that familiar with C11 atomics yet.
ok
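
For reference, going by the algorithm outline in rte_ring_rts.h, I would
expect the C11 flavour of the RTS tail update to look roughly like below
(untested sketch with GCC __atomic builtins; the memory orderings in
particular are what needs review):

static __rte_always_inline void
__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
{
	union rte_ring_ht_poscnt h, ot, nt;

	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);

	do {
		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);

		nt.raw = ot.raw;
		/* every finishing thread bumps tail.cnt; the last one in
		 * flight (tail.cnt + 1 == head.cnt) also moves tail.pos
		 */
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;
	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
}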

> If there are no problems with it and no noticeable diff in performance -
> I am ok to have the C11 version only for RTS/HTS modes.
> 
> >
> > > @@ -0,0 +1,210 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + *
> > > + * Copyright (c) 2010-2017 Intel Corporation
> > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > + * All rights reserved.
> > > + * Derived from FreeBSD's bufring.h
> > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > + */
> > > +
> > > +#ifndef _RTE_RING_RTS_GENERIC_H_
> > > +#define _RTE_RING_RTS_GENERIC_H_
> > > +
> > > +/**
> > > + * @file rte_ring_rts_generic.h
> > > + * It is not recommended to include this file directly,
> > > + * include <rte_ring.h> instead.
> > > + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring
> mode.
> > > + * For more information please refer to <rte_ring_rts.h>.
> > > + */

<snip>


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-10 23:10             ` Honnappa Nagarahalli
@ 2020-04-13 14:29               ` David Marchand
  2020-04-13 16:42                 ` Honnappa Nagarahalli
  2020-04-14 13:18               ` Ananyev, Konstantin
  1 sibling, 1 reply; 146+ messages in thread
From: David Marchand @ 2020-04-13 14:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ananyev, Konstantin, dev, jielong.zjl, nd, Kinsella, Ray,
	Thomas Monjalon, Jerin Jacob Kollanukkaran

On Sat, Apr 11, 2020 at 1:10 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > Subject: RE: [PATCH v3 3/9] ring: introduce RTS ring mode
> >
> > > > Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
> > > > Aim to reduce stall times in case when ring is used on overcommitted
> > > > cpus (multiple active threads on the same cpu).
> > > > The main difference from original MP/MC algorithm is that tail value
> > > > is increased not by every thread that finished enqueue/dequeue, but
> > > > only by the last one.
> > > > That allows threads to avoid spinning on ring tail value, leaving
> > > > actual tail value change to the last thread in the update queue.
> > > >
> > > > check-abi.sh reports what I believe is a false-positive about ring
> > > > cons/prod changes. As a workaround, devtools/libabigail.abignore is
> > > > updated to suppress *struct ring* related errors.
> > > This can be removed from the commit message.
> > >
> > > >
> > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > ---
> > > >  devtools/libabigail.abignore           |   7 +
> > > >  lib/librte_ring/Makefile               |   5 +-
> > > >  lib/librte_ring/meson.build            |   5 +-
> > > >  lib/librte_ring/rte_ring.c             | 100 +++++++-
> > > >  lib/librte_ring/rte_ring.h             | 110 ++++++++-
> > > >  lib/librte_ring/rte_ring_elem.h        |  86 ++++++-
> > > >  lib/librte_ring/rte_ring_rts.h         | 316 +++++++++++++++++++++++++
> > > >  lib/librte_ring/rte_ring_rts_elem.h    | 205 ++++++++++++++++
> > > >  lib/librte_ring/rte_ring_rts_generic.h | 210 ++++++++++++++++
> > > >  9 files changed, 1015 insertions(+), 29 deletions(-)  create mode
> > > > 100644 lib/librte_ring/rte_ring_rts.h  create mode 100644
> > > > lib/librte_ring/rte_ring_rts_elem.h
> > > >  create mode 100644 lib/librte_ring/rte_ring_rts_generic.h
> > > >
> > > > diff --git a/devtools/libabigail.abignore
> > > > b/devtools/libabigail.abignore index a59df8f13..cd86d89ca 100644
> > > > --- a/devtools/libabigail.abignore
> > > > +++ b/devtools/libabigail.abignore
> > > > @@ -11,3 +11,10 @@
> > > >          type_kind = enum
> > > >          name = rte_crypto_asym_xform_type
> > > >          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> > > > +; Ignore updates of ring prod/cons
> > > > +[suppress_type]
> > > > +        type_kind = struct
> > > > +        name = rte_ring
> > > > +[suppress_type]
> > > > +        type_kind = struct
> > > > +        name = rte_event_ring
> > > Does this block the reporting of these structures forever?
> >
> > Till we have a fix in libabigail; then we can remove these lines.
> > I don't know of any better alternative.
> David, does this block all issues in the future for rte_ring library?

These two "suppression rules" make libabigail consider as harmless any
change on the structures rte_ring and rte_event_ring.
With those suppression rules, you won't get any complaint from
libabigail (if this is what you call issues :-)).

Reviews on those structures must be extra careful, as we are blind
with those rules in place.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-13 14:29               ` David Marchand
@ 2020-04-13 16:42                 ` Honnappa Nagarahalli
  2020-04-14 13:47                   ` David Marchand
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-13 16:42 UTC (permalink / raw)
  To: David Marchand
  Cc: Ananyev, Konstantin, dev, jielong.zjl, nd, Kinsella, Ray, thomas,
	jerinj, Honnappa Nagarahalli, nd

<snip>

> > > > >
> > > > > diff --git a/devtools/libabigail.abignore
> > > > > b/devtools/libabigail.abignore index a59df8f13..cd86d89ca 100644
> > > > > --- a/devtools/libabigail.abignore
> > > > > +++ b/devtools/libabigail.abignore
> > > > > @@ -11,3 +11,10 @@
> > > > >          type_kind = enum
> > > > >          name = rte_crypto_asym_xform_type
> > > > >          changed_enumerators =
> > > RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> > > > > +; Ignore updates of ring prod/cons [suppress_type]
> > > > > +        type_kind = struct
> > > > > +        name = rte_ring
> > > > > +[suppress_type]
> > > > > +        type_kind = struct
> > > > > +        name = rte_event_ring
> > > > Does this block the reporting of these structures forever?
> > >
> > > Till we'll have a fix in libabigail, then we can remove these lines.
> > > I don't know any better alternative.
> > David, does this block all issues in the future for rte_ring library?
> 
> These two "suppression rules" make libabigail consider as harmless any
> change on the structures rte_ring and rte_event_ring.
> With those suppression rules, you won't get any complaint from libabigail (if
> this is what you call issues :-)).
> 
> Reviews on those structures must be extra careful, as we are blind with those
> rules in place.
Yes, this is my concern. Why not remove these suppression rules and instead ignore the errors from libabigail manually (i.e. merge the patches knowing that they are false positives)? Do you know if libabigail will fix these in the future?

> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-13 23:27         ` Honnappa Nagarahalli
  2020-04-14 16:12           ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-13 23:27 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

Hi Konstantin,
	Few nits/comments inline.

<snip>

> diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h new
> file mode 100644 index 000000000..062d7be6c
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_hts.h
> @@ -0,0 +1,210 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_HTS_H_
> +#define _RTE_RING_HTS_H_
> +
> +/**
> + * @file rte_ring_hts.h
> + * @b EXPERIMENTAL: this API may change without prior notice
> + * It is not recommended to include this file directly.
> + * Please include <rte_ring.h> instead.
> + *
> + * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
> + * In that mode enqueue/dequeue operation is fully serialized:
> + * at any given moement only one enqueue/dequeue operation can proceed.
                                 ^^^^^^^^ moment
> + * This is achieved by thread is allowed to proceed with changing
                                            ^^^^^^^^^^^^^^ allowing a thread
> +head.value
> + * only when head.value == tail.value.
> + * Both head and tail values are updated atomically (as one 64-bit value).
> + * To achieve that 64-bit CAS is used by head update routine.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_ring_hts_generic.h>
> +

<snip>

> diff --git a/lib/librte_ring/rte_ring_hts_generic.h
> b/lib/librte_ring/rte_ring_hts_generic.h
> new file mode 100644
> index 000000000..da08f1d94
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_hts_generic.h
> @@ -0,0 +1,198 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_HTS_GENERIC_H_
> +#define _RTE_RING_HTS_GENERIC_H_
> +
> +/**
> + * @file rte_ring_hts_generic.h
> + * It is not recommended to include this file directly,
> + * include <rte_ring.h> instead.
> + * Contains internal helper functions for head/tail sync (HTS) ring mode.
> + * For more information please refer to <rte_ring_hts.h>.
> + */
> +
> +static __rte_always_inline void
> +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> +	uint32_t enqueue)
> +{
> +	union rte_ring_ht_pos p;
> +
> +	if (enqueue)
> +		rte_smp_wmb();
> +	else
> +		rte_smp_rmb();
> +
> +	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
This read can be avoided if the new head can be returned from '__rte_ring_hts_head_wait'.

> +
> +	p.pos.head = p.pos.tail + num;
> +	p.pos.tail = p.pos.head;
> +
> +	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw); }
Why not use 32b atomic operation here and update just the tail?
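Something like this is what I have in mind (rough sketch; it relies on the
updating thread being fully serialized - everyone else keeps spinning in
__rte_ring_hts_head_wait until head == tail again):

	if (enqueue)
		rte_smp_wmb();
	else
		rte_smp_rmb();

	/* head was already moved ahead by 'num' in move_{prod,cons}_head
	 * and nobody else can touch this head/tail pair until tail catches
	 * up, so a plain 32-bit update of the tail should be enough
	 */
	ht->ht.pos.tail = ht->ht.pos.tail + num;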

> +
> +/**
> + * @internal waits till tail will become equal to head.
> + * Means no writer/reader is active for that ring.
> + * Suppose to work as serialization point.
> + */
> +static __rte_always_inline void
> +__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
> +		union rte_ring_ht_pos *p)
> +{
> +	p->raw = rte_atomic64_read((rte_atomic64_t *)
> +			(uintptr_t)&ht->ht.raw);
> +
> +	while (p->pos.head != p->pos.tail) {
> +		rte_pause();
> +		p->raw = rte_atomic64_read((rte_atomic64_t *)
> +				(uintptr_t)&ht->ht.raw);
> +	}
> +}
> +
> +/**
> + * @internal This function updates the producer head for enqueue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sp
> + *   Indicates whether multi-producer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where enqueue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where enqueue finishes
Would be good to return the new_head from this function and use it in '__rte_ring_hts_update_tail'.

> + * @param free_entries
> + *   Returns the amount of free space in the ring BEFORE head was moved
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
Minor, suggest removing the elaborate comments; they are not required and are difficult to maintain.
I think we should do the same thing for other files too.

> +static __rte_always_inline unsigned int
> +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *free_entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos np, op;
> +
> +	const uint32_t capacity = r->capacity;
> +
> +	do {
> +		/* Reset n to the initial burst count */
> +		n = num;
> +
> +		/* wait for tail to be equal to head */
> +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86
> +		 */
> +		rte_smp_rmb();
> +
> +		/*
> +		 *  The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * *old_head > cons_tail). So 'free_entries' is always between 0
> +		 * and capacity (which is < size).
> +		 */
> +		*free_entries = capacity + r->cons.tail - op.pos.head;
> +
> +		/* check that we have enough room in ring */
> +		if (unlikely(n > *free_entries))
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> +					0 : *free_entries;
> +
> +		if (n == 0)
> +			break;
> +
> +		np.pos.tail = op.pos.tail;
> +		np.pos.head = op.pos.head + n;
> +
> +	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
> +			op.raw, np.raw) == 0);
I think we can use 32b atomic operation here and just update the head.
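i.e. keep the 64-bit read for the head == tail wait, but let the CAS cover
just the head word. Rough sketch of the loop exit (volatile casts as in the
existing code omitted):

	} while (rte_atomic32_cmpset(&r->hts_prod.ht.pos.head,
			op.pos.head, op.pos.head + n) == 0);

Only threads that observed head == tail compete here, and the tail is moved
only by the thread that currently owns the ring, so CASing the head alone
should be sufficient. The ordering on weak memory models needs a closer look
though.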

> +
> +	*old_head = op.pos.head;
> +	return n;
> +}
> +
> +/**
> + * @internal This function updates the consumer head for dequeue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sc
> + *   Indicates whether multi-consumer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where dequeue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where dequeue finishes
> + * @param entries
> + *   Returns the number of entries in the ring BEFORE head was moved
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos np, op;
> +
> +	/* move cons.head atomically */
> +	do {
> +		/* Restore n as it may change every loop */
> +		n = num;
> +
> +		/* wait for tail to be equal to head */
> +		__rte_ring_hts_head_wait(&r->hts_cons, &op);
> +
> +		/* add rmb barrier to avoid load/load reorder in weak
> +		 * memory model. It is noop on x86
> +		 */
> +		rte_smp_rmb();
> +
> +		/* The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * cons_head > prod_tail). So 'entries' is always between 0
> +		 * and size(ring)-1.
> +		 */
> +		*entries = r->prod.tail - op.pos.head;
> +
> +		/* Set the actual entries for dequeue */
> +		if (n > *entries)
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
> +
> +		if (unlikely(n == 0))
> +			break;
> +
> +		np.pos.tail = op.pos.tail;
> +		np.pos.head = op.pos.head + n;
> +
> +	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
> +			op.raw, np.raw) == 0);
> +
> +	*old_head = op.pos.head;
> +	return n;
> +}
> +
> +#endif /* _RTE_RING_HTS_GENERIC_H_ */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-14  3:45         ` Honnappa Nagarahalli
  2020-04-14 16:47           ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-14  3:45 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> 
> For rings with producer/consumer in RTE_RING_SYNC_ST,
> RTE_RING_SYNC_MT_HTS mode, provide an ability to split enqueue/dequeue
> operation into two phases:
>       - enqueue/dequeue start
>       - enqueue/dequeue finish
> That allows user to inspect objects in the ring without removing them from it
> (aka MT safe peek).
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/Makefile               |   1 +
>  lib/librte_ring/meson.build            |   1 +
>  lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
>  lib/librte_ring/rte_ring_elem.h        |   4 +
>  lib/librte_ring/rte_ring_generic.h     |  48 ++++
>  lib/librte_ring/rte_ring_hts_generic.h |  47 ++-
>  lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++++++
>  7 files changed, 519 insertions(+), 5 deletions(-)  create mode 100644
> lib/librte_ring/rte_ring_peek.h
> 
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index
> 6fe500f0d..5f8662737 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> rte_ring.h \
>  					rte_ring_hts.h \
>  					rte_ring_hts_elem.h \
>  					rte_ring_hts_generic.h \
> +					rte_ring_peek.h \
>  					rte_ring_rts.h \
>  					rte_ring_rts_elem.h \
>  					rte_ring_rts_generic.h
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build index
> 8e86e037a..f5f84dc6e 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -9,6 +9,7 @@ headers = files('rte_ring.h',
>  		'rte_ring_hts.h',
>  		'rte_ring_hts_elem.h',
>  		'rte_ring_hts_generic.h',
> +		'rte_ring_peek.h',
>  		'rte_ring_rts.h',
>  		'rte_ring_rts_elem.h',
>  		'rte_ring_rts_generic.h')
> diff --git a/lib/librte_ring/rte_ring_c11_mem.h
> b/lib/librte_ring/rte_ring_c11_mem.h
> index 0fb73a337..bb3096721 100644
> --- a/lib/librte_ring/rte_ring_c11_mem.h
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -10,6 +10,50 @@
>  #ifndef _RTE_RING_C11_MEM_H_
>  #define _RTE_RING_C11_MEM_H_
> 
> +/**
> + * @internal get current tail value.
> + * This function should be used only for single thread producer/consumer.
> + * Check that user didn't request to move tail above the head.
Do we need this check? This could be a data path function; we could document a warning and leave it to the users to provide the correct value.

> + * In that situation:
> + * - return zero, that will cause abort any pending changes and
> + *   return head to its previous position.
> + * - throw an assert in debug mode.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> +	uint32_t num)
> +{
> +	uint32_t h, n, t;
> +
> +	h = ht->head;
> +	t = ht->tail;
> +	n = h - t;
> +
> +	RTE_ASSERT(n >= num);
> +	num = (n >= num) ? num : 0;
> +
> +	*tail = h;
> +	return num;
> +}
> +
> +/**
> + * @internal set new values for head and tail.
> + * This function should be used only for single thread producer/consumer.
> + * Should be used only in conjunction with __rte_ring_st_get_tail.
> + */
> +static __rte_always_inline void
> +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> +	uint32_t num, uint32_t enqueue)
> +{
> +	uint32_t pos;
> +
> +	RTE_SET_USED(enqueue);
> +
> +	pos = tail + num;
> +	ht->head = pos;
> +	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE); }
> +
>  static __rte_always_inline void
>  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
>  		uint32_t single, uint32_t enqueue)
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 010a564c1..5bf7c1c1b 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -1083,6 +1083,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r,
> void *obj_table,
>  	return 0;
>  }
> 
> +#ifdef ALLOW_EXPERIMENTAL_API
> +#include <rte_ring_peek.h>
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ring/rte_ring_generic.h
> b/lib/librte_ring/rte_ring_generic.h
> index 953cdbbd5..9f5fdf13b 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
> @@ -10,6 +10,54 @@
>  #ifndef _RTE_RING_GENERIC_H_
>  #define _RTE_RING_GENERIC_H_
> 
> +/**
> + * @internal get current tail value.
> + * This function should be used only for single thread producer/consumer.
> + * Check that user didn't request to move tail above the head.
> + * In that situation:
> + * - return zero, that will cause abort any pending changes and
> + *   return head to its previous position.
> + * - throw an assert in debug mode.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> +	uint32_t num)
> +{
> +	uint32_t h, n, t;
> +
> +	h = ht->head;
> +	t = ht->tail;
> +	n = h - t;
> +
> +	RTE_ASSERT(n >= num);
> +	num = (n >= num) ? num : 0;
> +
> +	*tail = h;
> +	return num;
> +}
> +
> +/**
> + * @internal set new values for head and tail.
> + * This function should be used only for single thread producer/consumer.
> + * Should be used only in conjunction with __rte_ring_st_get_tail.
> + */
> +static __rte_always_inline void
> +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> +	uint32_t num, uint32_t enqueue)
> +{
> +	uint32_t pos;
> +
> +	pos = tail + num;
> +
> +	if (enqueue)
> +		rte_smp_wmb();
> +	else
> +		rte_smp_rmb();
> +
> +	ht->head = pos;
> +	ht->tail = pos;
> +}
> +
>  static __rte_always_inline void
>  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
>  		uint32_t single, uint32_t enqueue)
> diff --git a/lib/librte_ring/rte_ring_hts_generic.h
> b/lib/librte_ring/rte_ring_hts_generic.h
> index da08f1d94..8e699c006 100644
> --- a/lib/librte_ring/rte_ring_hts_generic.h
> +++ b/lib/librte_ring/rte_ring_hts_generic.h
> @@ -18,9 +18,38 @@
>   * For more information please refer to <rte_ring_hts.h>.
>   */
> 
> +/**
> + * @internal get current tail value.
> + * Check that user didn't request to move tail above the head.
> + * In that situation:
> + * - return zero, that will cause abort any pending changes and
> + *   return head to its previous position.
> + * - throw an assert in debug mode.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
> +	uint32_t num)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos p;
> +
> +	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht->ht.raw);
> +	n = p.pos.head - p.pos.tail;
> +
> +	RTE_ASSERT(n >= num);
> +	num = (n >= num) ? num : 0;
> +
> +	*tail = p.pos.tail;
> +	return num;
> +}
> +
> +/**
> + * @internal set new values for head and tail as one atomic 64 bit operation.
> + * Should be used only in conjunction with __rte_ring_hts_get_tail.
> + */
>  static __rte_always_inline void
> -__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> -	uint32_t enqueue)
> +__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
> +	uint32_t num, uint32_t enqueue)
>  {
>  	union rte_ring_ht_pos p;
> 
> @@ -29,14 +58,22 @@ __rte_ring_hts_update_tail(struct
> rte_ring_hts_headtail *ht, uint32_t num,
>  	else
>  		rte_smp_rmb();
> 
> -	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht-
> >ht.raw);
> -
> -	p.pos.head = p.pos.tail + num;
> +	p.pos.head = tail + num;
>  	p.pos.tail = p.pos.head;
> 
>  	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);  }
> 
> +static __rte_always_inline void
> +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> +	uint32_t enqueue)
> +{
> +	uint32_t tail;
> +
> +	num = __rte_ring_hts_get_tail(ht, &tail, num);
> +	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue); }
> +
>  /**
>   * @internal waits till tail will become equal to head.
>   * Means no writer/reader is active for that ring.
> diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
> new file mode 100644 index 000000000..baefd2f7b
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_peek.h
> @@ -0,0 +1,379 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_PEEK_H_
> +#define _RTE_RING_PEEK_H_
> +
> +/**
> + * @file
> + * @b EXPERIMENTAL: this API may change without prior notice
> + * It is not recommended to include this file directly.
> + * Please include <rte_ring_elem.h> instead.
> + *
> + * Ring Peek AP
                            ^^^ API
> + * Introduction of rte_ring with serialized producer/consumer
> + * (HTS sync mode) makes it possible to split the public enqueue/dequeue
> + * API into two phases:
> + * - enqueue/dequeue start
> + * - enqueue/dequeue finish
> + * That allows the user to inspect objects in the ring without removing
> + * them from it (aka MT safe peek).
> + * Note that right now this new API is available only for two sync modes:
> + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
> + * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
> + * It is the user's responsibility to create/init the ring with the
> + * appropriate sync modes selected.
> + * As an example:
> + * // read 1 elem from the ring:
> + * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
> + * if (n != 0) {
> + *    //examine object
> + *    if (object_examine(obj) == KEEP)
> + *       //decided to keep it in the ring.
> + *       rte_ring_hts_dequeue_finish(ring, 0);
> + *    else
> + *       //decided to remove it from the ring.
> + *       rte_ring_hts_dequeue_finish(ring, n);
> + * }
> + * Note that between _start_ and _finish_ the ring is sort of locked -
                                                                                  ^^^^^^^^^^^^^^^^^^^^ - 'locked' can mean different to different people, may be remove this, the next sentence anyway has the description
> + * none other thread can proceed with enqueue(/dequeue) operation till
          ^^^^ no
> + * _finish_ will complete.
                         ^^^^^^^^^^^ completes
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +

<snip>

> +
> +/**
> + * Start to enqueue several objects on the ring.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves room for them on behalf of the user.
> + * The user has to call the appropriate enqueue_finish() to copy objects
> + * into the queue and complete the enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
> +		unsigned int *free_space)
If one wants to use _elem_ APIs for ring peek, a combination of a legacy API (format) and an _elem_ API is required.
For ex:
rte_ring_enqueue_bulk_start
rte_ring_enqueue_elem_finish

I understand why you have done this. I think this is getting somewhat too inconsistent.
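A sketch of how such mixed user code would look (the _elem_ finish signature here is assumed from the rest of this series):

	n = rte_ring_enqueue_bulk_start(r, num, &free_space);
	if (n != 0) {
		/* legacy-style start paired with an _elem_ style finish */
		rte_ring_enqueue_elem_finish(r, objs, esize, n);
	}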

> +{
> +	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
> +			free_space);
> +}
> +

<snip>

> +
> +/**
> + * Start to dequeue several objects from the ring.
> + * Note that the user has to call the appropriate dequeue_finish()
> + * to complete the dequeue operation and actually remove objects from the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   Actual number of objects dequeued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available) {
Should this be in a separate file? (similar to rte_ring.h and rte_ring_elem.h)

> +	return rte_ring_dequeue_bulk_elem_start(r, obj_table,
> sizeof(uintptr_t),
> +		n, available);
> +}
> +
> +/**
> + * Start to dequeue several objects from the ring.
> + * Note that the user has to call the appropriate dequeue_finish()
> + * to complete the dequeue operation and actually remove objects from the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
Minor, update this to indicate generic objects. Can be copied from rte_ring_elem.h.

> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The actual number of objects dequeued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void **obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, available); }
> +

<snip>

> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_PEEK_H_ */
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
  2020-04-04 14:16         ` [dpdk-dev] 回复:[PATCH " 周介龙
@ 2020-04-14  4:28         ` Honnappa Nagarahalli
  2020-04-14 18:29           ` Ananyev, Konstantin
  2020-04-15 20:28           ` Ananyev, Konstantin
  1 sibling, 2 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-14  4:28 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>
Hi Konstantin,
	It would be good to blend this commit with the other commits. A few comments inline.

> Subject: [PATCH v3 9/9] ring: add C11 memory model for new sync modes
> 
> Add C11 atomics based implementation for RTS and HTS head/tail update
> primitives.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_ring/Makefile               |   4 +-
>  lib/librte_ring/meson.build            |   2 +
>  lib/librte_ring/rte_ring_hts.h         |   4 +
>  lib/librte_ring/rte_ring_hts_c11_mem.h | 222 +++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_hts_elem.h    |   4 +
>  lib/librte_ring/rte_ring_rts.h         |   4 +
>  lib/librte_ring/rte_ring_rts_c11_mem.h | 198 ++++++++++++++++++++++
>  lib/librte_ring/rte_ring_rts_elem.h    |   4 +
>  8 files changed, 441 insertions(+), 1 deletion(-)  create mode 100644
> lib/librte_ring/rte_ring_hts_c11_mem.h
>  create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h
> 
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index
> 5f8662737..927d105bf 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -22,9 +22,11 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> rte_ring.h \
>  					rte_ring_hts.h \
>  					rte_ring_hts_elem.h \
>  					rte_ring_hts_generic.h \
> +					rte_ring_hts_c11_mem.h \
>  					rte_ring_peek.h \
>  					rte_ring_rts.h \
>  					rte_ring_rts_elem.h \
> -					rte_ring_rts_generic.h
> +					rte_ring_rts_generic.h \
> +					rte_ring_rts_c11_mem.h
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build index
> f5f84dc6e..f2e37a8e4 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -7,10 +7,12 @@ headers = files('rte_ring.h',
>  		'rte_ring_c11_mem.h',
>  		'rte_ring_generic.h',
>  		'rte_ring_hts.h',
> +		'rte_ring_hts_c11_mem.h',
>  		'rte_ring_hts_elem.h',
>  		'rte_ring_hts_generic.h',
>  		'rte_ring_peek.h',
>  		'rte_ring_rts.h',
> +		'rte_ring_rts_c11_mem.h',
>  		'rte_ring_rts_elem.h',
>  		'rte_ring_rts_generic.h')
> 
> diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h index
> 062d7be6c..ddaa47ff1 100644
> --- a/lib/librte_ring/rte_ring_hts.h
> +++ b/lib/librte_ring/rte_ring_hts.h
> @@ -29,7 +29,11 @@
>  extern "C" {
>  #endif
> 
> +#ifdef RTE_USE_C11_MEM_MODEL
> +#include <rte_ring_hts_c11_mem.h>
> +#else
>  #include <rte_ring_hts_generic.h>
> +#endif
> 
>  /**
>   * @internal Enqueue several objects on the HTS ring.
> diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h
> b/lib/librte_ring/rte_ring_hts_c11_mem.h
> new file mode 100644
> index 000000000..0218d0e7d
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
> @@ -0,0 +1,222 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_HTS_C11_MEM_H_
> +#define _RTE_RING_HTS_C11_MEM_H_
> +
> +/**
> + * @file rte_ring_hts_c11_mem.h
> + * It is not recommended to include this file directly,
> + * include <rte_ring.h> instead.
> + * Contains internal helper functions for head/tail sync (HTS) ring mode.
> + * For more information please refer to <rte_ring_hts.h>.
> + */
> +
> +/**
> + * @internal get current tail value.
> + * Check that user didn't request to move tail above the head.
> + * In that situation:
> + * - return zero, which will cause any pending changes to be aborted
> + *   and the head to return to its previous position.
> + * - throw an assert in debug mode.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
> +	uint32_t num)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos p;
> +
> +	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
> +	n = p.pos.head - p.pos.tail;
> +
> +	RTE_ASSERT(n >= num);
> +	num = (n >= num) ? num : 0;
> +
> +	*tail = p.pos.tail;
> +	return num;
> +}
> +
> +/**
> + * @internal set new values for head and tail as one atomic 64 bit operation.
> + * Should be used only in conjunction with __rte_ring_hts_get_tail.
> + */
> +static __rte_always_inline void
> +__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
> +	uint32_t num, uint32_t enqueue)
> +{
> +	union rte_ring_ht_pos p;
> +
> +	RTE_SET_USED(enqueue);
> +
> +	p.pos.head = tail + num;
> +	p.pos.tail = p.pos.head;
> +
> +	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE); }
> +
> +static __rte_always_inline void
> +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> +	uint32_t enqueue)
> +{
> +	uint32_t tail;
> +
> +	num = __rte_ring_hts_get_tail(ht, &tail, num);
> +	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue); }
> +
> +/**
> + * @internal waits till the tail becomes equal to the head.
> + * Means no writer/reader is active for that ring.
> + * Supposed to work as a serialization point.
> + */
> +static __rte_always_inline void
> +__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
> +		union rte_ring_ht_pos *p)
> +{
> +	p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
> +
> +	while (p->pos.head != p->pos.tail) {
> +		rte_pause();
> +		p->raw = __atomic_load_n(&ht->ht.raw,
> __ATOMIC_ACQUIRE);
> +	}
> +}
> +
> +/**
> + * @internal This function updates the producer head for enqueue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sp
> + *   Indicates whether multi-producer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where enqueue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where enqueue finishes
> + * @param free_entries
> + *   Returns the amount of free space in the ring BEFORE head was moved
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *free_entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos np, op;
> +
> +	const uint32_t capacity = r->capacity;
> +
> +	do {
> +		/* Reset n to the initial burst count */
> +		n = num;
> +
> +		/* wait for tail to be equal to head, acquire point */
> +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> +
> +		/*
> +		 *  The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * *old_head > cons_tail). So 'free_entries' is always between
> 0
> +		 * and capacity (which is < size).
> +		 */
> +		*free_entries = capacity + r->cons.tail - op.pos.head;
> +
> +		/* check that we have enough room in ring */
> +		if (unlikely(n > *free_entries))
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> +					0 : *free_entries;
> +
> +		if (n == 0)
> +			break;
> +
> +		np.pos.tail = op.pos.tail;
> +		np.pos.head = op.pos.head + n;
> +
> +	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
> +			&op.raw, np.raw,
> +			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
__ATOMIC_RELEASE can be __ATOMIC_RELAXED. The RELEASE in the tail update, done after the elements are written, is enough.
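i.e. a sketch of the suggested relaxation:

	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
			&op.raw, np.raw,
			0, __ATOMIC_RELAXED, __ATOMIC_RELAXED) == 0);

The later __atomic_store_n(..., __ATOMIC_RELEASE) on the tail is what publishes the written elements to the readers.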

> +
> +	*old_head = op.pos.head;
> +	return n;
> +}
> +
> +/**
> + * @internal This function updates the consumer head for dequeue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sc
> + *   Indicates whether multi-consumer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where dequeue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where dequeue finishes
> + * @param entries
> + *   Returns the number of entries in the ring BEFORE head was moved
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_pos np, op;
> +
> +	/* move cons.head atomically */
> +	do {
> +		/* Restore n as it may change every loop */
> +		n = num;
> +
> +		/* wait for tail to be equal to head */
> +		__rte_ring_hts_head_wait(&r->hts_cons, &op);
> +
> +		/* The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * cons_head > prod_tail). So 'entries' is always between 0
> +		 * and size(ring)-1.
> +		 */
> +		*entries = r->prod.tail - op.pos.head;
> +
> +		/* Set the actual entries for dequeue */
> +		if (n > *entries)
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 :
> *entries;
> +
> +		if (unlikely(n == 0))
> +			break;
> +
> +		np.pos.tail = op.pos.tail;
> +		np.pos.head = op.pos.head + n;
> +
> +	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
> +			&op.raw, np.raw,
> +			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
Same here, RELEASE can be RELAXED.

> +
> +	*old_head = op.pos.head;
> +	return n;
> +}
> +
> +#endif /* _RTE_RING_HTS_C11_MEM_H_ */

<snip>

>  /**
>   * @internal Enqueue several objects on the RTS ring.
> diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h
> b/lib/librte_ring/rte_ring_rts_c11_mem.h
> new file mode 100644
> index 000000000..b72901497
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
> @@ -0,0 +1,198 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_RTS_C11_MEM_H_
> +#define _RTE_RING_RTS_C11_MEM_H_
> +
> +/**
> + * @file rte_ring_rts_c11_mem.h
> + * It is not recommended to include this file directly,
> + * include <rte_ring.h> instead.
> + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
> + * For more information please refer to <rte_ring_rts.h>.
> + */
> +
> +/**
> + * @internal This function updates tail values.
> + */
> +static __rte_always_inline void
> +__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht) {
> +	union rte_ring_ht_poscnt h, ot, nt;
> +
> +	/*
> +	 * If there are other enqueues/dequeues in progress that
> +	 * might precede us, then don't update the tail with the new value.
> +	 */
> +
> +	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
This can be RELAXED. This thread is reading a value that it updated earlier, so it is guaranteed to see its own update.
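i.e. (a sketch):

	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_RELAXED);

The CAS loop below already re-reads with ACQUIRE on failure, so correctness is kept.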

> +
> +	do {
> +		/* on 32-bit systems we have to do atomic read here */
> +		h.raw = __atomic_load_n(&ht->head.raw,
> __ATOMIC_RELAXED);
> +
> +		nt.raw = ot.raw;
> +		if (++nt.val.cnt == h.val.cnt)
> +			nt.val.pos = h.val.pos;
> +
> +	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw,
> nt.raw,
> +			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0); }
> +
> +/**
> + * @internal This function waits till the head/tail distance doesn't
> + * exceed the pre-defined max value.
> + */
> +static __rte_always_inline void
> +__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
> +	union rte_ring_ht_poscnt *h)
> +{
> +	uint32_t max;
> +
> +	max = ht->htd_max;
> +	h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> +
> +	while (h->val.pos - ht->tail.val.pos > max) {
> +		rte_pause();
> +		h->raw = __atomic_load_n(&ht->head.raw,
> __ATOMIC_ACQUIRE);
> +	}
> +}
> +
> +/**
> + * @internal This function updates the producer head for enqueue.
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sp
> + *   Indicates whether multi-producer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where enqueue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where enqueue finishes
> + * @param free_entries
> + *   Returns the amount of free space in the ring BEFORE head was moved
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *free_entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_poscnt nh, oh;
> +
> +	const uint32_t capacity = r->capacity;
> +
> +	do {
> +		/* Reset n to the initial burst count */
> +		n = num;
> +
> +		/* read prod head (may spin on prod tail, acquire point) */
> +		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
> +
> +		/*
> +		 *  The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * *old_head > cons_tail). So 'free_entries' is always between
> 0
> +		 * and capacity (which is < size).
> +		 */
> +		*free_entries = capacity + r->cons.tail - oh.val.pos;
> +
> +		/* check that we have enough room in ring */
> +		if (unlikely(n > *free_entries))
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> +					0 : *free_entries;
> +
> +		if (n == 0)
> +			break;
> +
> +		nh.val.pos = oh.val.pos + n;
> +		nh.val.cnt = oh.val.cnt + 1;
> +
> +	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
> +			&oh.raw, nh.raw,
> +			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
> +
> +	*old_head = oh.val.pos;
> +	return n;
> +}
> +
> +/**
> + * @internal This function updates the consumer head for dequeue
> + *
> + * @param r
> + *   A pointer to the ring structure
> + * @param is_sc
> + *   Indicates whether multi-consumer path is needed or not
> + * @param n
> + *   The number of elements we will want to enqueue, i.e. how far should the
> + *   head be moved
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> ring
> + * @param old_head
> + *   Returns head value as it was before the move, i.e. where dequeue starts
> + * @param new_head
> + *   Returns the current/new head value i.e. where dequeue finishes
> + * @param entries
> + *   Returns the number of entries in the ring BEFORE head was moved
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *entries)
> +{
> +	uint32_t n;
> +	union rte_ring_ht_poscnt nh, oh;
> +
> +	/* move cons.head atomically */
> +	do {
> +		/* Restore n as it may change every loop */
> +		n = num;
> +
> +		/* read cons head (may spin on cons tail, acquire point) */
> +		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
> +
> +		/* The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * cons_head > prod_tail). So 'entries' is always between 0
> +		 * and size(ring)-1.
> +		 */
> +		*entries = r->prod.tail - oh.val.pos;
> +
> +		/* Set the actual entries for dequeue */
> +		if (n > *entries)
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 :
> *entries;
> +
> +		if (unlikely(n == 0))
> +			break;
> +
> +		nh.val.pos = oh.val.pos + n;
> +		nh.val.cnt = oh.val.cnt + 1;
> +
> +	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
> +			&oh.raw, nh.raw,
> +			1, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
> +
> +	*old_head = oh.val.pos;
> +	return n;
> +}
> +
> +#endif /* _RTE_RING_RTS_C11_MEM_H_ */
> diff --git a/lib/librte_ring/rte_ring_rts_elem.h
> b/lib/librte_ring/rte_ring_rts_elem.h
> index 71a331b23..23d8aeec7 100644
> --- a/lib/librte_ring/rte_ring_rts_elem.h
> +++ b/lib/librte_ring/rte_ring_rts_elem.h
> @@ -24,7 +24,11 @@
>  extern "C" {
>  #endif
> 
> +#ifdef RTE_USE_C11_MEM_MODEL
> +#include <rte_ring_rts_c11_mem.h>
> +#else
>  #include <rte_ring_rts_generic.h>
> +#endif
> 
>  /**
>   * @internal Enqueue several objects on the RTS ring.
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-10 23:10             ` Honnappa Nagarahalli
  2020-04-13 14:29               ` David Marchand
@ 2020-04-14 13:18               ` Ananyev, Konstantin
  2020-04-14 15:58                 ` Honnappa Nagarahalli
  1 sibling, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-14 13:18 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

> > > >
> > > > +#ifdef ALLOW_EXPERIMENTAL_API
> > > > +#include <rte_ring_rts.h>
> > > > +#endif
> > > > +
> > > >  /**
> > > >   * Enqueue several objects on a ring.
> > > >   *
> > > > @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
> > > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >  		      unsigned int n, unsigned int *free_space)  {
> > > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > > RTE_RING_QUEUE_FIXED,
> > > > -			r->prod.sync_type, free_space);
> > > > +	switch (r->prod.sync_type) {
> > > > +	case RTE_RING_SYNC_MT:
> > > > +		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> > > > +	case RTE_RING_SYNC_ST:
> > > > +		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> > > Have you validated if these affect the performance for the existing APIs?
> >
> > I ran ring_pmd_perf_autotest
> > (AFAIK, that's the only one of our perf tests that calls
> > rte_ring_enqueue/dequeue), and didn't see any real difference in perf
> > numbers.
> >
> > > I am also wondering why should we support these new modes in the legacy
> > APIs?
> >
> > Majority of DPDK users still do use legacy API, and I am not sure all of them
> > will be happy to switch to _elem_ one manually.
> > Plus I can't see how we can justify that after, let's say:
> > rte_ring_init(ring, ..., RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ); returns
> > with success, a valid call to rte_ring_enqueue(ring,...) should fail.
> Agree, I think the only way right now is through documentation.
> 
> >
> > > I think users should move to use rte_ring_xxx_elem APIs. If users want to
> > use RTS/HTS it will be a good time for them to move to new APIs.
> >
> > If they use rte_ring_enqueue/dequeue all they have to do - just change flags
> > in ring_create/ring_init call.
> > With what you suggest - they have to change every
> > rte_ring_enqueue/dequeue to rte_ring_elem_enqueue/dequeue.
> > That's much bigger code churn.
> But these are just 1 to 1 mapped.  I would think there are not a whole lot of them in the application code, maybe ~10 lines?

I suppose it depends a lot on the particular user app.
My preference is not to force users to make extra changes in their code.
If we can add new functionality while keeping the existing API, why not do it?
Less disturbance for users seems a good thing to me.
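E.g. for an existing app the switch would be just (a sketch):

	/* before: default MP/MC sync mode */
	r = rte_ring_create("ring", 1024, rte_socket_id(), 0);

	/* after: only the creation flags change, all existing
	 * rte_ring_enqueue/dequeue calls stay intact
	 */
	r = rte_ring_create("ring", 1024, rte_socket_id(),
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);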

> I think the bigger factor for the user here is the algorithm changes in rte_ring library. Bigger effort for the users would be testing rather than
> code changes in the applications.
> >
> > > They anyway have to test their code for RTS/HTS, might as well make the
> > change to new APIs and test both.
> > > It will be less code to maintain for the community as well.
> >
> > That's true, right now there is a lot of duplication between _elem_ and legacy
> > code.
> >  Actually the only real diff between them is the actual copying of the objects.
> >  But I thought we were going to deal with that by one day changing all
> > legacy API to wrappers around _elem_ calls, i.e. something like:
> >
> > static __rte_always_inline unsigned int
> > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> >                       unsigned int n, unsigned int *free_space) {
> > 	return  rte_ring_enqueue_elem_bulk(r, obj_table, sizeof(uintptr_t), n,
> > free_space); }
> >
> > That way users will switch to new API automatically, without any extra effort
> > for them, and we will be able to remove legacy code.
> > Do you have some other thoughts here how to deal with this legacy/elem
> > conversion?
> Yes, that is what I was thinking, but had not considered any addition of new APIs.
> But, I am wondering if we should look at deprecation?

You mean to deprecate the existing legacy API?
rte_ring_enqueue/dequeue_bulk, etc.?
I don't think we need to deprecate it at all.
As long as we have the _elem_ functions called underneath, there would be one implementation anyway,
and we can leave them forever, so users wouldn't need to change their existing code at all.

> If we decide to deprecate, it would be good to avoid making the users of RTS/HTS do
> the work twice (once to make the switch to RTS/HTS and then another to _elem_ APIs).
> 
> One thing we can do is to implement the wrappers you mentioned above for RTS/HTS now.

That's a very good point.
 It will require some re-org to allow rte_ring.h to include rte_ring_elem.h,
but I think it is doable, will try to make these changes in v4. 

> I also think it is worth considering switching to these wrappers in 20.05 so that
> come 20.11, we have a code base that has gone through a couple of releases' testing.

You mean wrappers for the existing legacy API (MP/MC, SP/SC modes)?
It is probably too late to make such changes in 20.05; early 20.08 is a better time for that.

> 
> <snip>
> 
> > > > +
> > > > +#endif /* _RTE_RING_RTS_ELEM_H_ */
> > > > diff --git a/lib/librte_ring/rte_ring_rts_generic.h
> > > > b/lib/librte_ring/rte_ring_rts_generic.h
> > > > new file mode 100644
> > > > index 000000000..f88460d47
> > > > --- /dev/null
> > > > +++ b/lib/librte_ring/rte_ring_rts_generic.h
> > > I do not know the benefit of providing the generic version. Do you know
> > why this was done in the legacy APIs?
> >
> > I think at first we had the generic API only, then later C11 was added.
> > As I remember, the C11 one on IA was measured as a bit slower than the
> > generic one, so it was decided to keep both.
> >
> > > If there is no performance difference between generic and C11 versions,
> > should we just skip the generic version?
> > > The oldest compilers in CI are GCC 4.8.5 and Clang 3.4.2, and C11 built-ins
> > are supported since before these compiler versions.
> > > I feel the code is growing exponentially in the rte_ring library and we should try
> to cut non-value-add code/APIs aggressively.
> >
> > I'll check whether there is a perf difference for RTS and HTS between generic
> > and C11 versions on IA.
> > Meanwhile please have a proper look at the C11 implementation, I am not that
> > familiar with C11 atomics yet.
> ok
> 
> > If there would be no problems with it and no noticeable diff in performance -
> > I am ok to have for RTS/HTS modes C11 version only.

From what I see on my box, there is not much difference
in terms of performance between *generic* and *c11_mem* for RTS/HTS.
ring_stress_autotest for the majority of cases shows ~1% diff,
in some cases the c11 numbers are even a bit better.
So I will keep the c11 version only in v4.


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-13 16:42                 ` Honnappa Nagarahalli
@ 2020-04-14 13:47                   ` David Marchand
  2020-04-14 15:57                     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: David Marchand @ 2020-04-14 13:47 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ananyev, Konstantin, dev, jielong.zjl, nd, Kinsella, Ray, thomas,
	jerinj, Dodji Seketeli

Hello Honnappa,

On Mon, Apr 13, 2020 at 6:42 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > Reviews on those structures must be extra careful, as we are blind with those
> > rules in place.
> Yes, this is my concern. Why not remove these fixes and ignore the errors from libabigail manually (i.e. merge the patches knowing that they are false positives)? Do you know if libabigail will fix these in the future?

A lot of people ignore the errors reported by the CI.
I don't want to give a valid reason to ignore the reports.

Dodji (libabigail maintainer) has been working on the issue.
He showed me his progress last week.
I don't know when the fix will be ready, but we can expect it before the
20.05 release.

Do you expect other changes to the ring structure in this release?


-- 
David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-14 13:47                   ` David Marchand
@ 2020-04-14 15:57                     ` Honnappa Nagarahalli
  2020-04-14 16:21                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-14 15:57 UTC (permalink / raw)
  To: David Marchand
  Cc: Ananyev, Konstantin, dev, jielong.zjl, nd, Kinsella, Ray, thomas,
	jerinj, Dodji Seketeli, Honnappa Nagarahalli, nd

<snip>

> 
> Hello Honnappa,
> 
> On Mon, Apr 13, 2020 at 6:42 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > Reviews on those structures must be extra careful, as we are blind
> > > with those rules in place.
> > Yes, this is my concern. Why not remove these fixes and ignore the errors
> manually (i.e. merge the patches knowing that they are false errors) from
> libabigail? Do you know if libabigail will fix these in the future?
> 
> A lot of people ignore the errors reported by the CI.
> I don't want to give a valid reason to ignore the reports.
> 
> Dodji (libabigail maintainer) has been working on the issue.
> He showed me his progress last week.
> I don't know when the fix is ready but we can expect it before the
> 20.05 release.
> 
> Do you expect other changes on the ring structure in this release ?
Konstantin can comment better. But, from my review, I do not see further changes to the ring structure in this patch set.

> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-14 13:18               ` Ananyev, Konstantin
@ 2020-04-14 15:58                 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-14 15:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>
> Subject: RE: [PATCH v3 3/9] ring: introduce RTS ring mode
> 
> > > > >
> > > > > +#ifdef ALLOW_EXPERIMENTAL_API
> > > > > +#include <rte_ring_rts.h>
> > > > > +#endif
> > > > > +
> > > > >  /**
> > > > >   * Enqueue several objects on a ring.
> > > > >   *
> > > > > @@ -484,8 +520,21 @@ static __rte_always_inline unsigned int
> > > > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > > >  		      unsigned int n, unsigned int *free_space)  {
> > > > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > > > RTE_RING_QUEUE_FIXED,
> > > > > -			r->prod.sync_type, free_space);
> > > > > +	switch (r->prod.sync_type) {
> > > > > +	case RTE_RING_SYNC_MT:
> > > > > +		return rte_ring_mp_enqueue_bulk(r, obj_table, n,
> free_space);
> > > > > +	case RTE_RING_SYNC_ST:
> > > > > +		return rte_ring_sp_enqueue_bulk(r, obj_table, n,
> free_space);
> > > > Have you validated if these affect the performance for the existing APIs?
> > >
> > > I run ring_pmd_perf_autotest
> > > (AFAIK, that's the only one of our perf tests that calls
> > > rte_ring_enqueue/dequeue), and didn't see any real difference in
> > > perf numbers.
> > >
> > > > I am also wondering why should we support these new modes in the
> > > > legacy
> > > APIs?
> > >
> > > Majority of DPDK users still do use legacy API, and I am not sure
> > > all of them will be happy to switch to _elem_ one manually.
> > > Plus I can't see how we can justify that after let say:
> > > rte_ring_init(ring, ..., RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
> > > returns with success valid call to rte_ring_enqueue(ring,...) should fail.
> > Agree, I think the only way right now is through documentation.
> >
> > >
> > > > I think users should move to use rte_ring_xxx_elem APIs. If users
> > > > want to
> > > use RTS/HTS it will be a good time for them to move to new APIs.
> > >
> > > If they use rte_ring_enqueue/dequeue all they have to do - just
> > > change flags in ring_create/ring_init call.
> > > With what you suggest - they have to change every
> > > rte_ring_enqueue/dequeue to rte_ring_elem_enqueue/dequeue.
> > > That's much bigger code churn.
> > But these are just 1 to 1 mapped.  I would think, there are not a whole lot of
> them in the application code, may be ~10 lines?
> 
> I suppose it depends a lot on particular user app.
> My preference not to force users to do extra changes in their code.
> If we can add new functionality while keeping existing API, why not to do it?
> Less disturbance for users seems a good thing to me.
> 
> > I think the bigger factor for the user here is the algorithm changes
> > in rte_ring library. Bigger effort for the users would be testing rather than
> code changes in the applications.
> > >
> > > > They anyway have to test their code for RTS/HTS, might as well
> > > > make the
> > > change to new APIs and test both.
> > > > It will be less code to maintain for the community as well.
> > >
> > > That's true, right now there is a lot of duplication between _elem_
> > > and legacy code.
> > >  Actually the only real diff between them - actual copying of the objects.
> > >  But I thought we are going to deal with that, just by changing one
> > > day all legacy API to wrappers around _elem_ calls, i.e something like:
> > >
> > > static __rte_always_inline unsigned int rte_ring_enqueue_bulk(struct
> > > rte_ring *r, void * const *obj_table,
> > >                       unsigned int n, unsigned int *free_space) {
> > > 	return  rte_ring_enqueue_elem_bulk(r, obj_table, sizeof(uintptr_t),
> > > n, free_space); }
> > >
> > > That way users will switch to new API automatically, without any
> > > extra effort for them, and we will be able to remove legacy code.
> > > Do you have some other thoughts here how to deal with this
> > > legacy/elem conversion?
> > Yes, that is what I was thinking, but had not considered any addition of new
> APIs.
> > But, I am wondering if we should look at deprecation?
> 
> You mean to deprecate existing  legacy API?
> rte_ring_enqueue/dequeue_bulk, etc?
> I don't think we need to deprecate it at all.
> As long as we'll have _elem_  functions called underneath there would be one
> implementation anyway, and we can leave them forever, so users wouldn't
> need to change their existing code at all.
Ack (assuming that the legacy APIs will be wrappers)

> 
> > If we decide to deprecate, it would be good to avoid making the users
> > of RTS/HTS do the work twice (once to make the switch to RTS/HTS and
> then another to _elem_ APIs).
> >
> > One thing we can do is to implement the wrappers you mentioned above
> for RTS/HTS now.
> 
> That's a very good point.
>  It will require some re-org to allow rte_ring.h to include rte_ring_elem.h, but
> I think it is doable, will try to make these changes in v4.
> 
> > I also it is worth considering to switch to these wrappers 20.05 so
> > that come 20.11, we have a code base that has gone through couple of
> releases' testing.
> 
> You mean wrappers for existing legacy API (MP/MC, SP/SC modes)?
> It is probably too late to make such changes in 20.05, probably early 20.08 is
> a good time for that.
Yes, will target for 20.08

> 
> >
> > <snip>
> >
> > > > > +
> > > > > +#endif /* _RTE_RING_RTS_ELEM_H_ */
> > > > > diff --git a/lib/librte_ring/rte_ring_rts_generic.h
> > > > > b/lib/librte_ring/rte_ring_rts_generic.h
> > > > > new file mode 100644
> > > > > index 000000000..f88460d47
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_ring/rte_ring_rts_generic.h
> > > > I do not know the benefit to providing the generic version. Do you
> > > > know
> > > why this was done in the legacy APIs?
> > >
> > > I think at first we had generic API only, then later C11 was added.
> > > As I remember, C11 one on IA was measured as a bit slower then
> > > generic, so it was decided to keep both.
> > >
> > > > If there is no performance difference between generic and C11
> > > > versions,
> > > should we just skip the generic version?
> > > > The oldest compiler in CI are GCC 4.8.5 and Clang 3.4.2 and C11
> > > > built-ins
> > > are supported earlier than these compiler versions.
> > > > I feel the code is growing exponentially in rte_ring library and
> > > > we should try
> > > to cut non-value-ass code/APIs aggressively.
> > >
> > > I'll check is there perf difference for RTS and HTS between generic
> > > and C11 versions on IA.
> > > Meanwhile please have a proper look at C11 implementation, I am not
> > > that familiar with C11 atomics yet.
> > ok
> >
> > > If there would be no problems with it and no noticeable diff in
> > > performance - I am ok to have for RTS/HTS modes C11 version only.
> 
> From what I see on my box, there is no much difference in terms of
> performance between *generic* and *c11_mem* for RTS/HTS.
> ring_stress_autotest for majority of cases shows ~1% diff, in some cases c11
> numbers are even a bit better.
> So will keep c11 version only in v4.
Thanks. That will remove a good amount of code.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode
  2020-04-13 23:27         ` Honnappa Nagarahalli
@ 2020-04-14 16:12           ` Ananyev, Konstantin
  2020-04-14 17:06             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-14 16:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

Hi Honnappa,

> 
> Hi Konstantin,
> 	Few nits/comments inline.
> 
> <snip>
> 
> > diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h new
> > file mode 100644 index 000000000..062d7be6c
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_hts.h
> > @@ -0,0 +1,210 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2017 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_HTS_H_
> > +#define _RTE_RING_HTS_H_
> > +
> > +/**
> > + * @file rte_ring_hts.h
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + * It is not recommended to include this file directly.
> > + * Please include <rte_ring.h> instead.
> > + *
> > + * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
> > + * In that mode enqueue/dequeue operation is fully serialized:
> > + * at any given moement only one enqueue/dequeue operation can proceed.
>                                  ^^^^^^^^ moment
> > + * This is achieved by thread is allowed to proceed with changing
>                                             ^^^^^^^^^^^^^^ allowing a thread
> > +head.value
> > + * only when head.value == tail.value.
> > + * Both head and tail values are updated atomically (as one 64-bit value).
> > + * To achieve that 64-bit CAS is used by head update routine.
> > + */
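For reference, the combined 64-bit position used throughout this patch is roughly the following - a sketch, the exact definition lives elsewhere in this series:

union rte_ring_ht_pos {
	uint64_t raw;
	struct {
		uint32_t head;	/* position claimed by an in-progress op */
		uint32_t tail;	/* position up to which ops are completed */
	} pos;
};

which is what allows head and tail to move together in one CAS/store.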
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_ring_hts_generic.h>
> > +
> 
> <snip>
> 
> > diff --git a/lib/librte_ring/rte_ring_hts_generic.h
> > b/lib/librte_ring/rte_ring_hts_generic.h
> > new file mode 100644
> > index 000000000..da08f1d94
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_hts_generic.h
> > @@ -0,0 +1,198 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2020 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_HTS_GENERIC_H_
> > +#define _RTE_RING_HTS_GENERIC_H_
> > +
> > +/**
> > + * @file rte_ring_hts_generic.h
> > + * It is not recommended to include this file directly,
> > + * include <rte_ring.h> instead.
> > + * Contains internal helper functions for head/tail sync (HTS) ring mode.
> > + * For more information please refer to <rte_ring_hts.h>.
> > + */
> > +
> > +static __rte_always_inline void
> > +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> > +	uint32_t enqueue)
> > +{
> > +	union rte_ring_ht_pos p;
> > +
> > +	if (enqueue)
> > +		rte_smp_wmb();
> > +	else
> > +		rte_smp_rmb();
> > +
> > +	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht-
> > >ht.raw);
> This read can be avoided if the new head can be returned from '__rte_ring_hts_head_wait'.

Yes, or even cur_head and num should be enough to avoid the read here.
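E.g. a sketch, with an extra parameter carrying the already-known position:

static __rte_always_inline void
__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht,
	uint32_t old_tail, uint32_t num, uint32_t enqueue)
{
	union rte_ring_ht_pos p;

	if (enqueue)
		rte_smp_wmb();
	else
		rte_smp_rmb();

	/* no re-read of ht->ht.raw: both new values are derived
	 * from what the caller already has at hand */
	p.pos.tail = old_tail + num;
	p.pos.head = p.pos.tail;

	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);
}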

> 
> > +
> > +	p.pos.head = p.pos.tail + num;
> > +	p.pos.tail = p.pos.head;
> > +
> > +	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw); }
> Why not use 32b atomic operation here and update just the tail?

Agree, this code-path can do just a 32-bit store (as the head is not going to change).
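i.e. something like (a sketch; assumes the tail half of the union can be addressed directly):

	/* head is unchanged on this path, store only the 32-bit tail */
	__atomic_store_n(&ht->ht.pos.tail, p.pos.tail + num,
			__ATOMIC_RELEASE);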

> 
> > +
> > +/**
> > + * @internal waits till the tail becomes equal to the head.
> > + * Means no writer/reader is active for that ring.
> > + * Supposed to work as a serialization point.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
> > +		union rte_ring_ht_pos *p)
> > +{
> > +	p->raw = rte_atomic64_read((rte_atomic64_t *)
> > +			(uintptr_t)&ht->ht.raw);
> > +
> > +	while (p->pos.head != p->pos.tail) {
> > +		rte_pause();
> > +		p->raw = rte_atomic64_read((rte_atomic64_t *)
> > +				(uintptr_t)&ht->ht.raw);
> > +	}
> > +}
> > +
> > +/**
> > + * @internal This function updates the producer head for enqueue
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sp
> > + *   Indicates whether multi-producer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to enqueue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> > ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where enqueue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where enqueue finishes

Oops, copy/paste thing - will remove.

> Would be good to return the new_head from this function and use it in '__rte_ring_hts_update_tail'.

I think old_head + num should be enough here (see above).

> 
> > + * @param free_entries
> > + *   Returns the amount of free space in the ring BEFORE head was moved
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> Minor, suggest removing the elaborate comments, they are not required and are difficult to maintain.
> I think we should do the same thing for other files too.

Sorry, didn't get you here: what exactly do you suggest removing?

> 
> > +static __rte_always_inline unsigned int
> > +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *free_entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos np, op;
> > +
> > +	const uint32_t capacity = r->capacity;
> > +
> > +	do {
> > +		/* Reset n to the initial burst count */
> > +		n = num;
> > +
> > +		/* wait for tail to be equal to head */
> > +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> > +
> > +		/* add rmb barrier to avoid load/load reorder in weak
> > +		 * memory model. It is noop on x86
> > +		 */
> > +		rte_smp_rmb();
> > +
> > +		/*
> > +		 *  The subtraction is done between two unsigned 32bits
> > value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * *old_head > cons_tail). So 'free_entries' is always between
> > 0
> > +		 * and capacity (which is < size).
> > +		 */
> > +		*free_entries = capacity + r->cons.tail - op.pos.head;
> > +
> > +		/* check that we have enough room in ring */
> > +		if (unlikely(n > *free_entries))
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > +					0 : *free_entries;
> > +
> > +		if (n == 0)
> > +			break;
> > +
> > +		np.pos.tail = op.pos.tail;
> > +		np.pos.head = op.pos.head + n;
> > +
> > +	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
> > +			op.raw, np.raw) == 0);
> I think we can use 32b atomic operation here and just update the head.

I think we have to do a proper 64-bit CAS here, otherwise an ABA race could arise:
a thread reads the head/tail values, then gets suspended just before the CAS instruction for a while.
The thread resumes when the ring head value is equal to the thread's local head value,
but the tail differs (some other thread is enqueuing into the ring).
If we did CAS just for the head - it would succeed, though it shouldn't.
I understand that with 32-bit head/tail values the probability of such a situation
is really low, but still.
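A hypothetical interleaving, just for illustration:

/*
 * T1: loads {head == H, tail == H}, prepares head = H + n,
 *     gets preempted just before the CAS.
 * T2+: complete enough operations that the 32-bit head wraps
 *     around to H again, while another enqueue is still in
 *     flight, so tail == T' != H.
 * T1: resumes; a CAS on the 32-bit head alone compares only H
 *     and succeeds, breaking the head == tail serialization
 *     invariant. The full 64-bit CAS on {head, tail} correctly
 *     fails here.
 */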

> 
> > +
> > +	*old_head = op.pos.head;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal This function updates the consumer head for dequeue
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sc
> > + *   Indicates whether multi-consumer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to enqueue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> > ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> > ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where dequeue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where dequeue finishes
> > + * @param entries
> > + *   Returns the number of entries in the ring BEFORE head was moved
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos np, op;
> > +
> > +	/* move cons.head atomically */
> > +	do {
> > +		/* Restore n as it may change every loop */
> > +		n = num;
> > +
> > +		/* wait for tail to be equal to head */
> > +		__rte_ring_hts_head_wait(&r->hts_cons, &op);
> > +
> > +		/* add rmb barrier to avoid load/load reorder in weak
> > +		 * memory model. It is noop on x86
> > +		 */
> > +		rte_smp_rmb();
> > +
> > +		/* The subtraction is done between two unsigned 32bits value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * cons_head > prod_tail). So 'entries' is always between 0
> > +		 * and size(ring)-1.
> > +		 */
> > +		*entries = r->prod.tail - op.pos.head;
> > +
> > +		/* Set the actual entries for dequeue */
> > +		if (n > *entries)
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 :
> > *entries;
> > +
> > +		if (unlikely(n == 0))
> > +			break;
> > +
> > +		np.pos.tail = op.pos.tail;
> > +		np.pos.head = op.pos.head + n;
> > +
> > +	} while (rte_atomic64_cmpset(&r->hts_cons.ht.raw,
> > +			op.raw, np.raw) == 0);
> > +
> > +	*old_head = op.pos.head;
> > +	return n;
> > +}
> > +
> > +#endif /* _RTE_RING_HTS_GENERIC_H_ */
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode
  2020-04-14 15:57                     ` Honnappa Nagarahalli
@ 2020-04-14 16:21                       ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-14 16:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, David Marchand
  Cc: dev, jielong.zjl, nd, Kinsella, Ray, thomas, jerinj, Dodji Seketeli, nd


Hi guys,
 
> >
> > Hello Honnappa,
> >
> > On Mon, Apr 13, 2020 at 6:42 PM Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com> wrote:
> > > > Reviews on those structures must be extra careful, as we are blind
> > > > with those rules in place.
> > > Yes, this is my concern. Why not remove these fixes and ignore the errors
> > manually (i.e. merge the patches knowing that they are false errors) from
> > libabigail? Do you know if libabigail will fix these in the future?
> >
> > A lot of people ignore the errors reported by the CI.
> > I don't want to give a valid reason to ignore the reports.
> >
> > Dodji (libabigail maintainer) has been working on the issue.
> > He showed me his progress last week.
> > I don't know when the fix is ready but we can expect it before the
> > 20.05 release.
> >
> > Do you expect other changes on the ring structure in this release ?
> Konstantin can comment better. But, from my review, I do not see further changes to the ring structure in this patch set.

I don't plan any extra changes in rte_ring struct right now.
Konstantin


 



^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API
  2020-04-14  3:45         ` Honnappa Nagarahalli
@ 2020-04-14 16:47           ` Ananyev, Konstantin
  2020-04-14 17:30             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-14 16:47 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd


> >
> > For rings with producer/consumer in RTE_RING_SYNC_ST,
> > RTE_RING_SYNC_MT_HTS mode, provide an ability to split enqueue/dequeue
> > operation into two phases:
> >       - enqueue/dequeue start
> >       - enqueue/dequeue finish
> > That allows user to inspect objects in the ring without removing them from it
> > (aka MT safe peek).
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_ring/Makefile               |   1 +
> >  lib/librte_ring/meson.build            |   1 +
> >  lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
> >  lib/librte_ring/rte_ring_elem.h        |   4 +
> >  lib/librte_ring/rte_ring_generic.h     |  48 ++++
> >  lib/librte_ring/rte_ring_hts_generic.h |  47 ++-
> >  lib/librte_ring/rte_ring_peek.h        | 379 +++++++++++++++++++++++++
> >  7 files changed, 519 insertions(+), 5 deletions(-)  create mode 100644
> > lib/librte_ring/rte_ring_peek.h
> >
> > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index
> > 6fe500f0d..5f8662737 100644
> > --- a/lib/librte_ring/Makefile
> > +++ b/lib/librte_ring/Makefile
> > @@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> > rte_ring.h \
> >  					rte_ring_hts.h \
> >  					rte_ring_hts_elem.h \
> >  					rte_ring_hts_generic.h \
> > +					rte_ring_peek.h \
> >  					rte_ring_rts.h \
> >  					rte_ring_rts_elem.h \
> >  					rte_ring_rts_generic.h
> > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build index
> > 8e86e037a..f5f84dc6e 100644
> > --- a/lib/librte_ring/meson.build
> > +++ b/lib/librte_ring/meson.build
> > @@ -9,6 +9,7 @@ headers = files('rte_ring.h',
> >  		'rte_ring_hts.h',
> >  		'rte_ring_hts_elem.h',
> >  		'rte_ring_hts_generic.h',
> > +		'rte_ring_peek.h',
> >  		'rte_ring_rts.h',
> >  		'rte_ring_rts_elem.h',
> >  		'rte_ring_rts_generic.h')
> > diff --git a/lib/librte_ring/rte_ring_c11_mem.h
> > b/lib/librte_ring/rte_ring_c11_mem.h
> > index 0fb73a337..bb3096721 100644
> > --- a/lib/librte_ring/rte_ring_c11_mem.h
> > +++ b/lib/librte_ring/rte_ring_c11_mem.h
> > @@ -10,6 +10,50 @@
> >  #ifndef _RTE_RING_C11_MEM_H_
> >  #define _RTE_RING_C11_MEM_H_
> >
> > +/**
> > + * @internal get current tail value.
> > + * This function should be used only for single thread producer/consumer.
> > + * Check that user didn't request to move tail above the head.
> Do we need this check? This could be a data path function, we could document a warning and leave it to the users to provide the correct
> value.

I don't think this extra check will cause any extra slowdown.
On the other hand, it seems useful to have it - it might help people
to debug/root-cause an issue.
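
As a minimal illustration (not part of the patch; using the HTS peek names
from later in this series), the kind of misuse this check guards against:

#include <rte_ring_elem.h>

/* Hypothetical misuse sketch: finishing more objects than _start_
 * reserved. With the check in place the internal get_tail helper
 * returns 0 (and RTE_ASSERT fires in debug builds) instead of
 * silently moving the tail above the head.
 */
static void
peek_misuse(struct rte_ring *r)
{
	void *obj[8];
	uint32_t n;

	n = rte_ring_hts_dequeue_bulk_start(r, obj, RTE_DIM(obj), NULL);
	if (n != 0)
		rte_ring_hts_dequeue_finish(r, n + 1); /* BUG: exceeds n */
}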
 
> > + * In that situation:
> > + * - return zero, that will cause abort any pending changes and
> > + *   return head to its previous position.
> > + * - throw an assert in debug mode.
> > + */
> > +static __rte_always_inline uint32_t
> > +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> > +	uint32_t num)
> > +{
> > +	uint32_t h, n, t;
> > +
> > +	h = ht->head;
> > +	t = ht->tail;
> > +	n = h - t;
> > +
> > +	RTE_ASSERT(n >= num);
> > +	num = (n >= num) ? num : 0;
> > +
> > +	*tail = h;
> > +	return num;
> > +}
> > +
> > +/**
> > + * @internal set new values for head and tail.
> > + * This function should be used only for single thread producer/consumer.
> > + * Should be used only in conjunction with __rte_ring_st_get_tail.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> > +	uint32_t num, uint32_t enqueue)
> > +{
> > +	uint32_t pos;
> > +
> > +	RTE_SET_USED(enqueue);
> > +
> > +	pos = tail + num;
> > +	ht->head = pos;
> > +	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE); }
> > +
> >  static __rte_always_inline void
> >  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
> >  		uint32_t single, uint32_t enqueue)
> > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > index 010a564c1..5bf7c1c1b 100644
> > --- a/lib/librte_ring/rte_ring_elem.h
> > +++ b/lib/librte_ring/rte_ring_elem.h
> > @@ -1083,6 +1083,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r,
> > void *obj_table,
> >  	return 0;
> >  }
> >
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +#include <rte_ring_peek.h>
> > +#endif
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_ring/rte_ring_generic.h
> > b/lib/librte_ring/rte_ring_generic.h
> > index 953cdbbd5..9f5fdf13b 100644
> > --- a/lib/librte_ring/rte_ring_generic.h
> > +++ b/lib/librte_ring/rte_ring_generic.h
> > @@ -10,6 +10,54 @@
> >  #ifndef _RTE_RING_GENERIC_H_
> >  #define _RTE_RING_GENERIC_H_
> >
> > +/**
> > + * @internal get current tail value.
> > + * This function should be used only for single thread producer/consumer.
> > + * Check that user didn't request to move tail above the head.
> > + * In that situation:
> > + * - return zero, that will cause abort any pending changes and
> > + *   return head to its previous position.
> > + * - throw an assert in debug mode.
> > + */
> > +static __rte_always_inline uint32_t
> > +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> > +	uint32_t num)
> > +{
> > +	uint32_t h, n, t;
> > +
> > +	h = ht->head;
> > +	t = ht->tail;
> > +	n = h - t;
> > +
> > +	RTE_ASSERT(n >= num);
> > +	num = (n >= num) ? num : 0;
> > +
> > +	*tail = h;
> > +	return num;
> > +}
> > +
> > +/**
> > + * @internal set new values for head and tail.
> > + * This function should be used only for single thread producer/consumer.
> > + * Should be used only in conjunction with __rte_ring_st_get_tail.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> > +	uint32_t num, uint32_t enqueue)
> > +{
> > +	uint32_t pos;
> > +
> > +	pos = tail + num;
> > +
> > +	if (enqueue)
> > +		rte_smp_wmb();
> > +	else
> > +		rte_smp_rmb();
> > +
> > +	ht->head = pos;
> > +	ht->tail = pos;
> > +}
> > +
> >  static __rte_always_inline void
> >  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
> >  		uint32_t single, uint32_t enqueue)
> > diff --git a/lib/librte_ring/rte_ring_hts_generic.h
> > b/lib/librte_ring/rte_ring_hts_generic.h
> > index da08f1d94..8e699c006 100644
> > --- a/lib/librte_ring/rte_ring_hts_generic.h
> > +++ b/lib/librte_ring/rte_ring_hts_generic.h
> > @@ -18,9 +18,38 @@
> >   * For more information please refer to <rte_ring_hts.h>.
> >   */
> >
> > +/**
> > + * @internal get current tail value.
> > + * Check that user didn't request to move tail above the head.
> > + * In that situation:
> > + * - return zero, that will cause abort any pending changes and
> > + *   return head to its previous position.
> > + * - throw an assert in debug mode.
> > + */
> > +static __rte_always_inline uint32_t
> > +__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
> > +	uint32_t num)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos p;
> > +
> > +	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht-
> > >ht.raw);
> > +	n = p.pos.head - p.pos.tail;
> > +
> > +	RTE_ASSERT(n >= num);
> > +	num = (n >= num) ? num : 0;
> > +
> > +	*tail = p.pos.tail;
> > +	return num;
> > +}
> > +
> > +/**
> > + * @internal set new values for head and tail as one atomic 64 bit operation.
> > + * Should be used only in conjunction with __rte_ring_hts_get_tail.
> > + */
> >  static __rte_always_inline void
> > -__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> > -	uint32_t enqueue)
> > +__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
> > +	uint32_t num, uint32_t enqueue)
> >  {
> >  	union rte_ring_ht_pos p;
> >
> > @@ -29,14 +58,22 @@ __rte_ring_hts_update_tail(struct
> > rte_ring_hts_headtail *ht, uint32_t num,
> >  	else
> >  		rte_smp_rmb();
> >
> > -	p.raw = rte_atomic64_read((rte_atomic64_t *)(uintptr_t)&ht-
> > >ht.raw);
> > -
> > -	p.pos.head = p.pos.tail + num;
> > +	p.pos.head = tail + num;
> >  	p.pos.tail = p.pos.head;
> >
> >  	rte_atomic64_set((rte_atomic64_t *)(uintptr_t)&ht->ht.raw, p.raw);  }
> >
> > +static __rte_always_inline void
> > +__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t num,
> > +	uint32_t enqueue)
> > +{
> > +	uint32_t tail;
> > +
> > +	num = __rte_ring_hts_get_tail(ht, &tail, num);
> > +	__rte_ring_hts_set_head_tail(ht, tail, num, enqueue); }
> > +
> >  /**
> >   * @internal waits till tail will become equal to head.
> >   * Means no writer/reader is active for that ring.
> > diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
> > new file mode 100644 index 000000000..baefd2f7b
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_peek.h
> > @@ -0,0 +1,379 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2017 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_PEEK_H_
> > +#define _RTE_RING_PEEK_H_
> > +
> > +/**
> > + * @file
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + * It is not recommended to include this file directly.
> > + * Please include <rte_ring_elem.h> instead.
> > + *
> > + * Ring Peek AP
>                             ^^^ API
> > + * Introduction of rte_ring with serialized producer/consumer (HTS sync
> > +mode)
> > + * makes possible to split public enqueue/dequeue API into two phases:
> > + * - enqueue/dequeue start
> > + * - enqueue/dequeue finish
> > + * That allows user to inspect objects in the ring without removing
> > +them
> > + * from it (aka MT safe peek).
> > + * Note that right now this new API is avaialble only for two sync modes:
> > + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
> > + * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
> > + * It is a user responsibility to create/init ring with appropriate
> > +sync
> > + * modes selected.
> > + * As an example:
> > + * // read 1 elem from the ring:
> > + * n = rte_ring_hts_dequeue_bulk_start(ring, &obj, 1, NULL);
> > + * if (n != 0) {
> > + *    //examine object
> > + *    if (object_examine(obj) == KEEP)
> > + *       //decided to keep it in the ring.
> > + *       rte_ring_hts_dequeue_finish(ring, 0);
> > + *    else
> > + *       //decided to remove it from the ring.
> > + *       rte_ring_hts_dequeue_finish(ring, n);
> > + * }
> > + * Note that between _start_ and _finish_ the ring is sort of locked -
>                                                                                   ^^^^^^^^^^^^^^^^^^^^ - 'locked' can mean different to different people, may be remove this,
> the next sentence anyway has the description
> > + * none other thread can proceed with enqueue(/dequeue) operation till
>           ^^^^ no
> > + * _finish_ will complete.
>                          ^^^^^^^^^^^ completes
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> 
> <snip>
> 
> > +
> > +/**
> > + * Start to enqueue several objects on the ring.
> > + * Note that no actual objects are put in the queue by this function,
> > + * it just reserves for user such ability.
> > + * User has to call appropriate enqueue_finish() to copy objects into
> > +the
> > + * queue and complete given enqueue operation.
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param free_space
> > + *   if non-NULL, returns the amount of space in the ring after the
> > + *   enqueue operation has finished.
> > + * @return
> > + *   The number of objects that can be enqueued, either 0 or n
> > + */
> > +__rte_experimental
> > +static __rte_always_inline unsigned int
> > +rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
> > +		unsigned int *free_space)
> If one wants to use _elem_ APIs for ring peek, a combination of legacy API (format) and a _elem_ API is required.
> For ex:
> rte_ring_enqueue_bulk_start
> rte_ring_enqueue_elem_finish
> 
> I understand why you have done this. I think this is getting somewhat too inconsistent.
> 

Agree, there could be some confusion.
Don't know what would be a better approach here....
2 similar functions with exactly same parameter list (one wrapper for another):
rte_ring_enqueue_bulk_start() and rte_ring_enqueue_elem_bulk_start()?

> > +{
> > +	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
> > +			free_space);
> > +}
> > +
> 


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode
  2020-04-14 16:12           ` Ananyev, Konstantin
@ 2020-04-14 17:06             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-14 17:06 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> > > diff --git a/lib/librte_ring/rte_ring_hts_generic.h
> > > b/lib/librte_ring/rte_ring_hts_generic.h
> > > new file mode 100644
> > > index 000000000..da08f1d94
> > > --- /dev/null
> > > +++ b/lib/librte_ring/rte_ring_hts_generic.h
> > > @@ -0,0 +1,198 @@

<snip>

> > > +
> > > +/**
> > > + * @internal This function updates the producer head for enqueue
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure
> > > + * @param is_sp
> > > + *   Indicates whether multi-producer path is needed or not
> > > + * @param n
> > > + *   The number of elements we will want to enqueue, i.e. how far should
> the
> > > + *   head be moved
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a
> ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> from
> > > ring
> > > + * @param old_head
> > > + *   Returns head value as it was before the move, i.e. where enqueue
> starts
> > > + * @param new_head
> > > + *   Returns the current/new head value i.e. where enqueue finishes
> 
> Ups, copy/paste thing - will remove.
> 
> > Would be good to return the new_head from this function and use it in
> '__rte_ring_hts_update_tail'.
> 
> I think old_head + num should be enough here (see above).
> 
> >
> > > + * @param free_entries
> > > + *   Returns the amount of free space in the ring BEFORE head was
> moved
> > > + * @return
> > > + *   Actual number of objects enqueued.
> > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > Minor, suggest removing the elaborate comments, it is not required and
> difficult to maintain.
> > I think we should do the same thing for other files too.
> 
> Sorry, didn't get you here: what exactly do you suggest to remove?
The following function is an internal function, so we can skip the elaborate comments. I see that you have done this in other places.

> 
> >
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> > > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > > +	uint32_t *free_entries)
> > > +{
> > > +	uint32_t n;
> > > +	union rte_ring_ht_pos np, op;
> > > +
> > > +	const uint32_t capacity = r->capacity;
> > > +
> > > +	do {
> > > +		/* Reset n to the initial burst count */
> > > +		n = num;
> > > +
> > > +		/* wait for tail to be equal to head */
> > > +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> > > +
> > > +		/* add rmb barrier to avoid load/load reorder in weak
> > > +		 * memory model. It is noop on x86
> > > +		 */
> > > +		rte_smp_rmb();
> > > +
> > > +		/*
> > > +		 *  The subtraction is done between two unsigned 32bits
> > > value
> > > +		 * (the result is always modulo 32 bits even if we have
> > > +		 * *old_head > cons_tail). So 'free_entries' is always between
> > > 0
> > > +		 * and capacity (which is < size).
> > > +		 */
> > > +		*free_entries = capacity + r->cons.tail - op.pos.head;
> > > +
> > > +		/* check that we have enough room in ring */
> > > +		if (unlikely(n > *free_entries))
> > > +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > > +					0 : *free_entries;
> > > +
> > > +		if (n == 0)
> > > +			break;
> > > +
> > > +		np.pos.tail = op.pos.tail;
> > > +		np.pos.head = op.pos.head + n;
> > > +
> > > +	} while (rte_atomic64_cmpset(&r->hts_prod.ht.raw,
> > > +			op.raw, np.raw) == 0);
> > I think we can use 32b atomic operation here and just update the head.
> 
> I think we have to do a proper 64-bit CAS here, otherwise an ABA race could
> arise: a thread reads the head/tail values, then gets suspended just before
> the CAS instruction for a while.
> The thread resumes when the ring head value is equal to the thread's local
> head value, but the tail differs (some other thread is enqueuing into the ring).
Good point, ACK

> If we do the CAS just for the head - it would succeed, though it shouldn't.
> I understand that with 32-bit head/tail values the probability of such a
> situation is really low, but still.
Using 64b values would be good. Both Arm and x86 support 128b CAS, not sure about POWER.
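
To make the ABA point above concrete, a stand-alone sketch (the union is a
simplified stand-in for the patch's rte_ring_ht_pos, not the actual code):

#include <stdint.h>

/* If only the 32-bit head were CASed, a preempted thread could resume
 * when the head happens to match its stale copy while the tail has
 * moved on (the classic ABA pattern), and the CAS would wrongly
 * succeed. Covering head and tail with one 64-bit CAS makes the
 * operation fail if either of them changed since the initial load.
 */
union ht_pos {
	uint64_t raw;
	struct {
		uint32_t head;
		uint32_t tail;
	} pos;
};

static int
try_move_head(union ht_pos *ht, uint32_t n)
{
	union ht_pos op, np;

	op.raw = __atomic_load_n(&ht->raw, __ATOMIC_ACQUIRE);
	np.pos.tail = op.pos.tail;
	np.pos.head = op.pos.head + n;

	/* fails if either head *or* tail changed since the load */
	return __atomic_compare_exchange_n(&ht->raw, &op.raw, np.raw,
			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}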

> 
> >
> > > +
> > > +	*old_head = op.pos.head;
> > > +	return n;
> > > +}
> > > +

<snip>

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API
  2020-04-14 16:47           ` Ananyev, Konstantin
@ 2020-04-14 17:30             ` Honnappa Nagarahalli
  2020-04-14 22:24               ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-14 17:30 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> >
> > > +
> > > +/**
> > > + * Start to enqueue several objects on the ring.
> > > + * Note that no actual objects are put in the queue by this
> > > +function,
> > > + * it just reserves for user such ability.
> > > + * User has to call appropriate enqueue_finish() to copy objects
> > > +into the
> > > + * queue and complete given enqueue operation.
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param free_space
> > > + *   if non-NULL, returns the amount of space in the ring after the
> > > + *   enqueue operation has finished.
> > > + * @return
> > > + *   The number of objects that can be enqueued, either 0 or n
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline unsigned int
> > > +rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
> > > +		unsigned int *free_space)
> > If one wants to use _elem_ APIs for ring peek, a combination of legacy API
> (format) and a _elem_ API is required.
> > For ex:
> > rte_ring_enqueue_bulk_start
> > rte_ring_enqueue_elem_finish
> >
> > I understand why you have done this. I think this is getting somewhat too
> inconsistent.
> >
> 
> Agree, there could be some confusion.
> Don't know what would be a better approach here....
> 2 similar functions with exactly same parameter list (one wrapper for
> another):
> rte_ring_enqueue_bulk_start() and
> rte_ring_enqueue_elem_bulk_start()?
We should go with 2 functions 'rte_ring_enqueue_bulk_start' and 'rte_ring_enqueue_bulk_elem_start'. There is a slight variation in the way obj_table is provided.
'rte_ring_enqueue_bulk_start' can be the wrapper.

> 
> > > +{
> > > +	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
> > > +			free_space);
> > > +}
> > > +
> >


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes
  2020-04-14  4:28         ` [dpdk-dev] [PATCH " Honnappa Nagarahalli
@ 2020-04-14 18:29           ` Ananyev, Konstantin
  2020-04-15 20:28           ` Ananyev, Konstantin
  1 sibling, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-14 18:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd


> <snip>
> 
> >  /**
> >   * @internal Enqueue several objects on the RTS ring.
> > diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h
> > b/lib/librte_ring/rte_ring_rts_c11_mem.h
> > new file mode 100644
> > index 000000000..b72901497
> > --- /dev/null
> > +++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
> > @@ -0,0 +1,198 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + *
> > + * Copyright (c) 2010-2017 Intel Corporation
> > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > + * All rights reserved.
> > + * Derived from FreeBSD's bufring.h
> > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > + */
> > +
> > +#ifndef _RTE_RING_RTS_C11_MEM_H_
> > +#define _RTE_RING_RTS_C11_MEM_H_
> > +
> > +/**
> > + * @file rte_ring_rts_c11_mem.h
> > + * It is not recommended to include this file directly,
> > + * include <rte_ring.h> instead.
> > + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
> > + * For more information please refer to <rte_ring_rts.h>.
> > + */
> > +
> > +/**
> > + * @internal This function updates tail values.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht) {
> > +	union rte_ring_ht_poscnt h, ot, nt;
> > +
> > +	/*
> > +	 * If there are other enqueues/dequeues in progress that
> > +	 * might preceded us, then don't update tail with new value.
> > +	 */
> > +
> > +	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
> This can be RELAXED. This thread is reading the value that it updated earlier, so it should be able to see the value it updated.

It serves as a hoist barrier, to make sure that we read tail before head (see below).
 
> > +
> > +	do {
> > +		/* on 32-bit systems we have to do atomic read here */
> > +		h.raw = __atomic_load_n(&ht->head.raw,
> > __ATOMIC_RELAXED);
> > +
> > +		nt.raw = ot.raw;
> > +		if (++nt.val.cnt == h.val.cnt)
> > +			nt.val.pos = h.val.pos;
> > +
> > +	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw,
> > nt.raw,
> > +			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0); }
> > +
> > +/**
> > + * @internal This function waits till head/tail distance wouldn't
> > + * exceed pre-defined max value.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
> > +	union rte_ring_ht_poscnt *h)
> > +{
> > +	uint32_t max;
> > +
> > +	max = ht->htd_max;
> > +	h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> > +
> > +	while (h->val.pos - ht->tail.val.pos > max) {
> > +		rte_pause();
> > +		h->raw = __atomic_load_n(&ht->head.raw,
> > __ATOMIC_ACQUIRE);
> > +	}
> > +}
> > +
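
To tie the ordering argument in this exchange together, a commented
stand-alone sketch of the RTS tail-update loop (simplified stand-in types,
not the patch code itself):

#include <stdint.h>

union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt;
		uint32_t pos;
	} val;
};

static void
rts_update_tail(union poscnt *tail, const union poscnt *head)
{
	union poscnt h, ot, nt;

	/* ACQUIRE acts as the hoist barrier: the head load below
	 * cannot be reordered before this tail load.
	 */
	ot.raw = __atomic_load_n(&tail->raw, __ATOMIC_ACQUIRE);

	do {
		h.raw = __atomic_load_n(&head->raw, __ATOMIC_RELAXED);

		/* every finishing op bumps the counter; when it catches
		 * up with the head's counter, this is the last outstanding
		 * op and the tail position jumps forward.
		 */
		nt.raw = ot.raw;
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;

	/* RELEASE on success publishes the update; ACQUIRE on failure
	 * re-establishes the tail-before-head ordering for the retry.
	 */
	} while (__atomic_compare_exchange_n(&tail->raw, &ot.raw, nt.raw,
			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
}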


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API
  2020-04-14 17:30             ` Honnappa Nagarahalli
@ 2020-04-14 22:24               ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-14 22:24 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

> > >
> > > > +
> > > > +/**
> > > > + * Start to enqueue several objects on the ring.
> > > > + * Note that no actual objects are put in the queue by this
> > > > +function,
> > > > + * it just reserves for user such ability.
> > > > + * User has to call appropriate enqueue_finish() to copy objects
> > > > +into the
> > > > + * queue and complete given enqueue operation.
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param n
> > > > + *   The number of objects to add in the ring from the obj_table.
> > > > + * @param free_space
> > > > + *   if non-NULL, returns the amount of space in the ring after the
> > > > + *   enqueue operation has finished.
> > > > + * @return
> > > > + *   The number of objects that can be enqueued, either 0 or n
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline unsigned int
> > > > +rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
> > > > +		unsigned int *free_space)
> > > If one wants to use _elem_ APIs for ring peek, a combination of legacy API
> > (format) and a _elem_ API is required.
> > > For ex:
> > > rte_ring_enqueue_bulk_start
> > > rte_ring_enqueue_elem_finish
> > >
> > > I understand why you have done this. I think this is getting somewhat too
> > inconsistent.
> > >
> >
> > Agree, there could be some confusion.
> > Don't know what would be a better approach here....
> > 2 similar functions with exactly same parameter list (one wrapper for
> > another):
> > rte_ring_enqueue_bulk_start() and
> > rte_ring_enqueue_elem_bulk_start()?
> We should go with 2 functions 'rte_ring_enqueue_bulk_start' and 'rte_ring_enqueue_bulk_elem_start'.
> There is a slight variation in the way
> obj_table is provided.
> 'rte_ring_enqueue_bulk_start' can be the wrapper.

For enqueue_start() there is no need to have an obj_table parameter.
That's why the enqueue_start and enqueue_elem_start parameter lists will be identical.
But sure, if that helps to avoid confusion - let's have 2 functions here.
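
For reference, a usage sketch of the agreed split (function names per the
discussion above; exact signatures may still shift in v4):

#include <errno.h>
#include <rte_ring_elem.h>

/* Sketch: _start_ only reserves room - hence no obj_table parameter -
 * while _finish_ copies the objects in and publishes them.
 */
static int
enqueue_two_phase(struct rte_ring *r, void * const *objs, unsigned int n)
{
	unsigned int m;

	m = rte_ring_enqueue_bulk_start(r, n, NULL);
	if (m == 0)
		return -ENOBUFS;

	/* objs[] can still be prepared or inspected here */

	rte_ring_enqueue_finish(r, objs, m);
	return 0;
}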

> 
> >
> > > > +{
> > > > +	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
> > > > +			free_space);
> > > > +}
> > > > +
> > >


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes
  2020-04-14  4:28         ` [dpdk-dev] [PATCH " Honnappa Nagarahalli
  2020-04-14 18:29           ` Ananyev, Konstantin
@ 2020-04-15 20:28           ` Ananyev, Konstantin
  1 sibling, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-15 20:28 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

Hi Honnappa,

> > +
> > +/**
> > + * @internal This function updates the producer head for enqueue
> > + *
> > + * @param r
> > + *   A pointer to the ring structure
> > + * @param is_sp
> > + *   Indicates whether multi-producer path is needed or not
> > + * @param n
> > + *   The number of elements we will want to enqueue, i.e. how far should the
> > + *   head be moved
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> > ring
> > + * @param old_head
> > + *   Returns head value as it was before the move, i.e. where enqueue starts
> > + * @param new_head
> > + *   Returns the current/new head value i.e. where enqueue finishes
> > + * @param free_entries
> > + *   Returns the amount of free space in the ring BEFORE head was moved
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
> > +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> > +	uint32_t *free_entries)
> > +{
> > +	uint32_t n;
> > +	union rte_ring_ht_pos np, op;
> > +
> > +	const uint32_t capacity = r->capacity;
> > +
> > +	do {
> > +		/* Reset n to the initial burst count */
> > +		n = num;
> > +
> > +		/* wait for tail to be equal to head, , acquire point */
> > +		__rte_ring_hts_head_wait(&r->hts_prod, &op);
> > +
> > +		/*
> > +		 *  The subtraction is done between two unsigned 32bits value
> > +		 * (the result is always modulo 32 bits even if we have
> > +		 * *old_head > cons_tail). So 'free_entries' is always between
> > 0
> > +		 * and capacity (which is < size).
> > +		 */
> > +		*free_entries = capacity + r->cons.tail - op.pos.head;
> > +
> > +		/* check that we have enough room in ring */
> > +		if (unlikely(n > *free_entries))
> > +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > +					0 : *free_entries;
> > +
> > +		if (n == 0)
> > +			break;
> > +
> > +		np.pos.tail = op.pos.tail;
> > +		np.pos.head = op.pos.head + n;
> > +
> > +	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
> > +			&op.raw, np.raw,
> > +			0, __ATOMIC_RELEASE, __ATOMIC_RELAXED) == 0);
> __ATOMIC_RELEASE can be __ATOMIC_RELAXED. The RELEASE while updating after the elements are written is enough.

I looked at it once again and I think RELAXED is probably not enough here
(and the same holds for RELEASE).
It seems we have to use ACQUIRE here (and in other similar places)
to forbid the CPU from speculatively doing the actual object copy before the CAS.
Another alternative would probably use :
cons_tail = __atomic_load_n(&r->cons.tail, __ATOMIC_ACQUIRE);
*free_entries = capacity + cons_tail - op.pos.head;
instead of just
*free_entries = capacity + r->cons.tail - op.pos.head;
above.
But that would mean two acquire points inside the loop:
load(prod, ACQUIRE);
load(cons.tail, ACQUIRE);
Plus, to me CAS(..., ACQUIRE, ACQUIRE) seems more logical here,
so I am leaning that way.
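
A stand-alone sketch of that preference - the CAS itself as the acquire
point (simplified stand-in types, not the patch code):

#include <stdint.h>

/* ACQUIRE on both the success and failure paths of the CAS prevents
 * the object copy that follows a successful head move from being
 * speculated ahead of it; on failure, op is reloaded with ACQUIRE
 * semantics before the retry.
 */
static uint64_t
move_head_acquire(uint64_t *ht_raw, uint32_t n)
{
	uint64_t op, np;

	op = __atomic_load_n(ht_raw, __ATOMIC_ACQUIRE);
	do {
		np = op + n;	/* stand-in for building new head/tail */
	} while (!__atomic_compare_exchange_n(ht_raw, &op, np,
			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE));
	return op;
}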

> 
> > +
> > +	*old_head = op.pos.head;
> > +	return n;
> > +}

^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 0/9] New sync modes for ring
  2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
                         ` (8 preceding siblings ...)
  2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
@ 2020-04-17 13:36       ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 1/9] test/ring: add contention stress test Konstantin Ananyev
                           ` (9 more replies)
  9 siblings, 10 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

V3 - V4 changes:
Address comments from Honnappa:
1. for new sync modes make legacy API wrappers around _elem_ calls
2. remove rte_ring_(hts|rts)_generic.h
3. few changes in C11 version
4. peek API - add missing functions for _elem_
5. remove _IS_SP/_IS_MP, etc. internal macros
6. fix param types (obj_table) for _elem_ functions
7. fix formal API comments
8. deduplicate code for test_ring_stress
9. added functional tests for new sync modes

V2 - V3 changes:
1. Few more compilation fixes (for gcc 4.8.X)
2. Extra update devtools/libabigail.abignore (workaround) 

V1 - V2 changes:
1. Fix compilation issues
2. Add C11 atomics support
3. Updates devtools/libabigail.abignore (workaround)

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. Rework peek related API a bit
4. Rework test to make it less verbose and unite all test-cases
   in one command
5. Add new test-case for MT peek API

TODO list:
1. Update docs

These days more and more customers use (or try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP is quite a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.).
As now not only the lock-owner's but also the lock-waiters' scheduling
order matters a lot (LWP).
These two problems are well-known for kernels running within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
un-fair locking, waiting on the tail is very similar to a ticket-lock schema -
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
It is probably not possible to completely resolve the LHP problem in
userspace only (without some kernel communication/intervention).
But removing fairness at the tail update helps to avoid LWP and
can mitigate the situation significantly.
This patch proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform given op.
    As another enhancement provide ability to split enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows user to inspect objects in the ring without removing
    them from it (aka MT safe peek).
2) Relaxed Tail Sync (RTS)
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
Existing sync modes MP/MC,SP/SC kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains as default one.
The only thing that changed:
the format of prod/cons now can differ depending on the mode selected at _init_.
So the user has to stick with one sync model through the whole ring lifetime.
In other words, a user can't create a ring for, let's say, SP mode and then
in the middle of the data path change their mind and start using MP_RTS mode.
For the existing modes (SP/MP, SC/MC) the format remains the same and
users can still use them interchangeably, though of course it is an
error-prone practice.
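
As an illustration, creating rings in the new modes (flag names as
introduced by the RTS and HTS patches of this series):

#include <errno.h>
#include <rte_ring.h>

/* Sketch: the sync mode is fixed via flags at creation time and has
 * to be used consistently for the ring's whole lifetime.
 */
static int
create_new_mode_rings(struct rte_ring **hts, struct rte_ring **rts)
{
	/* fully serialized producer/consumer; also enables MT-safe peek */
	*hts = rte_ring_create("hts_ring", 1024, SOCKET_ID_ANY,
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);

	/* relaxed tail sync: only the last finishing thread moves the tail */
	*rts = rte_ring_create("rts_ring", 1024, SOCKET_ID_ANY,
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	return (*hts != NULL && *rts != NULL) ? 0 : -ENOMEM;
}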

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87

Konstantin Ananyev (9):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API
  test/ring: add functional tests for new sync modes

 app/test/Makefile                      |   5 +
 app/test/meson.build                   |   5 +
 app/test/test_pdump.c                  |   6 +-
 app/test/test_ring.c                   |  93 ++++--
 app/test/test_ring_hts_stress.c        |  32 ++
 app/test/test_ring_mpmc_stress.c       |  31 ++
 app/test/test_ring_peek_stress.c       |  43 +++
 app/test/test_ring_rts_stress.c        |  32 ++
 app/test/test_ring_stress.c            |  57 ++++
 app/test/test_ring_stress.h            |  38 +++
 app/test/test_ring_stress_impl.h       | 396 ++++++++++++++++++++++
 devtools/libabigail.abignore           |   7 +
 lib/librte_pdump/rte_pdump.c           |   2 +-
 lib/librte_port/rte_port_ring.c        |  12 +-
 lib/librte_ring/Makefile               |   8 +-
 lib/librte_ring/meson.build            |  11 +-
 lib/librte_ring/rte_ring.c             | 114 ++++++-
 lib/librte_ring/rte_ring.h             | 243 ++++++++------
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_core.h        | 181 ++++++++++
 lib/librte_ring/rte_ring_elem.h        | 141 ++++++--
 lib/librte_ring/rte_ring_generic.h     |  48 +++
 lib/librte_ring/rte_ring_hts.h         | 332 ++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 207 ++++++++++++
 lib/librte_ring/rte_ring_peek.h        | 446 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts.h         | 439 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
 27 files changed, 2978 insertions(+), 174 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_core.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 1/9] test/ring: add contention stress test
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                           ` (8 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce stress test for ring enqueue/dequeue operations.
Performs the following pattern on each slave worker:
dequeue/read-write data from the dequeued objects/enqueue.
Serves as both functional and performance test of ring
enqueue/dequeue operations under high contention
(for both over committed and non-over committed scenarios).

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  35 +++
 app/test/test_ring_stress_impl.h | 396 +++++++++++++++++++++++++++++++
 6 files changed, 514 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index be53d33c3..a23a011df 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 04b59cffa..8824f366c 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..60eac6216
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+#include <rte_spinlock.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..222d62bc4
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/**
+ * Stress test for ring enqueue/dequeue operations.
+ * Performs the following pattern on each slave worker:
+ * dequeue/read-write data from the dequeued objects/enqueue.
+ * Serves as both functional and performance test of ring
+ * enqueue/dequeue operations under high contention
+ * (for both over committed and non-over committed scenarios).
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lcore=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker(void *arg, const char *fname, int32_t prcs)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() - tm0 : 0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() - tm1 : 0;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, prcs);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	cl = rte_rdtsc_precise() - cl;
+	if (prcs == 0)
+		lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+	return rc;
+}
+static int
+test_worker_prcs(void *arg)
+{
+	return test_worker(arg, __func__, 1);
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	return test_worker(arg, __func__, 0);
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems data failed",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, __alignof__(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal worker to start test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop the test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 2/9] ring: prepare ring to allow new sync schemes
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 1/9] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 3/9] ring: introduce RTS ring mode Konstantin Ananyev
                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

To make these preparations two main things are done:
- Change from *single* to *sync_type* to allow different
  synchronisation schemes to be applied.
  Mark *single* as deprecated in comments.
  Add new functions to allow user to query ring sync types.
  Replace direct access to *single* with appropriate function call.
- Move actual rte_ring and related structures definitions into a
  separate file: <rte_ring_core.h>. It allows to refer contents
  of <rte_ring_elem.h> from <rte_ring.h> without introducing a
  circular dependency.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 +--
 lib/librte_ring/Makefile        |   1 +
 lib/librte_ring/meson.build     |   1 +
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 170 ++++++++++++++------------------
 lib/librte_ring/rte_ring_core.h | 131 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_elem.h |  42 +++-----
 9 files changed, 233 insertions(+), 138 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_core.h

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..f96709f95 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..52b2d8e55 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_is_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 28368e6d1..6572768c9 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 05402e4f0..c656781da 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,6 +3,7 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..35ee4491c 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -36,91 +36,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#define RTE_TAILQ_RING_NAME "RTE_RING"
-
-enum rte_ring_queue_behavior {
-	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
-	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
-};
-
-#define RTE_RING_MZ_PREFIX "RG_"
-/** The maximum length of a ring name. */
-#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
-			   sizeof(RTE_RING_MZ_PREFIX) + 1)
-
-/* structure to hold a pair of head/tail values and other metadata */
-struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
-};
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct rte_ring {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned; /**< Name of the ring. */
-	int flags;               /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t capacity;       /**< Usable size of ring */
-
-	char pad0 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
-	char pad1 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
-	char pad2 __rte_cache_aligned; /**< empty cache line */
-};
-
-#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-/**
- * Ring is to hold exactly requested number of entries.
- * Without this flag set, the ring size requested must be a power of 2, and the
- * usable space will be that size - 1. With the flag, the requested size will
- * be rounded up to the next power of two, but the usable space will be exactly
- * that requested. Worst case, if a power-of-2 size is requested, half the
- * ring space will be wasted.
- */
-#define RING_F_EXACT_SZ 0x0004
-#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
-
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#include <rte_ring_core.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +336,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,9 +359,13 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_elem.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -470,7 +390,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +474,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +498,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +525,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +697,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is configured for a single producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if the ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_is_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is configured for a single consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if the ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_is_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +796,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +819,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +846,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +874,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +899,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +927,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
new file mode 100644
index 000000000..459a0ffa1
--- /dev/null
+++ b/lib/librte_ring/rte_ring_core.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_CORE_H_
+#define _RTE_RING_CORE_H_
+
+/**
+ * @file
+ * This file contains the definition of the RTE ring structure itself,
+ * init flags and some related macros.
+ * For the majority of DPDK entities, it is not recommended to include
+ * this file directly; include <rte_ring.h> or <rte_ring_elem.h>
+ * instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+#define RTE_TAILQ_RING_NAME "RTE_RING"
+
+enum rte_ring_queue_behavior {
+	/** Enq/Deq a fixed number of items from a ring */
+	RTE_RING_QUEUE_FIXED = 0,
+	/** Enq/Deq as many items as possible from ring */
+	RTE_RING_QUEUE_VARIABLE
+};
+
+#define RTE_RING_MZ_PREFIX "RG_"
+/** The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
+
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * Structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type, the format of that structure might differ,
+ * but the offsets of the *sync_type* and *tail* values must remain the same.
+ */
+struct rte_ring_headtail {
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated - True if single prod/cons */
+		uint32_t single;
+	};
+};
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these indexes is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their values when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between two index
+ * values in modulo-32bit arithmetic: that's why the overflow of the indexes is
+ * not a problem.
+ */
+struct rte_ring {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned;
+	/**< Name of the ring. */
+	int flags;               /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t capacity;       /**< Usable size of ring */
+
+	char pad0 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring producer status. */
+	struct rte_ring_headtail prod __rte_cache_aligned;
+	char pad1 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring consumer status. */
+	struct rte_ring_headtail cons __rte_cache_aligned;
+	char pad2 __rte_cache_aligned; /**< empty cache line */
+};
+
+#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
+#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
+/**
+ * Ring is to hold exactly requested number of entries.
+ * Without this flag set, the ring size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag, the requested size will
+ * be rounded up to the next power of two, but the usable space will be exactly
+ * that requested. Worst case, if a power-of-2 size is requested, half the
+ * ring space will be wasted.
+ */
+#define RING_F_EXACT_SZ 0x0004
+#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_CORE_H_ */
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..7406c0b0f 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -20,21 +20,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <string.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#include "rte_ring.h"
+#include <rte_ring_core.h>
 
 /**
  * @warning
@@ -510,7 +496,7 @@ rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -539,7 +525,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -570,7 +556,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -675,7 +661,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -703,7 +689,7 @@ rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -734,7 +720,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -842,7 +828,7 @@ rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -871,7 +857,7 @@ rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -902,7 +888,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -934,7 +920,7 @@ rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -963,7 +949,7 @@ rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -995,9 +981,11 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 3/9] ring: introduce RTS ring mode
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 1/9] test/ring: add contention stress test Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                           ` (6 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
The aim is to reduce stall times when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.
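
As an illustration, here is a minimal usage sketch (not part of this
patch; it relies on the RING_F_MP_RTS_ENQ/RING_F_MC_RTS_DEQ flags and
the htd_max accessors introduced below, and assumes a build with
experimental API enabled):

#include <rte_ring.h>
#include <rte_lcore.h>

static struct rte_ring *
create_rts_ring(void)
{
	struct rte_ring *r;

	/* MP/MC ring with relaxed tail sync on both producer and consumer */
	r = rte_ring_create("rts_ring", 1024, rte_socket_id(),
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return NULL;

	/* optionally tighten max head/tail distance (default: capacity / 8) */
	rte_ring_set_prod_htd_max(r, 4);
	rte_ring_set_cons_htd_max(r, 4);
	return r;
}

After creation, the generic rte_ring_enqueue_*()/rte_ring_dequeue_*()
calls dispatch to the RTS variants automatically, based on the
sync_type stored in the ring.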

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

check-abi.sh reports what I believe is a false positive about
ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
updated to suppress *struct ring* related errors.

 devtools/libabigail.abignore           |   7 +
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   7 +-
 lib/librte_ring/rte_ring.c             | 100 +++++-
 lib/librte_ring/rte_ring.h             |  70 +++-
 lib/librte_ring/rte_ring_core.h        |  35 +-
 lib/librte_ring/rte_ring_elem.h        |  90 ++++-
 lib/librte_ring/rte_ring_rts.h         | 439 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
 9 files changed, 901 insertions(+), 30 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..cd86d89ca 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,10 @@
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore updates of ring prod/cons
+[suppress_type]
+        type_kind = struct
+        name = rte_ring
+[suppress_type]
+        type_kind = struct
+        name = rte_event_ring
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6572768c9..04e446e37 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_c11_mem.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index c656781da..a95598032 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,4 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_c11_mem.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * Helper function: calculates sync_type values for prod and cons
+ * based on input flags. Returns zero on success or a negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 35ee4491c..77f206ca7 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  *
- * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2010-2020 Intel Corporation
  * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
  * All rights reserved.
  * Derived from FreeBSD's bufring.h
@@ -389,8 +389,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -524,8 +537,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -845,8 +870,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -925,9 +963,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index 459a0ffa1..173b5f68d 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -56,6 +56,9 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
@@ -75,6 +78,21 @@ struct rte_ring_headtail {
 	};
 };
 
+union rte_ring_rts_poscnt {
+	uint64_t raw;
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union rte_ring_rts_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union rte_ring_rts_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -103,11 +121,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -124,6 +152,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 7406c0b0f..6da0a917b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -528,6 +528,10 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -557,6 +561,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -661,7 +682,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -719,8 +740,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -887,8 +925,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -979,9 +1034,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #include <rte_ring.h>
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..8ced07096
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,439 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that the tail value is increased not
+ * by every thread that finishes an enqueue/dequeue,
+ * but only by the current last one doing enqueue/dequeue.
+ * That allows threads to skip spinning on the tail value,
+ * leaving the actual tail value change to the last thread at a given instant.
+ * RTS requires two 64-bit CAS operations for each enqueue/dequeue:
+ * one for the head update, a second for the tail update.
+ * In return, threads avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update plus waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce update counter (cnt) for both head and tail.
+ *  - increment head.cnt for each head.value update
+ *  - write head.value and head.cnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.cnt + 1 == head.cnt
+ *    (indicating that this is the last thread updating the tail)
+ *  - increment tail.cnt when each enqueue/dequeue op finishes
+ *    (no matter if tail.value going to change or not)
+ *  - write tail.value and tail.cnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit max allowed distance between head and tail value (HTD_MAX).
+ *    I.e. a thread is allowed to proceed with changing head.value
+ *    only when: head.value - tail.value <= HTD_MAX.
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have a fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity - no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if the producer uses the appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if the consumer uses the appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
new file mode 100644
index 000000000..9f26817c0
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_C11_MEM_H_
+#define _RTE_RING_RTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_rts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_rts_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with a new value.
+	 */
+
+	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance
+ * no longer exceeds the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_rts_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_rts_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for prod head/tail distance,
+		 * make sure that we read prod head *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems to the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_rts_poscnt nh, oh;
+
+	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for cons head/tail distance,
+		 * make sure that we read cons head *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 4/9] test/ring: add contention stress test for RTS ring
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (2 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 3/9] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 5/9] ring: introduce HTS ring mode Konstantin Ananyev
                           ` (5 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test RTS ring mode under contention.
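
Usage note: once built, the new case runs as part of the ring stress
suite and can be invoked from the test app, e.g. (the binary path may
differ depending on the build setup):

  echo ring_stress_autotest | ./app/test/dpdk-test --lcores=6-14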

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index a23a011df..00b74b5c9 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 8824f366c..97ad822c1 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 60eac6216..32aae2072 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 5/9] ring: introduce HTS ring mode
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (3 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                           ` (4 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce head/tail sync (HTS) mode for MT ring synchronization.
In this mode enqueue/dequeue operations are fully serialized:
only one thread at a time is allowed to perform a given op.
This is expected to reduce stall times when the ring is used on
overcommitted cpus (multiple active threads on the same cpu).
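
As an illustration (not part of the patch itself), a minimal sketch of
creating a ring in HTS mode and driving it through the generic API,
which dispatches to the HTS routines based on the sync type recorded
at init time; the ring name and size below are arbitrary:

#include <rte_ring.h>

static int
hts_example(void)
{
	struct rte_ring *r;
	void *obj = (void *)(uintptr_t)1; /* dummy object pointer */

	/* serialize both producer and consumer */
	r = rte_ring_create("hts_example", 1024, SOCKET_ID_ANY,
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
	if (r == NULL)
		return -1;

	/* dispatches to rte_ring_mp_hts_enqueue_bulk() */
	if (rte_ring_enqueue_bulk(r, &obj, 1, NULL) != 1)
		return -1;

	/* dispatches to rte_ring_mc_hts_dequeue_bulk() */
	if (rte_ring_dequeue_bulk(r, &obj, 1, NULL) != 1)
		return -1;

	rte_ring_free(r);
	return 0;
}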

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile               |   2 +
 lib/librte_ring/meson.build            |   2 +
 lib/librte_ring/rte_ring.c             |  20 +-
 lib/librte_ring/rte_ring.h             |  11 +
 lib/librte_ring/rte_ring_core.h        |  19 ++
 lib/librte_ring/rte_ring_elem.h        |  13 +
 lib/librte_ring/rte_ring_hts.h         | 332 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 207 +++++++++++++++
 8 files changed, 604 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 04e446e37..f75d8e530 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -20,6 +20,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_c11_mem.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index a95598032..ca37cb8cc 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -7,6 +7,8 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_c11_mem.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 77f206ca7..7a7695914 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -398,6 +398,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -545,6 +548,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -879,6 +884,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -972,6 +980,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index 173b5f68d..450f2bc87 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -58,6 +58,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -93,6 +94,19 @@ struct rte_ring_rts_headtail {
 	volatile union rte_ring_rts_poscnt head;
 };
 
+union rte_ring_hts_pos {
+	uint64_t raw;
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union rte_ring_hts_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -124,6 +138,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -133,6 +148,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -155,6 +171,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 6da0a917b..df485fc6b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -529,6 +529,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -573,6 +574,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -754,6 +758,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -939,6 +946,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1048,6 +1058,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..c7701defc
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,332 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for the serialized, aka Head-Tail Sync (HTS), ring mode.
+ * In this mode enqueue/dequeue operations are fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to proceed with changing head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve this, the head update routine uses a 64-bit CAS.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to a ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, head, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from a ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, head, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested number of objects exceeds what is available,
+ * dequeue only the available objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested number of objects exceeds what is available,
+ * dequeue only the available objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h b/lib/librte_ring/rte_ring_hts_c11_mem.h
new file mode 100644
index 000000000..87e84fdc9
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_C11_MEM_H_
+#define _RTE_RING_HTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_hts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+/**
+ * @internal get current tail value.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_hts_pos p;
+
+	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64 bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	union rte_ring_hts_pos p;
+
+	RTE_SET_USED(enqueue);
+
+	p.pos.head = tail + num;
+	p.pos.tail = p.pos.head;
+
+	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal update tail with new value.
+ */
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t tail;
+
+	RTE_SET_USED(enqueue);
+
+	tail = old_tail + num;
+	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal waits till the tail becomes equal to the head,
+ * which means no writer/reader is active on the ring.
+ * Intended to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_hts_pos *p)
+{
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_hts_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read prod head/tail *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_hts_pos np, op;
+
+	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read cons head/tail *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 6/9] test/ring: add contention stress test for HTS ring
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (4 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 5/9] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 7/9] ring: introduce peek style API Konstantin Ananyev
                           ` (3 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 00b74b5c9..28f0b9ac2 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 97ad822c1..20c4978c2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 32aae2072..9a87c7f7b 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 7/9] ring: introduce peek style API
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (5 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
                           ` (2 subsequent siblings)
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
mode, provide the ability to split an enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows the user to inspect objects in the ring without removing
them from it (aka MT safe peek).
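
As a sketch of the enqueue side (the header added below carries the
matching dequeue example); the ring is assumed to be created with
RING_F_MP_HTS_ENQ (or as a single-producer ring), and produce_object()
is a hypothetical application helper:

	unsigned int n;
	void *obj;

	/* reserve room for one object; nothing is copied yet */
	n = rte_ring_enqueue_bulk_start(ring, 1, NULL);
	if (n != 0) {
		obj = produce_object();
		/* copy the object in and release the reservation;
		 * passing 0 instead would abort the enqueue */
		rte_ring_enqueue_finish(ring, &obj, n);
	}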

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile           |   1 +
 lib/librte_ring/meson.build        |   1 +
 lib/librte_ring/rte_ring_c11_mem.h |  44 +++
 lib/librte_ring/rte_ring_elem.h    |   4 +
 lib/librte_ring/rte_ring_generic.h |  48 ++++
 lib/librte_ring/rte_ring_peek.h    | 446 +++++++++++++++++++++++++++++
 6 files changed, 544 insertions(+)
 create mode 100644 lib/librte_ring/rte_ring_peek.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index f75d8e530..52bb2a42d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_c11_mem.h \
 					rte_ring_hts.h \
 					rte_ring_hts_c11_mem.h \
+					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca37cb8cc..0c1f2d996 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,7 @@ headers = files('rte_ring.h',
 		'rte_ring_generic.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_c11_mem.h',
+		'rte_ring_peek.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 0fb73a337..bb3096721 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -10,6 +10,50 @@
 #ifndef _RTE_RING_C11_MEM_H_
 #define _RTE_RING_C11_MEM_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index df485fc6b..eeb850ab5 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1071,6 +1071,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #include <rte_ring.h>
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index 953cdbbd5..9f5fdf13b 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -10,6 +10,54 @@
 #ifndef _RTE_RING_GENERIC_H_
 #define _RTE_RING_GENERIC_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	pos = tail + num;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	ht->head = pos;
+	ht->tail = pos;
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..c3e04c198
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,446 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * Introduction of rte_ring with serialized producer/consumer (HTS sync mode)
+ * makes it possible to split the public enqueue/dequeue API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows the user to inspect objects in the ring without removing them
+ * from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ no other thread can proceed
+ * with an enqueue (or dequeue) operation till _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal This function moves prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_bulk_elem_start(r, n, free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_burst_elem_start(r, n, free_space);
+}
+
+/**
+ * Complete the enqueue of several objects on the ring.
+ * Note that the number of objects to enqueue should not exceed the
+ * previous enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete the enqueue of several objects on the ring.
+ * Note that the number of objects to enqueue should not exceed the
+ * previous enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves cons head value and copies up to *n*
+ * objects from the ring to the user provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete the dequeue of several objects from the ring.
+ * Note that the number of objects to dequeue should not exceed the
+ * previous dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_elem_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_cons, &tail, n);
+		__rte_ring_hts_set_head_tail(&r->hts_cons, tail, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete the dequeue of several objects from the ring.
+ * Note that the number of objects to dequeue should not exceed the
+ * previous dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	rte_ring_dequeue_elem_finish(r, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 8/9] test/ring: add stress test for MT peek API
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (6 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 7/9] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 28f0b9ac2..631a21028 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 20c4978c2..d15278cf9 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 9a87c7f7b..60953ce47 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -35,3 +35,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v4 9/9] test/ring: add functional tests for new sync modes
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (7 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-17 13:36         ` Konstantin Ananyev
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
  9 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-17 13:36 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Extend test_ring_autotest with new test-cases for RTS/HTS sync modes.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_ring.c | 93 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 20 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fbcd109b1..e21557cd9 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -203,7 +203,8 @@ test_ring_negative_tests(void)
  * Random number of elements are enqueued and dequeued.
  */
 static int
-test_ring_burst_bulk_tests1(unsigned int api_type)
+test_ring_burst_bulk_tests1(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -213,12 +214,11 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
 	const unsigned int rsz = RING_SIZE - 1;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -294,7 +294,8 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
  * dequeued data.
  */
 static int
-test_ring_burst_bulk_tests2(unsigned int api_type)
+test_ring_burst_bulk_tests2(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -302,12 +303,11 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -390,7 +390,8 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
  * Enqueue and dequeue to cover the entire ring length.
  */
 static int
-test_ring_burst_bulk_tests3(unsigned int api_type)
+test_ring_burst_bulk_tests3(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -398,12 +399,11 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
 	unsigned int i, j;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -465,7 +465,8 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
  * Enqueue till the ring is full and dequeue till the ring becomes empty.
  */
 static int
-test_ring_burst_bulk_tests4(unsigned int api_type)
+test_ring_burst_bulk_tests4(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -474,12 +475,11 @@ test_ring_burst_bulk_tests4(unsigned int api_type)
 	unsigned int num_elems;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -815,7 +815,23 @@ test_ring_with_exact_size(void)
 static int
 test_ring(void)
 {
+	int32_t rc;
 	unsigned int i, j;
+	const char *tname;
+
+	static const struct {
+		uint32_t create_flags;
+		const char *name;
+	} test_sync_modes[] = {
+		{
+			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ,
+			"Test MT_RTS ring",
+		},
+		{
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ,
+			"Test MT_HTS ring",
+		},
+	};
 
 	/* Negative test cases */
 	if (test_ring_negative_tests() < 0)
@@ -832,30 +848,67 @@ test_ring(void)
 	 * The test cases are split into smaller test cases to
 	 * help clang compile faster.
 	 */
+	tname = "Test standard ring";
+
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests1(i | j) < 0)
+			if (test_ring_burst_bulk_tests1(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests2(i | j) < 0)
+			if (test_ring_burst_bulk_tests2(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests3(i | j) < 0)
+			if (test_ring_burst_bulk_tests3(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests4(i | j) < 0)
+			if (test_ring_burst_bulk_tests4(i | j, 0, tname) < 0)
+				goto test_fail;
+
+	/* Burst and bulk operations with MT_RTS and MT_HTS sync modes */
+	for (i = 0; i != RTE_DIM(test_sync_modes); i++) {
+		for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST;
+				j <<= 1) {
+
+			rc = test_ring_burst_bulk_tests1(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests2(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
 				goto test_fail;
 
+			rc = test_ring_burst_bulk_tests3(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests4(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+		}
+	}
+
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 0/9] New sync modes for ring
  2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
                           ` (8 preceding siblings ...)
  2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
@ 2020-04-18 16:32         ` Konstantin Ananyev
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test Konstantin Ananyev
                             ` (10 more replies)
  9 siblings, 11 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

V4 - V5 changes:
1. fix i686 clang build problem
2. fix formal API comments

V3 - V4 changes:
Address comments from Honnappa:
1. for new sync modes make legacy API wrappers around _elem_ calls
2. remove rte_ring_(hts|rts)_generic.h
3. few changes in C11 version
4. peek API - add missing functions for _elem_
5. remove _IS_SP/_IS_MP, etc. internal macros
6. fix param types (obj_table) for _elem_ functions
7. fix formal API comments
8. deduplicate code for test_ring_stress
9. added functional tests for new sync modes

V2 - V3 changes:
1. few more compilation fixes (for gcc 4.8.X)
2. extra update devtools/libabigail.abignore (workaround) 

V1 - V2 changes:
1. fix compilation issues
2. add C11 atomics support
3. updates devtools/libabigail.abignore (workaround)

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. rework peek related API a bit
4. rework test to make it less verbose and unite all test-cases
   in one command
5. add new test-case for MT peek API

TODO list:
1. Update docs

These days more and more customers use (or try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP is quite a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.).
As now not only the lock-owner's but also the lock-waiters' scheduling
order matters a lot (LWP).
These two problems are well-known for kernel within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
unfair locking, waiting on the tail is very similar to a ticket-lock schema -
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
It is probably not possible to completely resolve LHP problem in
userspace only (without some kernel communication/intervention).
But removing fairness at tail update helps to avoid LWP and
can mitigate the situation significantly.
This patch proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform given op.
    As another enhancement provide ability to split enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows the user to inspect objects in the ring without removing
    them from it (aka MT-safe peek); see the sketch after this list.
2) Relaxed Tail Sync (RTS)
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.
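
To make the MT-safe peek flow above concrete, below is a minimal sketch of
the start/finish pattern (the same calls the new stress test uses). It is a
sketch only: it assumes a ring 'r' created with
RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ, and inspect() is a hypothetical
user callback, not part of the API:

	void *obj[32];
	uint32_t n, avail;

	/* reserve up to 32 objects; they stay in the ring for now */
	n = rte_ring_dequeue_bulk_start(r, obj, RTE_DIM(obj), &avail);
	if (n != 0) {
		if (inspect(obj, n) == 0)
			/* actually remove the inspected objects */
			rte_ring_dequeue_finish(r, n);
		else
			/* change of mind: keep all objects in the ring */
			rte_ring_dequeue_finish(r, 0);
	}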

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
Existing sync modes MP/MC, SP/SC are kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains the default.
The only thing that changed:
the format of prod/cons now could differ depending on the mode selected at _init_.
So the user has to stick with one sync model through the whole ring lifetime.
In other words, the user can't create a ring for, let's say, SP mode and then
in the middle of the data-path change their mind and start using MP_RTS mode.
For the existing modes (SP/MP, SC/MC) the format remains the same and
the user can still use them interchangeably, though of course it is an
error-prone practice.
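
For illustration, a minimal sketch of that usage model (ring name and size
are arbitrary; flags are the ones used by the new functional tests;
objs/num/n are assumed to be declared by the caller):

	/* sync mode is chosen once here, for the whole ring lifetime */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024,
		rte_socket_id(), RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	/* data-path code stays unchanged: the generic calls pick up
	 * the sync mode selected at creation time */
	n = rte_ring_enqueue_burst(r, objs, num, NULL);
	n = rte_ring_dequeue_burst(r, objs, num, NULL);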

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by the new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'
(in the tables below, notation like --lcores='6,(10-11)@7' pins several
EAL lcores onto one physical CPU to emulate the overcommitted cases)

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread@32core(--lcores='6-22,55-70')          982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7')           20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8') 153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 643176.59 1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8') 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8)')       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10)')     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10)')     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread@32core(--lcores='6-22,55-70')          813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7')           16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8)')       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10)')     5319896.86 2836.44 3028.87

Konstantin Ananyev (9):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API
  test/ring: add functional tests for new sync modes

 app/test/Makefile                      |   5 +
 app/test/meson.build                   |   5 +
 app/test/test_pdump.c                  |   6 +-
 app/test/test_ring.c                   |  93 ++++--
 app/test/test_ring_hts_stress.c        |  32 ++
 app/test/test_ring_mpmc_stress.c       |  31 ++
 app/test/test_ring_peek_stress.c       |  43 +++
 app/test/test_ring_rts_stress.c        |  32 ++
 app/test/test_ring_stress.c            |  57 ++++
 app/test/test_ring_stress.h            |  38 +++
 app/test/test_ring_stress_impl.h       | 396 ++++++++++++++++++++++
 devtools/libabigail.abignore           |   7 +
 lib/librte_pdump/rte_pdump.c           |   2 +-
 lib/librte_port/rte_port_ring.c        |  12 +-
 lib/librte_ring/Makefile               |   8 +-
 lib/librte_ring/meson.build            |  11 +-
 lib/librte_ring/rte_ring.c             | 114 ++++++-
 lib/librte_ring/rte_ring.h             | 243 ++++++++------
 lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
 lib/librte_ring/rte_ring_core.h        | 184 ++++++++++
 lib/librte_ring/rte_ring_elem.h        | 141 ++++++--
 lib/librte_ring/rte_ring_generic.h     |  48 +++
 lib/librte_ring/rte_ring_hts.h         | 332 +++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 207 ++++++++++++
 lib/librte_ring/rte_ring_peek.h        | 442 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts.h         | 439 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
 27 files changed, 2977 insertions(+), 174 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_core.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:30             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                             ` (9 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce stress test for ring enqueue/dequeue operations.
Each slave worker performs the following pattern:
dequeue, read/write data in the dequeued objects, enqueue.
Serves as both a functional and performance test of ring
enqueue/dequeue operations under high contention
(for both overcommitted and non-overcommitted scenarios).

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  35 +++
 app/test/test_ring_stress_impl.h | 396 +++++++++++++++++++++++++++++++
 6 files changed, 514 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index be53d33c3..a23a011df 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 04b59cffa..8824f366c 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..60eac6216
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+#include <rte_spinlock.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..222d62bc4
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/**
+ * Stress test for ring enqueue/dequeue operations.
+ * Each slave worker performs the following pattern:
+ * dequeue, read/write data in the dequeued objects, enqueue.
+ * Serves as both a functional and performance test of ring
+ * enqueue/dequeue operations under high contention
+ * (for both overcommitted and non-overcommitted scenarios).
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lcore=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker(void *arg, const char *fname, int32_t prcs)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() - tm0 : 0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() - tm1 : 0;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, prcs);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	cl = rte_rdtsc_precise() - cl;
+	if (prcs == 0)
+		lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+	return rc;
+}
+static int
+test_worker_prcs(void *arg)
+{
+	return test_worker(arg, __func__, 1);
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	return test_worker(arg, __func__, 0);
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems data failed",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, __alignof__(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal worker to start test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal worker to stop test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 2/9] ring: prepare ring to allow new sync schemes
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 3/9] ring: introduce RTS ring mode Konstantin Ananyev
                             ` (8 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

To make these preparations, two main things are done:
- Change from *single* to *sync_type* to allow different
  synchronisation schemes to be applied.
  Mark *single* as deprecated in comments.
  Add new functions to allow user to query ring sync types.
  Replace direct access to *single* with appropriate function call.
- Move the actual rte_ring and related structure definitions into a
  separate file: <rte_ring_core.h>. This allows the contents
  of <rte_ring_elem.h> to be referenced from <rte_ring.h> without
  introducing a circular dependency.
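
As an illustration only, the new query helpers introduced below are meant
to replace direct accesses to the now-deprecated *single* field; a sketch
(not taken from the patch itself):

	/* before: if (ring->prod.single || ring->cons.single) ... */
	if (rte_ring_is_prod_single(r) || rte_ring_is_cons_single(r))
		return -EINVAL;

	/* when the exact sync type matters: */
	if (rte_ring_get_cons_sync_type(r) != RTE_RING_SYNC_MT)
		printf("ring %s: consumer is not MT safe\n", r->name);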

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 +--
 lib/librte_ring/Makefile        |   1 +
 lib/librte_ring/meson.build     |   1 +
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 170 ++++++++++++++------------------
 lib/librte_ring/rte_ring_core.h | 132 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_elem.h |  42 +++-----
 9 files changed, 234 insertions(+), 138 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_core.h

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..f96709f95 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..52b2d8e55 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_is_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 28368e6d1..6572768c9 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 05402e4f0..c656781da 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,6 +3,7 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..35ee4491c 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -36,91 +36,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#define RTE_TAILQ_RING_NAME "RTE_RING"
-
-enum rte_ring_queue_behavior {
-	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
-	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
-};
-
-#define RTE_RING_MZ_PREFIX "RG_"
-/** The maximum length of a ring name. */
-#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
-			   sizeof(RTE_RING_MZ_PREFIX) + 1)
-
-/* structure to hold a pair of head/tail values and other metadata */
-struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
-};
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct rte_ring {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned; /**< Name of the ring. */
-	int flags;               /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t capacity;       /**< Usable size of ring */
-
-	char pad0 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
-	char pad1 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
-	char pad2 __rte_cache_aligned; /**< empty cache line */
-};
-
-#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-/**
- * Ring is to hold exactly requested number of entries.
- * Without this flag set, the ring size requested must be a power of 2, and the
- * usable space will be that size - 1. With the flag, the requested size will
- * be rounded up to the next power of two, but the usable space will be exactly
- * that requested. Worst case, if a power-of-2 size is requested, half the
- * ring space will be wasted.
- */
-#define RING_F_EXACT_SZ 0x0004
-#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
-
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#include <rte_ring_core.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +336,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,9 +359,13 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_elem.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -470,7 +390,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +474,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +498,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +525,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +697,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is configured for single producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_is_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is configured for single consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_is_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +796,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +819,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +846,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +874,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +899,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +927,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
new file mode 100644
index 000000000..d9cef763f
--- /dev/null
+++ b/lib/librte_ring/rte_ring_core.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_CORE_H_
+#define _RTE_RING_CORE_H_
+
+/**
+ * @file
+ * This file contains the definition of the RTE ring structure itself,
+ * init flags and some related macros.
+ * For most DPDK entities, it is not recommended to include
+ * this file directly; use <rte_ring.h> or <rte_ring_elem.h>
+ * instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+#define RTE_TAILQ_RING_NAME "RTE_RING"
+
+/** enqueue/dequeue behavior types */
+enum rte_ring_queue_behavior {
+	/** Enq/Deq a fixed number of items from a ring */
+	RTE_RING_QUEUE_FIXED = 0,
+	/** Enq/Deq as many items as possible from ring */
+	RTE_RING_QUEUE_VARIABLE
+};
+
+#define RTE_RING_MZ_PREFIX "RG_"
+/** The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
+
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type, the format of that structure might differ,
+ * but the offsets of the *sync_type* and *tail* values must remain the same.
+ */
+struct rte_ring_headtail {
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated -  True if single prod/cons */
+		uint32_t single;
+	};
+};
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these index is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between 2 index
+ * values in a modulo-32bit base: that's why the overflow of the indexes is not
+ * a problem.
+ */
+struct rte_ring {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned;
+	/**< Name of the ring. */
+	int flags;               /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t capacity;       /**< Usable size of ring */
+
+	char pad0 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring producer status. */
+	struct rte_ring_headtail prod __rte_cache_aligned;
+	char pad1 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring consumer status. */
+	struct rte_ring_headtail cons __rte_cache_aligned;
+	char pad2 __rte_cache_aligned; /**< empty cache line */
+};
+
+#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
+#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
+/**
+ * Ring is to hold exactly requested number of entries.
+ * Without this flag set, the ring size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag, the requested size will
+ * be rounded up to the next power of two, but the usable space will be exactly
+ * that requested. Worst case, if a power-of-2 size is requested, half the
+ * ring space will be wasted.
+ */
+#define RING_F_EXACT_SZ 0x0004
+#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_CORE_H_ */
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..7406c0b0f 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -20,21 +20,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <string.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#include "rte_ring.h"
+#include <rte_ring_core.h>
 
 /**
  * @warning
@@ -510,7 +496,7 @@ rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -539,7 +525,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -570,7 +556,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -675,7 +661,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -703,7 +689,7 @@ rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -734,7 +720,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -842,7 +828,7 @@ rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -871,7 +857,7 @@ rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -902,7 +888,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -934,7 +920,7 @@ rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -963,7 +949,7 @@ rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -995,9 +981,11 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 3/9] ring: introduce RTS ring mode
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test Konstantin Ananyev
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                             ` (7 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
It aims to reduce stall times in cases when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
The main difference from the original MP/MC algorithm is that
the tail value is increased not by every thread that finishes an
enqueue/dequeue, but only by the last one.
That allows threads to avoid spinning on the ring tail value,
leaving the actual tail update to the last thread in the update queue.
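
To illustrate, a hypothetical walk-through with two producers A and B
(all positions/counters start at zero, each enqueues 2 objects):
 - A moves head: head.pos 0->2, head.cnt 0->1
 - B moves head: head.pos 2->4, head.cnt 1->2
 - B finishes copying first: tail.cnt 0->1, which is != head.cnt,
   so tail.pos stays at 0
 - A finishes last: tail.cnt 1->2 == head.cnt,
   so tail.pos jumps straight to 4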

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

check-abi.sh reports what I believe is a false-positive about
ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
updated to suppress *struct ring* related errors.
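
Not for commit - below is a minimal usage sketch (it assumes only the
flags and htd_max helpers added by this patch; since all of this is
experimental, ALLOW_EXPERIMENTAL_API has to be defined):

#include <rte_ring.h>

static struct rte_ring *
create_rts_ring(unsigned int count)
{
	struct rte_ring *r;

	/* default enqueue/dequeue ops of this ring will use RTS mode */
	r = rte_ring_create("rts_ring", count, SOCKET_ID_ANY,
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return NULL;

	/* optionally tighten max head/tail distance (default: capacity/8) */
	rte_ring_set_prod_htd_max(r, 4);
	rte_ring_set_cons_htd_max(r, 4);
	return r;
}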

 devtools/libabigail.abignore           |   7 +
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   7 +-
 lib/librte_ring/rte_ring.c             | 100 +++++-
 lib/librte_ring/rte_ring.h             |  70 +++-
 lib/librte_ring/rte_ring_core.h        |  36 +-
 lib/librte_ring/rte_ring_elem.h        |  90 ++++-
 lib/librte_ring/rte_ring_rts.h         | 439 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
 9 files changed, 902 insertions(+), 30 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..cd86d89ca 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,10 @@
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore updates of ring prod/cons
+[suppress_type]
+        type_kind = struct
+        name = rte_ring
+[suppress_type]
+        type_kind = struct
+        name = rte_event_ring
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6572768c9..04e446e37 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_c11_mem.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index c656781da..a95598032 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,4 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_c11_mem.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero at success or negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 35ee4491c..77f206ca7 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  *
- * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2010-2020 Intel Corporation
  * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
  * All rights reserved.
  * Derived from FreeBSD's bufring.h
@@ -389,8 +389,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -524,8 +537,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -845,8 +870,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -925,9 +963,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index d9cef763f..ded0fa0b7 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -57,6 +57,9 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
@@ -76,6 +79,22 @@ struct rte_ring_headtail {
 	};
 };
 
+union rte_ring_rts_poscnt {
+	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
+	uint64_t raw __rte_aligned(8);
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union rte_ring_rts_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union rte_ring_rts_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -104,11 +123,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -125,6 +154,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 7406c0b0f..6da0a917b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -528,6 +528,10 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -557,6 +561,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -661,7 +685,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -719,8 +743,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -887,8 +928,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -979,9 +1037,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #include <rte_ring.h>
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..8ced07096
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,439 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that the tail value is increased not
+ * by every thread that finishes an enqueue/dequeue,
+ * but only by the last one currently doing so.
+ * That allows threads to skip spinning on the tail value,
+ * leaving the actual tail update to the last thread at a given instant.
+ * RTS requires two 64-bit CAS operations per enqueue(/dequeue):
+ * one for the head update, a second for the tail update.
+ * As a gain, it allows threads to avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update plus waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce update counter (cnt) for both head and tail.
+ *  - increment head.cnt for each head.value update
+ *  - write head.value and head.cnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.cnt + 1 == head.cnt
+ *    (indicating that this is the last thread updating the tail)
+ *  - increment tail.cnt when each enqueue/dequeue op finishes
+ *    (no matter if tail.value going to change or not)
+ *  - write tail.value and tail.cnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit max allowed distance between head and tail value (HTD_MAX).
+ *    i.e. a thread is allowed to proceed with changing head.value
+ *    only when: head.value - tail.value <= HTD_MAX.
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have a fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity - no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if producer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to set.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if consumer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to set.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
new file mode 100644
index 000000000..9f26817c0
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_C11_MEM_H_
+#define _RTE_RING_RTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_rts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union rte_ring_rts_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with the new value.
+	 */
+
+	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance does not
+ * exceed the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union rte_ring_rts_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_rts_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for prod head/tail distance,
+		 * make sure that we read prod head *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems to the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_rts_poscnt nh, oh;
+
+	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for cons head/tail distance,
+		 * make sure that we read cons head *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+	/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 4/9] test/ring: add contention stress test for RTS ring
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (2 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 3/9] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 5/9] ring: introduce HTS ring mode Konstantin Ananyev
                             ` (6 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test RTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
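
Not for commit - to reproduce the contention numbers, the test can be
fed to the unit test binary; the binary path and lcore list below are
only an example, adjust them for your build and platform:

echo ring_stress_autotest | ./build/app/test/dpdk-test --lcores=6-10
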
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index a23a011df..00b74b5c9 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 8824f366c..97ad822c1 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 60eac6216..32aae2072 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 5/9] ring: introduce HTS ring mode
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (3 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                             ` (5 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce head/tail sync mode for MT ring synchronization.
In this mode the enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform a given op.
It is supposed to reduce stall times in cases where the ring is
used on overcommitted CPUs (multiple active threads on the same CPU).

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
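
Not for commit - a minimal usage sketch, assuming only the two flags
added by this patch and the already existing public ring API:

#include <rte_errno.h>
#include <rte_ring.h>

static int
hts_roundtrip(void)
{
	void *objs[8] = { NULL };
	unsigned int n;
	struct rte_ring *r;

	/* default ops of this ring are fully serialized (HTS) */
	r = rte_ring_create("hts_ring", 1024, SOCKET_ID_ANY,
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
	if (r == NULL)
		return -rte_errno;

	/* the generic calls dispatch on prod/cons sync_type */
	n = rte_ring_enqueue_burst(r, objs, RTE_DIM(objs), NULL);
	n = rte_ring_dequeue_burst(r, objs, n, NULL);

	rte_ring_free(r);
	return n;
}
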
 lib/librte_ring/Makefile               |   2 +
 lib/librte_ring/meson.build            |   2 +
 lib/librte_ring/rte_ring.c             |  20 +-
 lib/librte_ring/rte_ring.h             |  11 +
 lib/librte_ring/rte_ring_core.h        |  20 ++
 lib/librte_ring/rte_ring_elem.h        |  13 +
 lib/librte_ring/rte_ring_hts.h         | 332 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 207 +++++++++++++++
 8 files changed, 605 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 04e446e37..f75d8e530 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -20,6 +20,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_c11_mem.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index a95598032..ca37cb8cc 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -7,6 +7,8 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_c11_mem.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 77f206ca7..7a7695914 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -398,6 +398,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -545,6 +548,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -879,6 +884,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -972,6 +980,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index ded0fa0b7..7dc0fb0d9 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -59,6 +59,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -95,6 +96,20 @@ struct rte_ring_rts_headtail {
 	volatile union rte_ring_rts_poscnt head;
 };
 
+union rte_ring_hts_pos {
+	/** raw 8B value to read/write *head* and *tail* as one atomic op */
+	uint64_t raw __rte_aligned(8);
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union rte_ring_hts_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -126,6 +141,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -135,6 +151,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -157,6 +174,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 6da0a917b..df485fc6b 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -529,6 +529,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -573,6 +574,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -754,6 +758,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -939,6 +946,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1048,6 +1058,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..c7701defc
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,332 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
+ * In this mode the enqueue/dequeue operation is fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to proceed with changing head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve that 64-bit CAS is used by head update routine.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, head, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, head, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested number of objects exceeds what is available,
+ * dequeue only the available objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h b/lib/librte_ring/rte_ring_hts_c11_mem.h
new file mode 100644
index 000000000..87e84fdc9
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_C11_MEM_H_
+#define _RTE_RING_HTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_hts_c11_mem.h
+ * It is not recommended to include this file directly;
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+/**
+ * @internal get current tail value.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union rte_ring_hts_pos p;
+
+	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64-bit operation.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	union rte_ring_hts_pos p;
+
+	RTE_SET_USED(enqueue);
+
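+	/* head == tail marks this enqueue/dequeue operation as completed */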
+	p.pos.head = tail + num;
+	p.pos.tail = p.pos.head;
+
+	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal update tail with new value.
+ */
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t tail;
+
+	RTE_SET_USED(enqueue);
+
+	tail = old_tail + num;
+	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal waits until the tail becomes equal to the head,
+ * which means no writer/reader is active on that ring.
+ * Serves as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union rte_ring_hts_pos *p)
+{
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union rte_ring_hts_pos np, op;
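+	/* op/np pack head and tail into one 64-bit word (rte_ring_hts_pos) */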
+
+	const uint32_t capacity = r->capacity;
+
+	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read prod head/tail *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union rte_ring_hts_pos np, op;
+
+	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read cons head/tail *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 6/9] test/ring: add contention stress test for HTS ring
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (4 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 5/9] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API Konstantin Ananyev
                             ` (4 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 00b74b5c9..28f0b9ac2 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 97ad822c1..20c4978c2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 32aae2072..9a87c7f7b 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (5 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
                             ` (3 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
mode, provide the ability to split an enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows the user to inspect objects in the ring without removing
them from it (aka MT safe peek).
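
As a minimal usage sketch (not part of this patch; 'ring' and 'objs' are
placeholder variables), the enqueue side follows the same reserve/commit
pattern:

	/* reserve space for up to 2 objects */
	n = rte_ring_enqueue_burst_start(ring, 2, NULL);
	if (n != 0) {
		/* fill objs[0..n-1], then commit them into the ring */
		rte_ring_enqueue_finish(ring, objs, n);
	}

A dequeue-side (peek) example is given in the rte_ring_peek.h file
comment below.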

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_ring/Makefile           |   1 +
 lib/librte_ring/meson.build        |   1 +
 lib/librte_ring/rte_ring_c11_mem.h |  44 +++
 lib/librte_ring/rte_ring_elem.h    |   4 +
 lib/librte_ring/rte_ring_generic.h |  48 ++++
 lib/librte_ring/rte_ring_peek.h    | 442 +++++++++++++++++++++++++++++
 6 files changed, 540 insertions(+)
 create mode 100644 lib/librte_ring/rte_ring_peek.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index f75d8e530..52bb2a42d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_c11_mem.h \
 					rte_ring_hts.h \
 					rte_ring_hts_c11_mem.h \
+					rte_ring_peek.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca37cb8cc..0c1f2d996 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,7 @@ headers = files('rte_ring.h',
 		'rte_ring_generic.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_c11_mem.h',
+		'rte_ring_peek.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 0fb73a337..bb3096721 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -10,6 +10,50 @@
 #ifndef _RTE_RING_C11_MEM_H_
 #define _RTE_RING_C11_MEM_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for a single-threaded producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for a single-threaded producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index df485fc6b..eeb850ab5 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1071,6 +1071,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #include <rte_ring.h>
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index 953cdbbd5..9f5fdf13b 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -10,6 +10,54 @@
 #ifndef _RTE_RING_GENERIC_H_
 #define _RTE_RING_GENERIC_H_
 
+/**
+ * @internal get current tail value.
+ * This function should be used only for a single-threaded producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for a single-threaded producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	pos = tail + num;
+
+	if (enqueue)
+		rte_smp_wmb();
+	else
+		rte_smp_rmb();
+
+	ht->head = pos;
+	ht->tail = pos;
+}
+
 static __rte_always_inline void
 update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
 		uint32_t single, uint32_t enqueue)
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..2d06888b6
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,442 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * The introduction of rte_ring with a serialized producer/consumer
+ * (HTS sync mode) makes it possible to split the public enqueue/dequeue
+ * API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows the user to inspect objects in the ring without removing
+ * them from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ no other thread can proceed
+ * with an enqueue (or dequeue) operation until _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal This function moves the prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n =  __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_bulk_elem_start(r, n, free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_burst_elem_start(r, n, free_space);
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the value
+ * previously returned by enqueue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the value
+ * previously returned by enqueue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves the cons head value and copies up to *n*
+ * objects from the ring to the user-provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n =  __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   Actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete dequeuing several objects from the ring.
+ * Note that the number of objects to dequeue must not exceed the value
+ * previously returned by dequeue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_elem_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_cons, &tail, n);
+		__rte_ring_hts_set_head_tail(&r->hts_cons, tail, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete dequeuing several objects from the ring.
+ * Note that the number of objects to dequeue must not exceed the value
+ * previously returned by dequeue_start.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	rte_ring_dequeue_elem_finish(r, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 8/9] test/ring: add stress test for MT peek API
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (6 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:32             ` Honnappa Nagarahalli
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
                             ` (2 subsequent siblings)
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 28f0b9ac2..631a21028 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 20c4978c2..d15278cf9 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
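+	/* bulk mode: start() returns either 0 or n; finish(0) is a no-op */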
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
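+	/* commit the objects only when all n slots were reserved */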
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 9a87c7f7b..60953ce47 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -35,3 +35,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v5 9/9] test/ring: add functional tests for new sync modes
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (7 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-18 16:32           ` Konstantin Ananyev
  2020-04-19  2:32             ` Honnappa Nagarahalli
  2020-04-19  2:32           ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Honnappa Nagarahalli
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-18 16:32 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Extend test_ring_autotest with new test-cases for RTS/HTS sync modes.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_ring.c | 93 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 20 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fbcd109b1..e21557cd9 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -203,7 +203,8 @@ test_ring_negative_tests(void)
  * Random number of elements are enqueued and dequeued.
  */
 static int
-test_ring_burst_bulk_tests1(unsigned int api_type)
+test_ring_burst_bulk_tests1(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -213,12 +214,11 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
 	const unsigned int rsz = RING_SIZE - 1;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -294,7 +294,8 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
  * dequeued data.
  */
 static int
-test_ring_burst_bulk_tests2(unsigned int api_type)
+test_ring_burst_bulk_tests2(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -302,12 +303,11 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -390,7 +390,8 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
  * Enqueue and dequeue to cover the entire ring length.
  */
 static int
-test_ring_burst_bulk_tests3(unsigned int api_type)
+test_ring_burst_bulk_tests3(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -398,12 +399,11 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
 	unsigned int i, j;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -465,7 +465,8 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
  * Enqueue till the ring is full and dequeue till the ring becomes empty.
  */
 static int
-test_ring_burst_bulk_tests4(unsigned int api_type)
+test_ring_burst_bulk_tests4(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -474,12 +475,11 @@ test_ring_burst_bulk_tests4(unsigned int api_type)
 	unsigned int num_elems;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -815,7 +815,23 @@ test_ring_with_exact_size(void)
 static int
 test_ring(void)
 {
+	int32_t rc;
 	unsigned int i, j;
+	const char *tname;
+
+	static const struct {
+		uint32_t create_flags;
+		const char *name;
+	} test_sync_modes[] = {
+		{
+			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ,
+			"Test MT_RTS ring",
+		},
+		{
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ,
+			"Test MT_HTS ring",
+		},
+	};
 
 	/* Negative test cases */
 	if (test_ring_negative_tests() < 0)
@@ -832,30 +848,67 @@ test_ring(void)
 	 * The test cases are split into smaller test cases to
 	 * help clang compile faster.
 	 */
+	tname = "Test standard ring";
+
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests1(i | j) < 0)
+			if (test_ring_burst_bulk_tests1(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests2(i | j) < 0)
+			if (test_ring_burst_bulk_tests2(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests3(i | j) < 0)
+			if (test_ring_burst_bulk_tests3(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests4(i | j) < 0)
+			if (test_ring_burst_bulk_tests4(i | j, 0, tname) < 0)
+				goto test_fail;
+
+	/* Burst and bulk operations with MT_RTS and MT_HTS sync modes */
+	for (i = 0; i != RTE_DIM(test_sync_modes); i++) {
+		for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST;
+				j <<= 1) {
+
+			rc = test_ring_burst_bulk_tests1(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests2(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
 				goto test_fail;
 
+			rc = test_ring_burst_bulk_tests3(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests4(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+		}
+	}
+
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-19  2:30             ` Honnappa Nagarahalli
  2020-04-19  8:03               ` David Marchand
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:30 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Introduce stress test for ring enqueue/dequeue operations.
> Each slave worker performs the following pattern:
> dequeue / read-write data in the dequeued objects / enqueue.
> Serves as both a functional and a performance test of ring enqueue/dequeue
> operations under high contention (for both overcommitted and
> non-overcommitted scenarios).
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
ci/intel-compilation fails for the meson build with clang on 32-bit. I believe it is solved by [1] (as you indicated). Can you make this patch dependent on [1]?
Otherwise,
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

[1] http://patches.dpdk.org/patch/68280/
> ---
>  app/test/Makefile                |   2 +
>  app/test/meson.build             |   2 +
>  app/test/test_ring_mpmc_stress.c |  31 +++
>  app/test/test_ring_stress.c      |  48 ++++
>  app/test/test_ring_stress.h      |  35 +++
>  app/test/test_ring_stress_impl.h | 396 +++++++++++++++++++++++++++++++
>  6 files changed, 514 insertions(+)
>  create mode 100644 app/test/test_ring_mpmc_stress.c
>  create mode 100644 app/test/test_ring_stress.c
>  create mode 100644 app/test/test_ring_stress.h
>  create mode 100644 app/test/test_ring_stress_impl.h
> 
> diff --git a/app/test/Makefile b/app/test/Makefile
> index be53d33c3..a23a011df 100644
> --- a/app/test/Makefile
> +++ b/app/test/Makefile
> @@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
>  SRCS-y += test_rand_perf.c
> 
>  SRCS-y += test_ring.c
> +SRCS-y += test_ring_mpmc_stress.c
>  SRCS-y += test_ring_perf.c
> +SRCS-y += test_ring_stress.c
>  SRCS-y += test_pmd_perf.c
> 
>  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 04b59cffa..8824f366c 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -100,7 +100,9 @@ test_sources = files('commands.c',
>  	'test_rib.c',
>  	'test_rib6.c',
>  	'test_ring.c',
> +	'test_ring_mpmc_stress.c',
>  	'test_ring_perf.c',
> +	'test_ring_stress.c',
>  	'test_rwlock.c',
>  	'test_sched.c',
>  	'test_service_cores.c',
> diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
> new file mode 100644
> index 000000000..1524b0248
> --- /dev/null
> +++ b/app/test/test_ring_mpmc_stress.c
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "test_ring_stress_impl.h"
> +
> +static inline uint32_t
> +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> +	uint32_t *avail)
> +{
> +	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
> +}
> +
> +static inline uint32_t
> +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> +	uint32_t *free)
> +{
> +	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
> +}
> +
> +static int
> +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
> +{
> +	return rte_ring_init(r, name, num, 0);
> +}
> +
> +const struct test test_ring_mpmc_stress = {
> +	.name = "MP/MC",
> +	.nb_case = RTE_DIM(tests),
> +	.cases = tests,
> +};
> diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
> new file mode 100644
> index 000000000..60706f799
> --- /dev/null
> +++ b/app/test/test_ring_stress.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "test_ring_stress.h"
> +
> +static int
> +run_test(const struct test *test)
> +{
> +	int32_t rc;
> +	uint32_t i, k;
> +
> +	for (i = 0, k = 0; i != test->nb_case; i++) {
> +
> +		printf("TEST-CASE %s %s START\n",
> +			test->name, test->cases[i].name);
> +
> +		rc = test->cases[i].func(test->cases[i].wfunc);
> +		k += (rc == 0);
> +
> +		if (rc != 0)
> +			printf("TEST-CASE %s %s FAILED\n",
> +				test->name, test->cases[i].name);
> +		else
> +			printf("TEST-CASE %s %s OK\n",
> +				test->name, test->cases[i].name);
> +	}
> +
> +	return k;
> +}
> +
> +static int
> +test_ring_stress(void)
> +{
> +	uint32_t n, k;
> +
> +	n = 0;
> +	k = 0;
> +
> +	n += test_ring_mpmc_stress.nb_case;
> +	k += run_test(&test_ring_mpmc_stress);
> +
> +	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
> +		n, k, n - k);
> +	return (k != n);
> +}
> +
> +REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
> diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
> new file mode 100644
> index 000000000..60eac6216
> --- /dev/null
> +++ b/app/test/test_ring_stress.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +
> +#include <inttypes.h>
> +#include <stddef.h>
> +#include <stdalign.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +
> +#include <rte_ring.h>
> +#include <rte_cycles.h>
> +#include <rte_launch.h>
> +#include <rte_pause.h>
> +#include <rte_random.h>
> +#include <rte_malloc.h>
> +#include <rte_spinlock.h>
> +
> +#include "test.h"
> +
> +struct test_case {
> +	const char *name;
> +	int (*func)(int (*)(void *));
> +	int (*wfunc)(void *arg);
> +};
> +
> +struct test {
> +	const char *name;
> +	uint32_t nb_case;
> +	const struct test_case *cases;
> +};
> +
> +extern const struct test test_ring_mpmc_stress;
> diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
> new file mode 100644
> index 000000000..222d62bc4
> --- /dev/null
> +++ b/app/test/test_ring_stress_impl.h
> @@ -0,0 +1,396 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include "test_ring_stress.h"
> +
> +/**
> + * Stress test for ring enqueue/dequeue operations.
> + * Each slave worker performs the following pattern:
> + * dequeue / read-write data in the dequeued objects / enqueue.
> + * Serves as both a functional and a performance test of ring
> + * enqueue/dequeue operations under high contention
> + * (for both overcommitted and non-overcommitted scenarios).
> + */
> +
> +#define RING_NAME	"RING_STRESS"
> +#define BULK_NUM	32
> +#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
> +
> +enum {
> +	WRK_CMD_STOP,
> +	WRK_CMD_RUN,
> +};
> +
> +static volatile uint32_t wrk_cmd __rte_cache_aligned;
> +
> +/* test run-time in seconds */
> +static const uint32_t run_time = 60;
> +static const uint32_t verbose;
> +
> +struct lcore_stat {
> +	uint64_t nb_cycle;
> +	struct {
> +		uint64_t nb_call;
> +		uint64_t nb_obj;
> +		uint64_t nb_cycle;
> +		uint64_t max_cycle;
> +		uint64_t min_cycle;
> +	} op;
> +};
> +
> +struct lcore_arg {
> +	struct rte_ring *rng;
> +	struct lcore_stat stats;
> +} __rte_cache_aligned;
> +
> +struct ring_elem {
> +	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
> +} __rte_cache_aligned;
> +
> +/*
> + * redefinable functions
> + */
> +static uint32_t
> +_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
> +	uint32_t *avail);
> +
> +static uint32_t
> +_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
> +	uint32_t *free);
> +
> +static int
> +_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
> +
> +
> +static void
> +lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
> +	uint64_t tm, int32_t prcs)
> +{
> +	ls->op.nb_call += call;
> +	ls->op.nb_obj += obj;
> +	ls->op.nb_cycle += tm;
> +	if (prcs) {
> +		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
> +		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
> +	}
> +}
> +
> +static void
> +lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
> +{
> +
> +	ms->op.nb_call += ls->op.nb_call;
> +	ms->op.nb_obj += ls->op.nb_obj;
> +	ms->op.nb_cycle += ls->op.nb_cycle;
> +	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
> +	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
> +}
> +
> +static void
> +lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
> +{
> +	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
> +	lcore_op_stat_aggr(ms, ls);
> +}
> +
> +static void
> +lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
> +{
> +	long double st;
> +
> +	st = (long double)rte_get_timer_hz() / US_PER_S;
> +
> +	if (lc == UINT32_MAX)
> +		fprintf(f, "%s(AGGREGATE)={\n", __func__);
> +	else
> +		fprintf(f, "%s(lcore=%u)={\n", __func__, lc);
> +
> +	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
> +		ls->nb_cycle, (long double)ls->nb_cycle / st);
> +
> +	fprintf(f, "\tDEQ+ENQ={\n");
> +
> +	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
> +	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
> +	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
> +	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
> +		(long double)ls->op.nb_obj / ls->op.nb_call);
> +	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
> +		(long double)ls->op.nb_cycle / ls->op.nb_obj);
> +	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
> +		(long double)ls->op.nb_cycle / ls->op.nb_call);
> +
> +	/* if min/max cycles per call stats was collected */
> +	if (ls->op.min_cycle != UINT64_MAX) {
> +		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> +			ls->op.max_cycle,
> +			(long double)ls->op.max_cycle / st);
> +		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
> +			ls->op.min_cycle,
> +			(long double)ls->op.min_cycle / st);
> +	}
> +
> +	fprintf(f, "\t},\n");
> +	fprintf(f, "};\n");
> +}
> +
> +static void
> +fill_ring_elm(struct ring_elem *elm, uint32_t fill)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i != RTE_DIM(elm->cnt); i++)
> +		elm->cnt[i] = fill;
> +}
> +
> +static int32_t
> +check_updt_elem(struct ring_elem *elm[], uint32_t num,
> +	const struct ring_elem *check, const struct ring_elem *fill)
> +{
> +	uint32_t i;
> +
> +	static rte_spinlock_t dump_lock;
> +
> +	for (i = 0; i != num; i++) {
> +		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
> +			rte_spinlock_lock(&dump_lock);
> +			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
> +				"offending object: %p\n",
> +				__func__, rte_lcore_id(), num, i, elm[i]);
> +			rte_memdump(stdout, "expected", check,
> sizeof(*check));
> +			rte_memdump(stdout, "result", elm[i], sizeof(elm[i]));
> +			rte_spinlock_unlock(&dump_lock);
> +			return -EINVAL;
> +		}
> +		memcpy(elm[i], fill, sizeof(*elm[i]));
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
> +	const char *fname, const char *opname)
> +{
> +	if (exp != res) {
> +		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
> +			fname, lc, opname, exp, res);
> +		return -ENOSPC;
> +	}
> +	return 0;
> +}
> +
> +static int
> +test_worker(void *arg, const char *fname, int32_t prcs)
> +{
> +	int32_t rc;
> +	uint32_t lc, n, num;
> +	uint64_t cl, tm0, tm1;
> +	struct lcore_arg *la;
> +	struct ring_elem def_elm, loc_elm;
> +	struct ring_elem *obj[2 * BULK_NUM];
> +
> +	la = arg;
> +	lc = rte_lcore_id();
> +
> +	fill_ring_elm(&def_elm, UINT32_MAX);
> +	fill_ring_elm(&loc_elm, lc);
> +
> +	while (wrk_cmd != WRK_CMD_RUN) {
> +		rte_smp_rmb();
> +		rte_pause();
> +	}
> +
> +	cl = rte_rdtsc_precise();
> +
> +	do {
> +		/* num in interval [7/8, 11/8] of BULK_NUM */
> +		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
> +
> +		/* reset all pointer values */
> +		memset(obj, 0, sizeof(obj));
> +
> +		/* dequeue num elems */
> +		tm0 = (prcs != 0) ? rte_rdtsc_precise() : 0;
> +		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
> +		tm0 = (prcs != 0) ? rte_rdtsc_precise() - tm0 : 0;
> +
> +		/* check return value and objects */
> +		rc = check_ring_op(num, n, lc, fname,
> +			RTE_STR(_st_ring_dequeue_bulk));
> +		if (rc == 0)
> +			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
> +		if (rc != 0)
> +			break;
> +
> +		/* enqueue num elems */
> +		rte_compiler_barrier();
> +		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
> +		if (rc != 0)
> +			break;
> +
> +		tm1 = (prcs != 0) ? rte_rdtsc_precise() : 0;
> +		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
> +		tm1 = (prcs != 0) ? rte_rdtsc_precise() - tm1 : 0;
> +
> +		/* check return value */
> +		rc = check_ring_op(num, n, lc, fname,
> +			RTE_STR(_st_ring_enqueue_bulk));
> +		if (rc != 0)
> +			break;
> +
> +		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, prcs);
> +
> +	} while (wrk_cmd == WRK_CMD_RUN);
> +
> +	cl = rte_rdtsc_precise() - cl;
> +	if (prcs == 0)
> +		lcore_stat_update(&la->stats, 0, 0, cl, 0);
> +	la->stats.nb_cycle = cl;
> +	return rc;
> +}
> +
> +static int
> +test_worker_prcs(void *arg)
> +{
> +	return test_worker(arg, __func__, 1);
> +}
> +
> +static int
> +test_worker_avg(void *arg)
> +{
> +	return test_worker(arg, __func__, 0);
> +}
> +
> +static void
> +mt1_fini(struct rte_ring *rng, void *data)
> +{
> +	rte_free(rng);
> +	rte_free(data);
> +}
> +
> +static int
> +mt1_init(struct rte_ring **rng, void **data, uint32_t num)
> +{
> +	int32_t rc;
> +	size_t sz;
> +	uint32_t i, nr;
> +	struct rte_ring *r;
> +	struct ring_elem *elm;
> +	void *p;
> +
> +	*rng = NULL;
> +	*data = NULL;
> +
> +	sz = num * sizeof(*elm);
> +	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
> +	if (elm == NULL) {
> +		printf("%s: alloc(%zu) for %u elems data failed\n",
> +			__func__, sz, num);
> +		return -ENOMEM;
> +	}
> +
> +	*data = elm;
> +
> +	/* alloc ring */
> +	nr = 2 * num;
> +	sz = rte_ring_get_memsize(nr);
> +	r = rte_zmalloc(NULL, sz, __alignof__(*r));
> +	if (r == NULL) {
> +		printf("%s: alloc(%zu) for FIFO with %u elems failed\n",
> +			__func__, sz, nr);
> +		return -ENOMEM;
> +	}
> +
> +	*rng = r;
> +
> +	rc = _st_ring_init(r, RING_NAME, nr);
> +	if (rc != 0) {
> +		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
> +			__func__, r, nr, rc, strerror(-rc));
> +		return rc;
> +	}
> +
> +	for (i = 0; i != num; i++) {
> +		fill_ring_elm(elm + i, UINT32_MAX);
> +		p = elm + i;
> +		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
> +			break;
> +	}
> +
> +	if (i != num) {
> +		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
> +			__func__, r, num, i);
> +		return -ENOSPC;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +test_mt1(int (*test)(void *))
> +{
> +	int32_t rc;
> +	uint32_t lc, mc;
> +	struct rte_ring *r;
> +	void *data;
> +	struct lcore_arg arg[RTE_MAX_LCORE];
> +
> +	static const struct lcore_stat init_stat = {
> +		.op.min_cycle = UINT64_MAX,
> +	};
> +
> +	rc = mt1_init(&r, &data, RING_SIZE);
> +	if (rc != 0) {
> +		mt1_fini(r, data);
> +		return rc;
> +	}
> +
> +	memset(arg, 0, sizeof(arg));
> +
> +	/* launch on all slaves */
> +	RTE_LCORE_FOREACH_SLAVE(lc) {
> +		arg[lc].rng = r;
> +		arg[lc].stats = init_stat;
> +		rte_eal_remote_launch(test, &arg[lc], lc);
> +	}
> +
> +	/* signal worker to start test */
> +	wrk_cmd = WRK_CMD_RUN;
> +	rte_smp_wmb();
> +
> +	usleep(run_time * US_PER_S);
> +
> +	/* signal worker to stop test */
> +	wrk_cmd = WRK_CMD_STOP;
> +	rte_smp_wmb();
> +
> +	/* wait for slaves and collect stats. */
> +	mc = rte_lcore_id();
> +	arg[mc].stats = init_stat;
> +
> +	rc = 0;
> +	RTE_LCORE_FOREACH_SLAVE(lc) {
> +		rc |= rte_eal_wait_lcore(lc);
> +		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
> +		if (verbose != 0)
> +			lcore_stat_dump(stdout, lc, &arg[lc].stats);
> +	}
> +
> +	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
> +	mt1_fini(r, data);
> +	return rc;
> +}
> +
> +static const struct test_case tests[] = {
> +	{
> +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
> +		.func = test_mt1,
> +		.wfunc = test_worker_prcs,
> +	},
> +	{
> +		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
> +		.func = test_mt1,
> +		.wfunc = test_worker_avg,
> +	},
> +};
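
A minimal sketch of how a harness could drive this table (a hypothetical
run_all() driver for illustration; the real autotest wiring is not part of
this excerpt):

static int
run_all(void)
{
	uint32_t i;
	int rc;

	rc = 0;
	for (i = 0; i != RTE_DIM(tests); i++) {
		printf("TEST %s START\n", tests[i].name);
		/* each case launches its workers through test_mt1() */
		rc |= tests[i].func(tests[i].wfunc);
		printf("TEST %s %s\n", tests[i].name,
			rc == 0 ? "OK" : "FAILED");
	}
	return rc;
}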
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 2/9] ring: prepare ring to allow new sync schemes
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-19  2:31             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:31 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> 
> To make these preparations two main things are done:
> - Change from *single* to *sync_type* to allow different
>   synchronisation schemes to be applied.
>   Mark *single* as deprecated in comments.
>   Add new functions to allow user to query ring sync types.
>   Replace direct access to *single* with appropriate function call.
> - Move actual rte_ring and related structures definitions into a
>   separate file: <rte_ring_core.h>. It allows to refer contents
>   of <rte_ring_elem.h> from <rte_ring.h> without introducing a
>   circular dependency.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
few nits inline, otherwise
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
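
As a quick illustration of the new query API, a sketch of a caller using
the accessors added in this patch (dump_ring_sync() itself is hypothetical):

#include <stdio.h>
#include <rte_ring.h>

static void
dump_ring_sync(const struct rte_ring *r)
{
	/* the sync_type query helpers replace peeking at the
	 * deprecated 'single' field directly */
	printf("%s: prod=%s, cons=%s\n", r->name,
		rte_ring_is_prod_single(r) ? "single" : "multi",
		rte_ring_is_cons_single(r) ? "single" : "multi");
}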

> ---
>  app/test/test_pdump.c           |   6 +-
>  lib/librte_pdump/rte_pdump.c    |   2 +-
>  lib/librte_port/rte_port_ring.c |  12 +--
>  lib/librte_ring/Makefile        |   1 +
>  lib/librte_ring/meson.build     |   1 +
>  lib/librte_ring/rte_ring.c      |   6 +-
>  lib/librte_ring/rte_ring.h      | 170 ++++++++++++++------------------
>  lib/librte_ring/rte_ring_core.h | 132 +++++++++++++++++++++++++
> lib/librte_ring/rte_ring_elem.h |  42 +++-----
>  9 files changed, 234 insertions(+), 138 deletions(-)  create mode 100644
> lib/librte_ring/rte_ring_core.h
> 
> diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c index
> ad183184c..6a1180bcb 100644
> --- a/app/test/test_pdump.c
> +++ b/app/test/test_pdump.c
> @@ -57,8 +57,7 @@ run_pdump_client_tests(void)
>  	if (ret < 0)
>  		return -1;
>  	mp->flags = 0x0000;
> -	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
> -				      RING_F_SP_ENQ | RING_F_SC_DEQ);
> +	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
>  	if (ring_client == NULL) {
>  		printf("rte_ring_create SR0 failed");
>  		return -1;
> @@ -71,9 +70,6 @@ run_pdump_client_tests(void)
>  	}
>  	rte_eth_dev_probing_finish(eth_dev);
> 
> -	ring_client->prod.single = 0;
> -	ring_client->cons.single = 0;
> -
>  	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
> 
>  	for (itr = 0; itr < NUM_ITR; itr++) {
> diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
> index 8a01ac510..f96709f95 100644
> --- a/lib/librte_pdump/rte_pdump.c
> +++ b/lib/librte_pdump/rte_pdump.c
> @@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct
> rte_mempool *mp)
>  		rte_errno = EINVAL;
>  		return -1;
>  	}
> -	if (ring->prod.single || ring->cons.single) {
> +	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring)) {
>  		PDUMP_LOG(ERR, "ring with either SP or SC settings"
>  		" is not valid for pdump, should have MP and MC settings\n");
>  		rte_errno = EINVAL;
> diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
> index 47fcdd06a..52b2d8e55 100644
> --- a/lib/librte_port/rte_port_ring.c
> +++ b/lib/librte_port/rte_port_ring.c
> @@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int
> socket_id,
>  	/* Check input parameters */
>  	if ((conf == NULL) ||
>  		(conf->ring == NULL) ||
> -		(conf->ring->cons.single && is_multi) ||
> -		(!(conf->ring->cons.single) && !is_multi)) {
> +		(rte_ring_is_cons_single(conf->ring) && is_multi) ||
> +		(!rte_ring_is_cons_single(conf->ring) && !is_multi)) {
>  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
>  		return NULL;
>  	}
> @@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params,
> int socket_id,
>  	/* Check input parameters */
>  	if ((conf == NULL) ||
>  		(conf->ring == NULL) ||
> -		(conf->ring->prod.single && is_multi) ||
> -		(!(conf->ring->prod.single) && !is_multi) ||
> +		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
> +		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
>  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
>  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
>  		return NULL;
> @@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void
> *params, int socket_id,
>  	/* Check input parameters */
>  	if ((conf == NULL) ||
>  		(conf->ring == NULL) ||
> -		(conf->ring->prod.single && is_multi) ||
> -		(!(conf->ring->prod.single) && !is_multi) ||
> +		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
> +		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
>  		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
>  		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
>  		return NULL;
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index
> 28368e6d1..6572768c9 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> 
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> +					rte_ring_core.h \
>  					rte_ring_elem.h \
>  					rte_ring_generic.h \
>  					rte_ring_c11_mem.h
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build index
> 05402e4f0..c656781da 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -3,6 +3,7 @@
> 
>  sources = files('rte_ring.c')
>  headers = files('rte_ring.h',
> +		'rte_ring_core.h',
>  		'rte_ring_elem.h',
>  		'rte_ring_c11_mem.h',
>  		'rte_ring_generic.h')
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c index
> 77e5de099..fa5733907 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  	if (ret < 0 || ret >= (int)sizeof(r->name))
>  		return -ENAMETOOLONG;
>  	r->flags = flags;
> -	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
> -	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
> +	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> +	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> +		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> 
>  	if (flags & RING_F_EXACT_SZ) {
>  		r->size = rte_align32pow2(count + 1); diff --git
> a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> 18fc5d845..35ee4491c 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -36,91 +36,7 @@
>  extern "C" {
>  #endif
> 
> -#include <stdio.h>
> -#include <stdint.h>
> -#include <sys/queue.h>
> -#include <errno.h>
> -#include <rte_common.h>
> -#include <rte_config.h>
> -#include <rte_memory.h>
> -#include <rte_lcore.h>
> -#include <rte_atomic.h>
> -#include <rte_branch_prediction.h>
> -#include <rte_memzone.h>
> -#include <rte_pause.h>
> -
> -#define RTE_TAILQ_RING_NAME "RTE_RING"
> -
> -enum rte_ring_queue_behavior {
> -	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items
> from a ring */
> -	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible
> from ring */
> -};
> -
> -#define RTE_RING_MZ_PREFIX "RG_"
> -/** The maximum length of a ring name. */
> -#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> -			   sizeof(RTE_RING_MZ_PREFIX) + 1)
> -
> -/* structure to hold a pair of head/tail values and other metadata */
> -struct rte_ring_headtail {
> -	volatile uint32_t head;  /**< Prod/consumer head. */
> -	volatile uint32_t tail;  /**< Prod/consumer tail. */
> -	uint32_t single;         /**< True if single prod/cons */
> -};
> -
> -/**
> - * An RTE ring structure.
> - *
> - * The producer and the consumer have a head and a tail index. The
> particularity
> - * of these index is that they are not between 0 and size(ring). These indexes
> - * are between 0 and 2^32, and we mask their value when we access the
> ring[]
> - * field. Thanks to this assumption, we can do subtractions between 2 index
> - * values in a modulo-32bit base: that's why the overflow of the indexes is
> not
> - * a problem.
> - */
> -struct rte_ring {
> -	/*
> -	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
> -	 * compatibility requirements, it could be changed to
> RTE_RING_NAMESIZE
> -	 * next time the ABI changes
> -	 */
> -	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned; /**< Name of the ring. */
> -	int flags;               /**< Flags supplied at creation. */
> -	const struct rte_memzone *memzone;
> -			/**< Memzone, if any, containing the rte_ring */
> -	uint32_t size;           /**< Size of ring. */
> -	uint32_t mask;           /**< Mask (size-1) of ring. */
> -	uint32_t capacity;       /**< Usable size of ring */
> -
> -	char pad0 __rte_cache_aligned; /**< empty cache line */
> -
> -	/** Ring producer status. */
> -	struct rte_ring_headtail prod __rte_cache_aligned;
> -	char pad1 __rte_cache_aligned; /**< empty cache line */
> -
> -	/** Ring consumer status. */
> -	struct rte_ring_headtail cons __rte_cache_aligned;
> -	char pad2 __rte_cache_aligned; /**< empty cache line */
> -};
> -
> -#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
> -#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
These do not appear in the API document anymore

> -/**
> - * Ring is to hold exactly requested number of entries.
> - * Without this flag set, the ring size requested must be a power of 2, and the
> - * usable space will be that size - 1. With the flag, the requested size will
> - * be rounded up to the next power of two, but the usable space will be exactly
> - * that requested. Worst case, if a power-of-2 size is requested, half the
> - * ring space will be wasted.
> - */
> -#define RING_F_EXACT_SZ 0x0004
> -#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> -
> -/* @internal defines for passing to the enqueue dequeue worker functions */
> -#define __IS_SP 1
> -#define __IS_MP 0
> -#define __IS_SC 1
> -#define __IS_MC 0
> +#include <rte_ring_core.h>
> 
>  /**
>   * Calculate the memory size needed for a ring
> @@ -420,7 +336,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_MP, free_space);
> +			RTE_RING_SYNC_MT, free_space);
>  }
> 
>  /**
> @@ -443,9 +359,13 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_SP, free_space);
> +			RTE_RING_SYNC_ST, free_space);
>  }
> 
> +#ifdef ALLOW_EXPERIMENTAL_API
> +#include <rte_ring_elem.h>
> +#endif
> +
>  /**
>   * Enqueue several objects on a ring.
>   *
> @@ -470,7 +390,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			r->prod.single, free_space);
> +			r->prod.sync_type, free_space);
>  }
> 
>  /**
> @@ -554,7 +474,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_MC, available);
> +			RTE_RING_SYNC_MT, available);
>  }
> 
>  /**
> @@ -578,7 +498,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			__IS_SC, available);
> +			RTE_RING_SYNC_ST, available);
>  }
> 
>  /**
> @@ -605,7 +525,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -				r->cons.single, available);
> +				r->cons.sync_type, available);
>  }
> 
>  /**
> @@ -777,6 +697,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
>  	return r->capacity;
>  }
> 
> +/**
> + * Return sync type used by producer in the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Producer sync type value.
> + */
> +static inline enum rte_ring_sync_type
> +rte_ring_get_prod_sync_type(const struct rte_ring *r)
> +{
> +	return r->prod.sync_type;
> +}
> +
> +/**
> + * Check whether the ring is configured as single-producer.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   true if ring is SP, zero otherwise.
> + */
> +static inline int
> +rte_ring_is_prod_single(const struct rte_ring *r)
> +{
> +	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
> +}
> +
> +/**
> + * Return sync type used by consumer in the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Consumer sync type value.
> + */
> +static inline enum rte_ring_sync_type
> +rte_ring_get_cons_sync_type(const struct rte_ring *r)
> +{
> +	return r->cons.sync_type;
> +}
> +
> +/**
> + * Check whether the ring is configured as single-consumer.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   true if ring is SC, zero otherwise.
> + */
> +static inline int
> +rte_ring_is_cons_single(const struct rte_ring *r)
> +{
> +	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
> +}
> +
>  /**
>   * Dump the status of all rings on the console
>   *
> @@ -820,7 +796,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> free_space);
>  }
> 
>  /**
> @@ -843,7 +819,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  			 unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> free_space);
>  }
> 
>  /**
> @@ -870,7 +846,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void *
> const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_VARIABLE,
> -			r->prod.single, free_space);
> +			r->prod.sync_type, free_space);
>  }
> 
>  /**
> @@ -898,7 +874,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> available);
>  }
> 
>  /**
> @@ -923,7 +899,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void
> **obj_table,
>  		unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> available);
>  }
> 
>  /**
> @@ -951,7 +927,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void
> **obj_table,  {
>  	return __rte_ring_do_dequeue(r, obj_table, n,
>  				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.single, available);
> +				r->cons.sync_type, available);
>  }
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
I like this separation, thanks.

> new file mode 100644 index 000000000..d9cef763f
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_core.h
> @@ -0,0 +1,132 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_CORE_H_
> +#define _RTE_RING_CORE_H_
> +
> +/**
> + * @file
> + * This file contains definion of RTE ring structure itself,
                                         ^^^^^^^ definition
> + * init flags and some related macros.
> + * For majority of DPDK entities, it is not recommended to include
> + * this file directly, use include <rte_ring.h> or <rte_ring_elem.h>
> + * instead.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <sys/queue.h>
> +#include <errno.h>
> +#include <rte_common.h>
> +#include <rte_config.h>
> +#include <rte_memory.h>
> +#include <rte_lcore.h>
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_memzone.h>
> +#include <rte_pause.h>
> +#include <rte_debug.h>
> +
> +#define RTE_TAILQ_RING_NAME "RTE_RING"
> +
> +/** enqueue/dequeue behavior types */
> +enum rte_ring_queue_behavior {
> +	/** Enq/Deq a fixed number of items from a ring */
> +	RTE_RING_QUEUE_FIXED = 0,
> +	/** Enq/Deq as many items as possible from ring */
> +	RTE_RING_QUEUE_VARIABLE
> +};
> +
> +#define RTE_RING_MZ_PREFIX "RG_"
> +/** The maximum length of a ring name. */
> +#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
> +			   sizeof(RTE_RING_MZ_PREFIX) + 1)
> +
> +/** prod/cons sync types */
> +enum rte_ring_sync_type {
> +	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
> +	RTE_RING_SYNC_ST,     /**< single thread only */
> +};
> +
> +/**
> + * Structure to hold a pair of head/tail values and other metadata.
> + * Depending on sync_type, the format of that structure might differ,
> + * but offset for *sync_type* and *tail* values should remain the same.
> + */
> +struct rte_ring_headtail {
> +	volatile uint32_t head;      /**< prod/consumer head. */
> +	volatile uint32_t tail;      /**< prod/consumer tail. */
> +	RTE_STD_C11
> +	union {
> +		/** sync type of prod/cons */
> +		enum rte_ring_sync_type sync_type;
> +		/** deprecated -  True if single prod/cons */
> +		uint32_t single;
> +	};
> +};
> +
> +/**
> + * An RTE ring structure.
> + *
> + * The producer and the consumer have a head and a tail index. The
> +particularity
> + * of these index is that they are not between 0 and size(ring). These
> +indexes
> + * are between 0 and 2^32, and we mask their value when we access the
> +ring[]
> + * field. Thanks to this assumption, we can do subtractions between 2
> +index
> + * values in a modulo-32bit base: that's why the overflow of the
> +indexes is not
> + * a problem.
> + */
> +struct rte_ring {
> +	/*
> +	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
> +	 * compatibility requirements, it could be changed to
> RTE_RING_NAMESIZE
> +	 * next time the ABI changes
> +	 */
> +	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned;
> +	/**< Name of the ring. */
> +	int flags;               /**< Flags supplied at creation. */
> +	const struct rte_memzone *memzone;
> +			/**< Memzone, if any, containing the rte_ring */
> +	uint32_t size;           /**< Size of ring. */
> +	uint32_t mask;           /**< Mask (size-1) of ring. */
> +	uint32_t capacity;       /**< Usable size of ring */
> +
> +	char pad0 __rte_cache_aligned; /**< empty cache line */
> +
> +	/** Ring producer status. */
> +	struct rte_ring_headtail prod __rte_cache_aligned;
> +	char pad1 __rte_cache_aligned; /**< empty cache line */
> +
> +	/** Ring consumer status. */
> +	struct rte_ring_headtail cons __rte_cache_aligned;
> +	char pad2 __rte_cache_aligned; /**< empty cache line */
> +};
> +
> +#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
> +#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
> +/**
> + * Ring is to hold exactly requested number of entries.
> + * Without this flag set, the ring size requested must be a power of 2, and the
> + * usable space will be that size - 1. With the flag, the requested size will
> + * be rounded up to the next power of two, but the usable space will be exactly
> + * that requested. Worst case, if a power-of-2 size is requested, half the
> + * ring space will be wasted.
> + */
> +#define RING_F_EXACT_SZ 0x0004
> +#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_CORE_H_ */
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 663addc73..7406c0b0f 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -20,21 +20,7 @@
>  extern "C" {
>  #endif
> 
> -#include <stdio.h>
> -#include <stdint.h>
> -#include <string.h>
> -#include <sys/queue.h>
> -#include <errno.h>
> -#include <rte_common.h>
> -#include <rte_config.h>
> -#include <rte_memory.h>
> -#include <rte_lcore.h>
> -#include <rte_atomic.h>
> -#include <rte_branch_prediction.h>
> -#include <rte_memzone.h>
> -#include <rte_pause.h>
> -
> -#include "rte_ring.h"
> +#include <rte_ring_core.h>
> 
>  /**
>   * @warning
> @@ -510,7 +496,7 @@ rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r,
> const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
> +			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT,
> free_space);
>  }
> 
>  /**
> @@ -539,7 +525,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r,
> const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
> +			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST,
> free_space);
>  }
> 
>  /**
> @@ -570,7 +556,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
> +			RTE_RING_QUEUE_FIXED, r->prod.sync_type,
> free_space);
>  }
> 
>  /**
> @@ -675,7 +661,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r,
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -				RTE_RING_QUEUE_FIXED, __IS_MC, available);
> +				RTE_RING_QUEUE_FIXED,
> RTE_RING_SYNC_MT, available);
>  }
> 
>  /**
> @@ -703,7 +689,7 @@ rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r,
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, __IS_SC, available);
> +			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST,
> available);
>  }
> 
>  /**
> @@ -734,7 +720,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void
> *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, r->cons.single, available);
> +			RTE_RING_QUEUE_FIXED, r->cons.sync_type,
> available);
>  }
> 
>  /**
> @@ -842,7 +828,7 @@ rte_ring_mp_enqueue_burst_elem(struct rte_ring *r,
> const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> free_space);
>  }
> 
>  /**
> @@ -871,7 +857,7 @@ rte_ring_sp_enqueue_burst_elem(struct rte_ring *r,
> const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> free_space);
>  }
> 
>  /**
> @@ -902,7 +888,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r,
> const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, r->prod.single,
> free_space);
> +			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type,
> free_space);
>  }
> 
>  /**
> @@ -934,7 +920,7 @@ rte_ring_mc_dequeue_burst_elem(struct rte_ring *r,
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> available);
>  }
> 
>  /**
> @@ -963,7 +949,7 @@ rte_ring_sc_dequeue_burst_elem(struct rte_ring *r,
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> +			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> available);
>  }
> 
>  /**
> @@ -995,9 +981,11 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r,
> void *obj_table,  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
>  				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.single, available);
> +				r->cons.sync_type, available);
>  }
> 
> +#include <rte_ring.h>
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 2.17.1
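
To make the header split above concrete, the resulting include discipline
looks roughly like this (a sketch; as the rte_ring_core.h file comment
recommends, applications keep using the umbrella headers):

/* applications include the umbrella headers, not rte_ring_core.h: */
#include <rte_ring.h>       /* pulls in rte_ring_core.h itself */
#include <rte_ring_elem.h>  /* also builds on rte_ring_core.h, so it can
                             * reuse the ring definitions without a
                             * circular include of rte_ring.h up front */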


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 3/9] ring: introduce RTS ring mode
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 3/9] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-19  2:31             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:31 UTC (permalink / raw)
  To: Konstantin Ananyev, dev; +Cc: david.marchand, jielong.zjl, nd

<snip>

> 
> Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
> Aim to reduce stall times in case when ring is used on overcommitted cpus
> (multiple active threads on the same cpu).
> The main difference from original MP/MC algorithm is that tail value is
> increased not by every thread that finished enqueue/dequeue, but only by the
> last one.
> That allows threads to avoid spinning on ring tail value, leaving actual tail
> value change to the last thread in the update queue.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Few nits, otherwise
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
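
To visualize the scheme before reading the diff, here is a conceptual C11
sketch of the relaxed tail update (hypothetical poscnt naming; the actual
implementation is in rte_ring_rts_c11_mem.h below and additionally handles
acquire/release ordering and re-reading the head):

#include <stdint.h>
#include <stdatomic.h>

/* head and tail each pack a position plus an update counter,
 * read/written as one atomic 64-bit value */
union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt; /* count of started/finished ops */
		uint32_t pos; /* actual ring position */
	} val;
};

/* sketch: every finishing op bumps tail.cnt; only the op whose new
 * cnt catches up with head.cnt moves tail.pos forward, so no thread
 * spins waiting for a preempted peer to update the tail */
static void
rts_update_tail(_Atomic uint64_t *tail, union poscnt head)
{
	union poscnt ot, nt;

	ot.raw = atomic_load(tail);
	do {
		nt.val.cnt = ot.val.cnt + 1;
		nt.val.pos = ot.val.pos;
		if (nt.val.cnt == head.val.cnt) /* last outstanding op */
			nt.val.pos = head.val.pos;
	} while (!atomic_compare_exchange_weak(tail, &ot.raw, nt.raw));
}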

> ---
> 
> check-abi.sh reports what I believe is a false-positive about ring cons/prod
> changes. As a workaround, devtools/libabigail.abignore is updated to
> suppress *struct ring* related errors.
> 
>  devtools/libabigail.abignore           |   7 +
>  lib/librte_ring/Makefile               |   4 +-
>  lib/librte_ring/meson.build            |   7 +-
>  lib/librte_ring/rte_ring.c             | 100 +++++-
>  lib/librte_ring/rte_ring.h             |  70 +++-
>  lib/librte_ring/rte_ring_core.h        |  36 +-
>  lib/librte_ring/rte_ring_elem.h        |  90 ++++-
>  lib/librte_ring/rte_ring_rts.h         | 439 +++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
>  9 files changed, 902 insertions(+), 30 deletions(-)  create mode 100644
> lib/librte_ring/rte_ring_rts.h  create mode 100644
> lib/librte_ring/rte_ring_rts_c11_mem.h
> 
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore index
> a59df8f13..cd86d89ca 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -11,3 +11,10 @@
>          type_kind = enum
>          name = rte_crypto_asym_xform_type
>          changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> +; Ignore updates of ring prod/cons
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_ring
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_event_ring
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index
> 6572768c9..04e446e37 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -19,6 +19,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> rte_ring.h \
>  					rte_ring_core.h \
>  					rte_ring_elem.h \
>  					rte_ring_generic.h \
> -					rte_ring_c11_mem.h
> +					rte_ring_c11_mem.h \
> +					rte_ring_rts.h \
> +					rte_ring_rts_c11_mem.h
> 
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build index
> c656781da..a95598032 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -6,4 +6,9 @@ headers = files('rte_ring.h',
>  		'rte_ring_core.h',
>  		'rte_ring_elem.h',
>  		'rte_ring_c11_mem.h',
> -		'rte_ring_generic.h')
> +		'rte_ring_generic.h',
> +		'rte_ring_rts.h',
> +		'rte_ring_rts_c11_mem.h')
> +
> +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> +allow_experimental_apis = true
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c index
> fa5733907..222eec0fb 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
>  /* true if x is a power of 2 */
>  #define POWEROF2(x) ((((x)-1) & (x)) == 0)
> 
> +/* by default set head/tail distance as 1/8 of ring capacity */
> +#define HTD_MAX_DEF	8
> +
>  /* return the size of memory occupied by a ring */  ssize_t
> rte_ring_get_memsize_elem(unsigned int esize, unsigned int count) @@ -
> 79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
>  	return rte_ring_get_memsize_elem(sizeof(void *), count);  }
> 
> +/*
> + * internal helper function to reset prod/cons head-tail values.
> + */
> +static void
> +reset_headtail(void *p)
The internal functions have used __rte prefix in ring library. I think we should follow the same here.

> +{
> +	struct rte_ring_headtail *ht;
> +	struct rte_ring_rts_headtail *ht_rts;
> +
> +	ht = p;
> +	ht_rts = p;
> +
> +	switch (ht->sync_type) {
> +	case RTE_RING_SYNC_MT:
> +	case RTE_RING_SYNC_ST:
> +		ht->head = 0;
> +		ht->tail = 0;
> +		break;
> +	case RTE_RING_SYNC_MT_RTS:
> +		ht_rts->head.raw = 0;
> +		ht_rts->tail.raw = 0;
> +		break;
> +	default:
> +		/* unknown sync mode */
> +		RTE_ASSERT(0);
> +	}
> +}
> +
>  void
>  rte_ring_reset(struct rte_ring *r)
>  {
> -	r->prod.head = r->cons.head = 0;
> -	r->prod.tail = r->cons.tail = 0;
> +	reset_headtail(&r->prod);
> +	reset_headtail(&r->cons);
> +}
> +
> +/*
> + * helper function, calculates sync_type values for prod and cons
> + * based on input flags. Returns zero at success or negative
> + * errno value otherwise.
> + */
> +static int
> +get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
> +	enum rte_ring_sync_type *cons_st)
The internal functions have used __rte prefix in ring library. I think we should follow the same here.
Also, it will help avoid symbol clashes.

> +{
> +	static const uint32_t prod_st_flags =
> +		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
> +	static const uint32_t cons_st_flags =
> +		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
> +
> +	switch (flags & prod_st_flags) {
> +	case 0:
> +		*prod_st = RTE_RING_SYNC_MT;
> +		break;
> +	case RING_F_SP_ENQ:
> +		*prod_st = RTE_RING_SYNC_ST;
> +		break;
> +	case RING_F_MP_RTS_ENQ:
> +		*prod_st = RTE_RING_SYNC_MT_RTS;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	switch (flags & cons_st_flags) {
> +	case 0:
> +		*cons_st = RTE_RING_SYNC_MT;
> +		break;
> +	case RING_F_SC_DEQ:
> +		*cons_st = RTE_RING_SYNC_ST;
> +		break;
> +	case RING_F_MC_RTS_DEQ:
> +		*cons_st = RTE_RING_SYNC_MT_RTS;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
>  }
> 
>  int
> @@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> 
> +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
> +		offsetof(struct rte_ring_rts_headtail, sync_type));
> +	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
> +		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
> +
>  	/* init the ring structure */
>  	memset(r, 0, sizeof(*r));
>  	ret = strlcpy(r->name, name, sizeof(r->name));
>  	if (ret < 0 || ret >= (int)sizeof(r->name))
>  		return -ENAMETOOLONG;
>  	r->flags = flags;
> -	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
> -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> -	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
> -		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
> +	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
> +	if (ret != 0)
> +		return ret;
> 
>  	if (flags & RING_F_EXACT_SZ) {
>  		r->size = rte_align32pow2(count + 1); @@ -126,8 +206,12
> @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>  		r->mask = count - 1;
>  		r->capacity = r->mask;
>  	}
> -	r->prod.head = r->cons.head = 0;
> -	r->prod.tail = r->cons.tail = 0;
> +
> +	/* set default values for head-tail distance */
> +	if (flags & RING_F_MP_RTS_ENQ)
> +		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
> +	if (flags & RING_F_MC_RTS_DEQ)
> +		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
> 
>  	return 0;
>  }
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index
> 35ee4491c..77f206ca7 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -1,6 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   *
> - * Copyright (c) 2010-2017 Intel Corporation
> + * Copyright (c) 2010-2020 Intel Corporation
>   * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
>   * All rights reserved.
>   * Derived from FreeBSD's bufring.h
> @@ -389,8 +389,21 @@ static __rte_always_inline unsigned int
> rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -			r->prod.sync_type, free_space);
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> #ifdef
> +ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> +			free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  /**
> @@ -524,8 +537,20 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -				r->cons.sync_type, available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> #ifdef
> +ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  /**
> @@ -845,8 +870,21 @@ static __rte_always_inline unsigned
> rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_VARIABLE,
> -			r->prod.sync_type, free_space);
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
> #ifdef
> +ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> +			free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  /**
> @@ -925,9 +963,21 @@ static __rte_always_inline unsigned
> rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.sync_type, available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> #ifdef
> +ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> +			available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	return 0;
>  }
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
> index d9cef763f..ded0fa0b7 100644
> --- a/lib/librte_ring/rte_ring_core.h
> +++ b/lib/librte_ring/rte_ring_core.h
> @@ -57,6 +57,9 @@ enum rte_ring_queue_behavior {
>  enum rte_ring_sync_type {
>  	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
>  	RTE_RING_SYNC_ST,     /**< single thread only */
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
These need to be documented in rte_ring_init, rte_ring_create, rte_ring_create_elem API comments.
Also, please check if you want to update the file description in rte_ring.h in brief to capture the new features.

> +#endif
>  };
> 
>  /**
> @@ -76,6 +79,22 @@ struct rte_ring_headtail {
>  	};
>  };
> 
> +union rte_ring_rts_poscnt {
I think this is internal structure, prefix can be __rte

> +	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
> +	uint64_t raw __rte_aligned(8);
> +	struct {
> +		uint32_t cnt; /**< head/tail reference counter */
> +		uint32_t pos; /**< head/tail position */
> +	} val;
> +};
> +
> +struct rte_ring_rts_headtail {
Same here, the prefix can be __rte

> +	volatile union rte_ring_rts_poscnt tail;
> +	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
> +	uint32_t htd_max;   /**< max allowed distance between head/tail */
> +	volatile union rte_ring_rts_poscnt head; };
> +
>  /**
>   * An RTE ring structure.
>   *
> @@ -104,11 +123,21 @@ struct rte_ring {
>  	char pad0 __rte_cache_aligned; /**< empty cache line */
> 
>  	/** Ring producer status. */
> -	struct rte_ring_headtail prod __rte_cache_aligned;
> +	RTE_STD_C11
> +	union {
> +		struct rte_ring_headtail prod;
> +		struct rte_ring_rts_headtail rts_prod;
> +	}  __rte_cache_aligned;
> +
>  	char pad1 __rte_cache_aligned; /**< empty cache line */
> 
>  	/** Ring consumer status. */
> -	struct rte_ring_headtail cons __rte_cache_aligned;
> +	RTE_STD_C11
> +	union {
> +		struct rte_ring_headtail cons;
> +		struct rte_ring_rts_headtail rts_cons;
> +	}  __rte_cache_aligned;
> +
>  	char pad2 __rte_cache_aligned; /**< empty cache line */  };
> 
> @@ -125,6 +154,9 @@ struct rte_ring {
>  #define RING_F_EXACT_SZ 0x0004
>  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> 
> +#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP
> RTS".
> +*/ #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC
> +RTS". */
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 7406c0b0f..6da0a917b 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -528,6 +528,10 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r,
> const void *obj_table,
>  			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST,
> free_space);  }
> 
> +#ifdef ALLOW_EXPERIMENTAL_API
> +#include <rte_ring_rts.h>
> +#endif
> +
>  /**
>   * Enqueue several objects on a ring.
>   *
> @@ -557,6 +561,26 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r,
> const void *obj_table,  {
>  	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
>  			RTE_RING_QUEUE_FIXED, r->prod.sync_type,
> free_space);
> +
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
> +			free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
> +			free_space);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
> esize, n,
> +			free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (free_space != NULL)
> +		*free_space = 0;
> +	return 0;
>  }
> 
>  /**
> @@ -661,7 +685,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r,
> void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
>  	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -				RTE_RING_QUEUE_FIXED,
> RTE_RING_SYNC_MT, available);
> +			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT,
> available);
>  }
> 
>  /**
> @@ -719,8 +743,25 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_FIXED, r->cons.sync_type,
> available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
> +			available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
> +			available);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
> esize,
> +			n, available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (available != NULL)
> +		*available = 0;
> +	return 0;
>  }
> 
>  /**
> @@ -887,8 +928,25 @@ static __rte_always_inline unsigned
> rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *free_space)  {
> -	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
> -			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type,
> free_space);
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize,
> n,
> +			free_space);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
> +			free_space);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
> esize,
> +			n, free_space);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (free_space != NULL)
> +		*free_space = 0;
> +	return 0;
>  }
> 
>  /**
> @@ -979,9 +1037,25 @@ static __rte_always_inline unsigned int
> rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
>  		unsigned int esize, unsigned int n, unsigned int *available)  {
> -	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
> -				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.sync_type, available);
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_MT:
> +		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
> +			available);
> +	case RTE_RING_SYNC_ST:
> +		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
> +			available);
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	case RTE_RING_SYNC_MT_RTS:
> +		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
> esize,
> +			n, available);
> +#endif
> +	}
> +
> +	/* valid ring should never reach this point */
> +	RTE_ASSERT(0);
> +	if (available != NULL)
> +		*available = 0;
> +	return 0;
>  }
> 
>  #include <rte_ring.h>
> diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h new
> file mode 100644 index 000000000..8ced07096
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts.h
> @@ -0,0 +1,439 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_RTS_H_
> +#define _RTE_RING_RTS_H_
> +
> +/**
> + * @file rte_ring_rts.h
> + * @b EXPERIMENTAL: this API may change without prior notice
> + * It is not recommended to include this file directly.
> + * Please include <rte_ring.h> instead.
> + *
> + * Contains functions for Relaxed Tail Sync (RTS) ring mode.
> + * The main idea remains the same as for our original MP/MC
> +synchronization
> + * mechanism.
> + * The main difference is that tail value is increased not
> + * by every thread that finished enqueue/dequeue,
> + * but only by the current last one doing enqueue/dequeue.
> + * That allows threads to skip spinning on tail value,
> + * leaving actual tail value change to last thread at a given instance.
> + * RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> + * one for head update, second for tail update.
> + * As a gain it allows thread to avoid spinning/waiting on tail value.
> + * In comparison, the original MP/MC algorithm requires one 32-bit CAS
> + * for head update and waiting/spinning on tail value.
> + *
> + * Brief outline:
> + *  - introduce update counter (cnt) for both head and tail.
> + *  - increment head.cnt for each head.value update
> + *  - write head.value and head.cnt atomically (64-bit CAS)
> + *  - move tail.value ahead only when tail.cnt + 1 == head.cnt
> + *    (indicating that this is the last thread updating the tail)
> + *  - increment tail.cnt when each enqueue/dequeue op finishes
> + *    (no matter if tail.value going to change or not)
> + *  - write tail.value and tail.cnt atomically (64-bit CAS)
> + *
> + * To avoid producer/consumer starvation:
> + *  - limit max allowed distance between head and tail value (HTD_MAX).
> + *    I.E. thread is allowed to proceed with changing head.value,
> + *    only when:  head.value - tail.value <= HTD_MAX
> + * HTD_MAX is an optional parameter.
> + * With HTD_MAX == 0 we'll have fully serialized ring -
> + * i.e. only one thread at a time will be able to enqueue/dequeue
> + * to/from the ring.
> + * With HTD_MAX >= ring.capacity - no limitation.
> + * By default HTD_MAX == ring.capacity / 8.
> + */
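
A short usage sketch of the HTD_MAX knob described above (assuming the
RING_F_*_RTS_* flags and the rte_ring_set_prod/cons_htd_max() helpers
introduced by this series; make_rts_ring() is hypothetical):

#include <rte_ring.h>

static struct rte_ring *
make_rts_ring(unsigned int count)
{
	struct rte_ring *r;

	r = rte_ring_create("rts_ring", count, rte_socket_id(),
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return NULL;

	/* tighten head/tail distance below the capacity/8 default;
	 * 0 would fully serialize enqueue/dequeue */
	rte_ring_set_prod_htd_max(r, rte_ring_get_capacity(r) / 16);
	rte_ring_set_cons_htd_max(r, rte_ring_get_capacity(r) / 16);
	return r;
}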
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_ring_rts_c11_mem.h>
> +
> +/**
> + * @internal Enqueue several objects on the RTS ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from
> ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, const void *obj_table,
> +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> +	uint32_t *free_space)
> +{
> +	uint32_t free, head;
> +
> +	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
> +
> +	if (n != 0) {
> +		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
> +		__rte_ring_rts_update_tail(&r->rts_prod);
> +	}
> +
> +	if (free_space != NULL)
> +		*free_space = free - n;
> +	return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the RTS ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has
> finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void *obj_table,
> +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> +	uint32_t *available)
> +{
> +	uint32_t entries, head;
> +
> +	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
> +
> +	if (n != 0) {
> +		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
> +		__rte_ring_rts_update_tail(&r->rts_cons);
> +	}
> +
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, const void
> *obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, free_space);
> +}
> +
> +/**
> + * Dequeue several objects from an RTS ring (multi-consumers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *available) {
> +	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, available);
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, const void
> *obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *free_space) {
> +	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, free_space); }
> +
> +/**
> + * Dequeue several objects from an RTS  ring (multi-consumers safe).
> + * When the requested objects are more than the available objects,
> + * only dequeue the actual number of objects.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - n: Actual number of objects dequeued, 0 if ring is empty
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
> +	unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, available);
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space)
> +{
> +	return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
> +			sizeof(uintptr_t), n, free_space);
> +}
> +
> +/**
> + * Dequeue several objects from an RTS ring (multi-consumers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
> +			sizeof(uintptr_t), n, available);
> +}
> +
> +/**
> + * Enqueue several objects on the RTS ring (multi-producers safe).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   - n: Actual number of objects enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, unsigned int *free_space)
> +{
> +	return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
> +			sizeof(uintptr_t), n, free_space);
> +}
> +
> +/**
> + * Dequeue several objects from an RTS ring (multi-consumers safe).
> + * When the requested objects are more than the available objects,
> + * only dequeue the actual number of objects.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   - n: Actual number of objects dequeued, 0 if ring is empty
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned
> +rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
> +			sizeof(uintptr_t), n, available);
> +}
> +
> +/**
> + * Return producer max Head-Tail-Distance (HTD).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Producer HTD value, if producer is set in appropriate sync mode,
> + *   or UINT32_MAX otherwise.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_ring_get_prod_htd_max(const struct rte_ring *r)
> +{
> +	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
> +		return r->rts_prod.htd_max;
> +	return UINT32_MAX;
> +}
> +
> +/**
> + * Set producer max Head-Tail-Distance (HTD).
> + * Note that producer has to use appropriate sync mode (RTS).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param v
> + *   new HTD value to setup.
> + * @return
> + *   Zero on success, or negative error code otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
> +{
> +	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
> +		return -ENOTSUP;
> +
> +	r->rts_prod.htd_max = v;
> +	return 0;
> +}
> +
> +/**
> + * Return consumer max Head-Tail-Distance (HTD).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @return
> + *   Consumer HTD value, if consumer is set in appropriate sync mode,
> + *   or UINT32_MAX otherwise.
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_ring_get_cons_htd_max(const struct rte_ring *r)
> +{
> +	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
> +		return r->rts_cons.htd_max;
> +	return UINT32_MAX;
> +}
> +
> +/**
> + * Set consumer max Head-Tail-Distance (HTD).
> + * Note that consumer has to use appropriate sync mode (RTS).
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param v
> + *   new HTD value to setup.
> + * @return
> + *   Zero on success, or negative error code otherwise.
> + */
> +__rte_experimental
> +static inline int
> +rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
> +{
> +	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
> +		return -ENOTSUP;
> +
> +	r->rts_cons.htd_max = v;
> +	return 0;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_RTS_H_ */
> diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
> new file mode 100644
> index 000000000..9f26817c0
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
> @@ -0,0 +1,179 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_RTS_C11_MEM_H_
> +#define _RTE_RING_RTS_C11_MEM_H_
> +
> +/**
> + * @file rte_ring_rts_c11_mem.h
> + * It is not recommended to include this file directly,
> + * include <rte_ring.h> instead.
> + * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
> + * For more information please refer to <rte_ring_rts.h>.
> + */
> +
> +/**
> + * @internal This function updates tail values.
> + */
> +static __rte_always_inline void
> +__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
> +{
> +	union rte_ring_rts_poscnt h, ot, nt;
> +
> +	/*
> +	 * If there are other enqueues/dequeues in progress that
> +	 * might precede us, then don't update tail with new value.
> +	 */
> +
> +	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
> +
> +	do {
> +		/* on 32-bit systems we have to do atomic read here */
> +		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
> +
> +		nt.raw = ot.raw;
> +		if (++nt.val.cnt == h.val.cnt)
> +			nt.val.pos = h.val.pos;
> +
> +	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
> +			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
> +}
> +
> +/**
> + * @internal This function waits until the head/tail distance
> + * does not exceed the pre-defined max value.
> + */
> +static __rte_always_inline void
> +__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
> +	union rte_ring_rts_poscnt *h)
> +{
> +	uint32_t max;
> +
> +	max = ht->htd_max;
> +
> +	while (h->val.pos - ht->tail.val.pos > max) {
> +		rte_pause();
> +		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> +	}
> +}
> +
> +/**
> + * @internal This function updates the producer head for enqueue.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *free_entries)
> +{
> +	uint32_t n;
> +	union rte_ring_rts_poscnt nh, oh;
> +
> +	const uint32_t capacity = r->capacity;
> +
> +	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
> +
> +	do {
> +		/* Reset n to the initial burst count */
> +		n = num;
> +
> +		/*
> +		 * wait for prod head/tail distance,
> +		 * make sure that we read prod head *before*
> +		 * reading cons tail.
> +		 */
> +		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
> +
> +		/*
> +		 * The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * *old_head > cons_tail). So 'free_entries' is always between 0
> +		 * and capacity (which is < size).
> +		 */
> +		*free_entries = capacity + r->cons.tail - oh.val.pos;
> +
> +		/* check that we have enough room in ring */
> +		if (unlikely(n > *free_entries))
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ?
> +					0 : *free_entries;
> +
> +		if (n == 0)
> +			break;
> +
> +		nh.val.pos = oh.val.pos + n;
> +		nh.val.cnt = oh.val.cnt + 1;
> +
> +	/*
> +	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
> +	 *  - OOO reads of cons tail value
> +	 *  - OOO copy of elems to the ring
> +	 */
> +	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
> +			&oh.raw, nh.raw,
> +			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +
> +	*old_head = oh.val.pos;
> +	return n;
> +}
> +
> +/**
> + * @internal This function updates the consumer head for dequeue
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
> +	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
> +	uint32_t *entries)
> +{
> +	uint32_t n;
> +	union rte_ring_rts_poscnt nh, oh;
> +
> +	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
> +
> +	/* move cons.head atomically */
> +	do {
> +		/* Restore n as it may change every loop */
> +		n = num;
> +
> +		/*
> +		 * wait for cons head/tail distance,
> +		 * make sure that we read cons head *before*
> +		 * reading prod tail.
> +		 */
> +		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
> +
> +		/* The subtraction is done between two unsigned 32bits value
> +		 * (the result is always modulo 32 bits even if we have
> +		 * cons_head > prod_tail). So 'entries' is always between 0
> +		 * and size(ring)-1.
> +		 */
> +		*entries = r->prod.tail - oh.val.pos;
> +
> +		/* Set the actual entries for dequeue */
> +		if (n > *entries)
> +			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
> +
> +		if (unlikely(n == 0))
> +			break;
> +
> +		nh.val.pos = oh.val.pos + n;
> +		nh.val.cnt = oh.val.cnt + 1;
> +
> +	/*
> +	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
> +	 *  - OOO reads of prod tail value
> +	 *  - OOO copy of elems from the ring
> +	 */
> +	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
> +			&oh.raw, nh.raw,
> +			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +
> +	*old_head = oh.val.pos;
> +	return n;
> +}
> +
> +#endif /* _RTE_RING_RTS_C11_MEM_H_ */
> --
> 2.17.1
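
A minimal usage sketch of the RTS API quoted above may help here (error
handling omitted; the RING_F_MP_RTS_ENQ/RING_F_MC_RTS_DEQ creation flags
are assumed from the ring-mode patch earlier in this series):

	/* create a ring with RTS sync for both producers and consumers */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024, SOCKET_ID_ANY,
			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);

	/* optionally cap the Head-Tail-Distance; smaller values bound how
	 * far head can run ahead of tail under contention */
	rte_ring_set_prod_htd_max(r, 8);
	rte_ring_set_cons_htd_max(r, 8);

	void *objs[32];
	/* ... fill objs[] with object pointers ... */

	/* MT-safe in RTS mode; burst may move fewer objects than requested,
	 * bulk variants move all-or-nothing */
	unsigned int n = rte_ring_mp_rts_enqueue_burst(r, objs, 32, NULL);
	n = rte_ring_mc_rts_dequeue_burst(r, objs, n, NULL);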


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 4/9] test/ring: add contention stress test for RTS ring
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-19  2:31             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:31 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [PATCH v5 4/9] test/ring: add contention stress test for RTS ring
> 
> Introduce new test case to test RTS ring mode under contention.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 5/9] ring: introduce HTS ring mode
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 5/9] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-19  2:31             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:31 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [PATCH v5 5/9] ring: introduce HTS ring mode
> 
> Introduce head/tail sync mode for MT ring synchronization.
> In that mode enqueue/dequeue operation is fully serialized:
> only one thread at a time is allowed to perform given op.
> Supposed to reduce stall times when the ring is used on overcommitted
> cpus (multiple active threads on the same cpu).
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 6/9] test/ring: add contention stress test for HTS ring
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-19  2:31             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:31 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>

> Subject: [PATCH v5 6/9] test/ring: add contention stress test for HTS ring
> 
> Introduce new test case to test HTS ring mode under contention.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-19  2:31             ` Honnappa Nagarahalli
  2020-04-19 18:32               ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:31 UTC (permalink / raw)
  To: Konstantin Ananyev, dev; +Cc: david.marchand, jielong.zjl, nd

<snip>

> Subject: [PATCH v5 7/9] ring: introduce peek style API
> 
> For rings with producer/consumer in RTE_RING_SYNC_ST,
> RTE_RING_SYNC_MT_HTS mode, provide an ability to split enqueue/dequeue
> operation into two phases:
>       - enqueue/dequeue start
>       - enqueue/dequeue finish
> That allows user to inspect objects in the ring without removing them from it
> (aka MT safe peek).
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
One nit inline, otherwise,
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  lib/librte_ring/Makefile           |   1 +
>  lib/librte_ring/meson.build        |   1 +
>  lib/librte_ring/rte_ring_c11_mem.h |  44 +++
>  lib/librte_ring/rte_ring_elem.h    |   4 +
>  lib/librte_ring/rte_ring_generic.h |  48 ++++
>  lib/librte_ring/rte_ring_peek.h    | 442 +++++++++++++++++++++++++++++
>  6 files changed, 540 insertions(+)
>  create mode 100644 lib/librte_ring/rte_ring_peek.h
> 
> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> index f75d8e530..52bb2a42d 100644
> --- a/lib/librte_ring/Makefile
> +++ b/lib/librte_ring/Makefile
> @@ -22,6 +22,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include :=
> rte_ring.h \
>  					rte_ring_c11_mem.h \
>  					rte_ring_hts.h \
>  					rte_ring_hts_c11_mem.h \
> +					rte_ring_peek.h \
>  					rte_ring_rts.h \
>  					rte_ring_rts_c11_mem.h
> 
> diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
> index ca37cb8cc..0c1f2d996 100644
> --- a/lib/librte_ring/meson.build
> +++ b/lib/librte_ring/meson.build
> @@ -9,6 +9,7 @@ headers = files('rte_ring.h',
>  		'rte_ring_generic.h',
>  		'rte_ring_hts.h',
>  		'rte_ring_hts_c11_mem.h',
> +		'rte_ring_peek.h',
>  		'rte_ring_rts.h',
>  		'rte_ring_rts_c11_mem.h')
> 
> diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
> index 0fb73a337..bb3096721 100644
> --- a/lib/librte_ring/rte_ring_c11_mem.h
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -10,6 +10,50 @@
>  #ifndef _RTE_RING_C11_MEM_H_
>  #define _RTE_RING_C11_MEM_H_
> 
> +/**
> + * @internal get current tail value.
> + * This function should be used only for single thread producer/consumer.
> + * Check that user didn't request to move tail above the head.
> + * In that situation:
> + * - return zero, which will cause any pending changes to be aborted and
> + *   the head to return to its previous position.
> + * - throw an assert in debug mode.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> +	uint32_t num)
> +{
> +	uint32_t h, n, t;
> +
> +	h = ht->head;
> +	t = ht->tail;
> +	n = h - t;
> +
> +	RTE_ASSERT(n >= num);
> +	num = (n >= num) ? num : 0;
> +
> +	*tail = t;
> +	return num;
> +}
> +
> +/**
> + * @internal set new values for head and tail.
> + * This function should be used only for single thread producer/consumer.
> + * Should be used only in conjunction with __rte_ring_st_get_tail.
> + */
> +static __rte_always_inline void
> +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> +	uint32_t num, uint32_t enqueue)
> +{
> +	uint32_t pos;
> +
> +	RTE_SET_USED(enqueue);
> +
> +	pos = tail + num;
> +	ht->head = pos;
> +	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
> +}
> +
>  static __rte_always_inline void
>  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
>  		uint32_t single, uint32_t enqueue)
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index df485fc6b..eeb850ab5 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -1071,6 +1071,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
>  	return 0;
>  }
> 
> +#ifdef ALLOW_EXPERIMENTAL_API
> +#include <rte_ring_peek.h>
> +#endif
> +
>  #include <rte_ring.h>
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> index 953cdbbd5..9f5fdf13b 100644
> --- a/lib/librte_ring/rte_ring_generic.h
> +++ b/lib/librte_ring/rte_ring_generic.h
Changes in this file are not required as we agreed to implement only C11 for new features.

> @@ -10,6 +10,54 @@
>  #ifndef _RTE_RING_GENERIC_H_
>  #define _RTE_RING_GENERIC_H_
> 
> +/**
> + * @internal get current tail value.
> + * This function should be used only for single thread producer/consumer.
> + * Check that user didn't request to move tail above the head.
> + * In that situation:
> + * - return zero, which will cause any pending changes to be aborted and
> + *   the head to return to its previous position.
> + * - throw an assert in debug mode.
> + */
> +static __rte_always_inline uint32_t
> +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> +	uint32_t num)
> +{
> +	uint32_t h, n, t;
> +
> +	h = ht->head;
> +	t = ht->tail;
> +	n = h - t;
> +
> +	RTE_ASSERT(n >= num);
> +	num = (n >= num) ? num : 0;
> +
> +	*tail = t;
> +	return num;
> +}
> +
> +/**
> + * @internal set new values for head and tail.
> + * This function should be used only for single thread producer/consumer.
> + * Should be used only in conjunction with __rte_ring_st_get_tail.
> + */
> +static __rte_always_inline void
> +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> +	uint32_t num, uint32_t enqueue)
> +{
> +	uint32_t pos;
> +
> +	pos = tail + num;
> +
> +	if (enqueue)
> +		rte_smp_wmb();
> +	else
> +		rte_smp_rmb();
> +
> +	ht->head = pos;
> +	ht->tail = pos;
> +}
> +
>  static __rte_always_inline void
>  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
>  		uint32_t single, uint32_t enqueue)
> diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
> new file mode 100644
> index 000000000..2d06888b6
> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_peek.h
> @@ -0,0 +1,442 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + *
> + * Copyright (c) 2010-2020 Intel Corporation
> + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> + * All rights reserved.
> + * Derived from FreeBSD's bufring.h
> + * Used as BSD-3 Licensed with permission from Kip Macy.
> + */
> +
> +#ifndef _RTE_RING_PEEK_H_
> +#define _RTE_RING_PEEK_H_
> +
> +/**
> + * @file
> + * @b EXPERIMENTAL: this API may change without prior notice
> + * It is not recommended to include this file directly.
> + * Please include <rte_ring_elem.h> instead.
> + *
> + * Ring Peek API
> + * Introduction of rte_ring with serialized producer/consumer (HTS sync mode)
> + * makes it possible to split the public enqueue/dequeue API into two phases:
> + * - enqueue/dequeue start
> + * - enqueue/dequeue finish
> + * That allows user to inspect objects in the ring without removing them
> + * from it (aka MT safe peek).
> + * Note that right now this new API is available only for two sync modes:
> + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
> + * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
> + * It is a user responsibility to create/init ring with appropriate sync
> + * modes selected.
> + * As an example:
> + * // read 1 elem from the ring:
> + * n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
> + * if (n != 0) {
> + *    //examine object
> + *    if (object_examine(obj) == KEEP)
> + *       //decided to keep it in the ring.
> + *       rte_ring_dequeue_finish(ring, 0);
> + *    else
> + *       //decided to remove it from the ring.
> + *       rte_ring_dequeue_finish(ring, n);
> + * }
> + * Note that between _start_ and _finish_ no other thread can proceed
> + * with enqueue(/dequeue) operation till _finish_ completes.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @internal This function moves prod head value.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
> +		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
> +{
> +	uint32_t free, head, next;
> +
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_ST:
> +		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
> +			behavior, &head, &next, &free);
> +		break;
> +	case RTE_RING_SYNC_MT_HTS:
> +		n =  __rte_ring_hts_move_prod_head(r, n, behavior,
> +			&head, &free);
> +		break;
> +	default:
> +		/* unsupported mode, shouldn't be here */
> +		RTE_ASSERT(0);
> +		n = 0;
> +	}
> +
> +	if (free_space != NULL)
> +		*free_space = free - n;
> +	return n;
> +}
> +
> +/**
> + * Start to enqueue several objects on the ring.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves space for them.
> + * User has to call appropriate enqueue_elem_finish() to copy objects into the
> + * queue and complete given enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_enqueue_bulk_elem_start(struct rte_ring *r, unsigned int n,
> +		unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
> +			free_space);
> +}
> +
> +/**
> + * Start to enqueue several objects on the ring.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves space for them.
> + * User has to call appropriate enqueue_finish() to copy objects into the
> + * queue and complete given enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   The number of objects that can be enqueued, either 0 or n
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
> +		unsigned int *free_space)
> +{
> +	return rte_ring_enqueue_bulk_elem_start(r, n, free_space);
> +}
> +
> +/**
> + * Start to enqueue several objects on the ring.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves space for them.
> + * User has to call appropriate enqueue_elem_finish() to copy objects into the
> + * queue and complete given enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   Actual number of objects that can be enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_enqueue_burst_elem_start(struct rte_ring *r, unsigned int n,
> +		unsigned int *free_space)
> +{
> +	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
> +			free_space);
> +}
> +
> +/**
> + * Start to enqueue several objects on the ring.
> + * Note that no actual objects are put in the queue by this function,
> + * it just reserves space for them.
> + * User has to call appropriate enqueue_finish() to copy objects into the
> + * queue and complete given enqueue operation.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param free_space
> + *   if non-NULL, returns the amount of space in the ring after the
> + *   enqueue operation has finished.
> + * @return
> + *   Actual number of objects that can be enqueued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
> +		unsigned int *free_space)
> +{
> +	return rte_ring_enqueue_burst_elem_start(r, n, free_space);
> +}
> +
> +/**
> + * Complete enqueuing several objects on the ring.
> + * Note that number of objects to enqueue should not exceed previous
> + * enqueue_start return value.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to add to the ring from the obj_table.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_ring_enqueue_elem_finish(struct rte_ring *r, const void *obj_table,
> +		unsigned int esize, unsigned int n)
> +{
> +	uint32_t tail;
> +
> +	switch (r->prod.sync_type) {
> +	case RTE_RING_SYNC_ST:
> +		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
> +		if (n != 0)
> +			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
> +		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
> +		break;
> +	case RTE_RING_SYNC_MT_HTS:
> +		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
> +		if (n != 0)
> +			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
> +		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
> +		break;
> +	default:
> +		/* unsupported mode, shouldn't be here */
> +		RTE_ASSERT(0);
> +	}
> +}
> +
> +/**
> + * Complete enqueuing several objects on the ring.
> + * Note that number of objects to enqueue should not exceed previous
> + * enqueue_start return value.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects.
> + * @param n
> + *   The number of objects to add to the ring from the obj_table.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
> +		unsigned int n)
> +{
> +	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
> +}
> +
> +/**
> + * @internal This function moves cons head value and copies up to *n*
> + * objects from the ring to the user provided obj_table.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_dequeue_start(struct rte_ring *r, void *obj_table,
> +	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
> +	uint32_t *available)
> +{
> +	uint32_t avail, head, next;
> +
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_ST:
> +		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
> +			behavior, &head, &next, &avail);
> +		break;
> +	case RTE_RING_SYNC_MT_HTS:
> +		n =  __rte_ring_hts_move_cons_head(r, n, behavior,
> +			&head, &avail);
> +		break;
> +	default:
> +		/* unsupported mode, shouldn't be here */
> +		RTE_ASSERT(0);
> +		n = 0;
> +	}
> +
> +	if (n != 0)
> +		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
> +
> +	if (available != NULL)
> +		*available = avail - n;
> +	return n;
> +}
> +
> +/**
> + * Start to dequeue several objects from the ring.
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation and actually remove objects from the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The number of objects dequeued, either 0 or n.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_FIXED, available);
> +}
> +
> +/**
> + * Start to dequeue several objects from the ring.
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation and actually remove objects from the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   Actual number of objects dequeued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
> +		n, available);
> +}
> +
> +/**
> + * Start to dequeue several objects from the ring.
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation and actually remove objects from the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of objects that will be filled.
> + * @param esize
> + *   The size of ring element, in bytes. It must be a multiple of 4.
> + *   This must be the same value used while creating the ring. Otherwise
> + *   the results are undefined.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The actual number of objects dequeued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void *obj_table,
> +		unsigned int esize, unsigned int n, unsigned int *available)
> +{
> +	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
> +			RTE_RING_QUEUE_VARIABLE, available);
> +}
> +
> +/**
> + * Start to dequeue several objects from the ring.
> + * Note that user has to call appropriate dequeue_finish()
> + * to complete given dequeue operation and actually remove objects from the ring.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects) that will be filled.
> + * @param n
> + *   The number of objects to dequeue from the ring to the obj_table.
> + * @param available
> + *   If non-NULL, returns the number of remaining ring entries after the
> + *   dequeue has finished.
> + * @return
> + *   The actual number of objects dequeued.
> + */
> +__rte_experimental
> +static __rte_always_inline unsigned int
> +rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
> +		unsigned int n, unsigned int *available)
> +{
> +	return rte_ring_dequeue_burst_elem_start(r, obj_table,
> +		sizeof(uintptr_t), n, available);
> +}
> +
> +/**
> + * Complete dequeuing several objects from the ring.
> + * Note that number of objects to dequeue should not exceed previous
> + * dequeue_start return value.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to remove from the ring.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_ring_dequeue_elem_finish(struct rte_ring *r, unsigned int n)
> +{
> +	uint32_t tail;
> +
> +	switch (r->cons.sync_type) {
> +	case RTE_RING_SYNC_ST:
> +		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
> +		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
> +		break;
> +	case RTE_RING_SYNC_MT_HTS:
> +		n = __rte_ring_hts_get_tail(&r->hts_cons, &tail, n);
> +		__rte_ring_hts_set_head_tail(&r->hts_cons, tail, n, 0);
> +		break;
> +	default:
> +		/* unsupported mode, shouldn't be here */
> +		RTE_ASSERT(0);
> +	}
> +}
> +
> +/**
> + * Complete dequeuing several objects from the ring.
> + * Note that number of objects to dequeue should not exceed previous
> + * dequeue_start return value.
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param n
> + *   The number of objects to remove from the ring.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
> +{
> +	rte_ring_dequeue_elem_finish(r, n);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RING_PEEK_H_ */
> --
> 2.17.1
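
The file header above shows a dequeue-side peek example; the matching
enqueue-side start/finish pattern would look roughly like this (a sketch
only: the ring is assumed to be created in HTS or single-producer mode, as
the header requires, and produce_objects() is an illustrative helper, not
part of the API):

	void *objs[4];
	unsigned int n;

	/* reserve space for up to 4 objects; nothing is copied yet */
	n = rte_ring_enqueue_burst_start(ring, 4, NULL);
	if (n != 0) {
		/* fill objs[0..n-1] with the objects to publish */
		produce_objects(objs, n);
		/* copy objects into the ring and complete the operation;
		 * passing 0 instead of n would abort the reservation */
		rte_ring_enqueue_finish(ring, objs, n);
	}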


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 8/9] test/ring: add stress test for MT peek API
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-19  2:32             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:32 UTC (permalink / raw)
  To: Konstantin Ananyev, dev; +Cc: david.marchand, jielong.zjl, nd

<snip>

> Subject: [PATCH v5 8/9] test/ring: add stress test for MT peek API
> 
> Introduce new test case to test MT peek API.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 9/9] test/ring: add functional tests for new sync modes
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
@ 2020-04-19  2:32             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:32 UTC (permalink / raw)
  To: Konstantin Ananyev, dev; +Cc: david.marchand, jielong.zjl, nd

<snip>

> Subject: [PATCH v5 9/9] test/ring: add functional tests for new sync modes
> 
> Extend test_ring_autotest with new test-cases for RTS/HTS sync modes.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/9] New sync modes for ring
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (8 preceding siblings ...)
  2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
@ 2020-04-19  2:32           ` Honnappa Nagarahalli
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
  10 siblings, 0 replies; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19  2:32 UTC (permalink / raw)
  To: Konstantin Ananyev, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

Hi Konstantin,
	Changes look good overall, I have integrated RCU defer APIs patch as well. Please consider adding the following (in another patch?)

1) Release notes
2) Updates to programmer guide for RTS and HTS modes

Thank you,
Honnappa

> -----Original Message-----
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Sent: Saturday, April 18, 2020 11:32 AM
> To: dev@dpdk.org
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> david.marchand@redhat.com; jielong.zjl@antfin.com; Konstantin Ananyev
> <konstantin.ananyev@intel.com>
> Subject: [PATCH v5 0/9] New sync modes for ring
> 
> V4 - V5:
> 1. fix i686 clang build problem
> 2. fix formal API comments
> 
> V3 - V4 changes:
> Address comments from Honnappa:
> 1. for new sync modes make legacy API wrappers around _elem_ calls
> 2. remove rte_ring_(hts|rts)_generic.h
> 3. few changes in C11 version
> 4. peek API - add missing functions for _elem_
> 5. remove _IS_SP/_IS_MP, etc. internal macros
> 6. fix param types (obj_table) for _elem_ functions
> 7. fix formal API comments
> 8. deduplicate code for test_ring_stress
> 9. added functional tests for new sync modes
> 
> V2 - V3 changes:
> 1. few more compilation fixes (for gcc 4.8.X)
> 2. extra update devtools/libabigail.abignore (workaround)
> 
> V1 - V2 changes:
> 1. fix compilation issues
> 2. add C11 atomics support
> 3. updates devtools/libabigail.abignore (workaround)
> 
> RFC - V1 changes:
> 1. remove ABI breakage (at least I hope I did)
> 2. Add support for ring_elem
> 3. rework peek related API a bit
> 4. rework test to make it less verbose and unite all test-cases
>    in one command
> 5. add new test-case for MT peek API
> 
> TODO list:
> 1. Update docs
> 
> These days more and more customers use(/try to use) DPDK based apps
> within overcommitted systems (multiple active threads over same physical
> cores):
> VM, container deployments, etc.
> One quite common problem they hit:
> Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
> LHP is quite a common problem for spin-based sync primitives (spin-locks, etc.)
> on overcommitted systems.
> The situation gets much worse when some sort of fair-locking technique is
> used (ticket-lock, etc.).
> As now not only lock-owner but also lock-waiters scheduling order matters a
> lot (LWP).
> These two problems are well-known for kernel within VMs:
> http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> The problem with rte_ring is that while head acquisition is sort of un-fair locking,
> waiting on tail is very similar to ticket lock schema - tail has to be updated in
> particular order.
> That makes the current rte_ring implementation perform really poorly in some
> overcommitted scenarios.
> It is probably not possible to completely resolve LHP problem in userspace
> only (without some kernel communication/intervention).
> But removing fairness at tail update helps to avoid LWP and can mitigate the
> situation significantly.
> This patch proposes two new optional ring synchronization modes:
> 1) Head/Tail Sync (HTS) mode
> In that mode enqueue/dequeue operation is fully serialized:
>     only one thread at a time is allowed to perform given op.
>     As another enhancement provide ability to split enqueue/dequeue
>     operation into two phases:
>       - enqueue/dequeue start
>       - enqueue/dequeue finish
>     That allows user to inspect objects in the ring without removing
>     them from it (aka MT safe peek).
> 2) Relaxed Tail Sync (RTS)
> The main difference from original MP/MC algorithm is that tail value is
> increased not by every thread that finished enqueue/dequeue, but only by the
> last one.
> That allows threads to avoid spinning on ring tail value, leaving actual tail
> value change to the last thread in the update queue.
> 
> Note that these new sync modes are optional.
> For current rte_ring users nothing should change (both in terms of API/ABI
> and performance).
> Existing sync modes MP/MC,SP/SC kept untouched, set up in the same way
> (via flags and _init_), and MP/MC remains as default one.
> The only thing that changed:
> Format of prod/cons now could differ depending on mode selected at _init_.
> So user has to stick with one sync model through whole ring lifetime.
> In other words, user can't create a ring for, let's say, SP mode and then in the
> middle of data-path change his mind and start using MP_RTS mode.
> For existing modes (SP/MP, SC/MC) format remains the same and user can
> still use them interchangeably, though of course it is an error-prone practice.
> 
> Test results on IA (see below) show significant improvements for average
> enqueue/dequeue op times on overcommitted systems.
> For 'classic' DPDK deployments (one thread per core) original MP/MC
> algorithm still shows best numbers, though for 64-bit target RTS numbers are
> not that far away.
> Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
> echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'
> 
> X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> DEQ+ENQ average cycles/obj
>                                                 MP/MC      HTS     RTS
> 1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
> 2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
> 4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
> 8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
> 16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
> 32thread/@32core(--lcores='6-22,55-70')         982.90     1972.38 1160.51
> 
> 2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
> 4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
> 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
> 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
> 32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68
> 
> 8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
> 16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
> 32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12
> 
> i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> DEQ+ENQ average cycles/obj
>                                                 MP/MC      HTS     RTS
> 1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
> 2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
> 8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
> 32thread/@32core(--lcores='6-22,55-70')         813.77     6072.41 2169.91
> 
> 2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
> 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
> 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65
> 
> 8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
> 32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87
> 
> Konstantin Ananyev (9):
>   test/ring: add contention stress test
>   ring: prepare ring to allow new sync schemes
>   ring: introduce RTS ring mode
>   test/ring: add contention stress test for RTS ring
>   ring: introduce HTS ring mode
>   test/ring: add contention stress test for HTS ring
>   ring: introduce peek style API
>   test/ring: add stress test for MT peek API
>   test/ring: add functional tests for new sync modes
> 
>  app/test/Makefile                      |   5 +
>  app/test/meson.build                   |   5 +
>  app/test/test_pdump.c                  |   6 +-
>  app/test/test_ring.c                   |  93 ++++--
>  app/test/test_ring_hts_stress.c        |  32 ++
>  app/test/test_ring_mpmc_stress.c       |  31 ++
>  app/test/test_ring_peek_stress.c       |  43 +++
>  app/test/test_ring_rts_stress.c        |  32 ++
>  app/test/test_ring_stress.c            |  57 ++++
>  app/test/test_ring_stress.h            |  38 +++
>  app/test/test_ring_stress_impl.h       | 396 ++++++++++++++++++++++
>  devtools/libabigail.abignore           |   7 +
>  lib/librte_pdump/rte_pdump.c           |   2 +-
>  lib/librte_port/rte_port_ring.c        |  12 +-
>  lib/librte_ring/Makefile               |   8 +-
>  lib/librte_ring/meson.build            |  11 +-
>  lib/librte_ring/rte_ring.c             | 114 ++++++-
>  lib/librte_ring/rte_ring.h             | 243 ++++++++------
>  lib/librte_ring/rte_ring_c11_mem.h     |  44 +++
>  lib/librte_ring/rte_ring_core.h        | 184 ++++++++++
>  lib/librte_ring/rte_ring_elem.h        | 141 ++++++--
>  lib/librte_ring/rte_ring_generic.h     |  48 +++
>  lib/librte_ring/rte_ring_hts.h         | 332 +++++++++++++++++++
>  lib/librte_ring/rte_ring_hts_c11_mem.h | 207 ++++++++++++
>  lib/librte_ring/rte_ring_peek.h        | 442 +++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_rts.h         | 439 ++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
>  27 files changed, 2977 insertions(+), 174 deletions(-)  create mode 100644
> app/test/test_ring_hts_stress.c  create mode 100644
> app/test/test_ring_mpmc_stress.c  create mode 100644
> app/test/test_ring_peek_stress.c  create mode 100644
> app/test/test_ring_rts_stress.c  create mode 100644
> app/test/test_ring_stress.c  create mode 100644 app/test/test_ring_stress.h
> create mode 100644 app/test/test_ring_stress_impl.h  create mode 100644
> lib/librte_ring/rte_ring_core.h  create mode 100644
> lib/librte_ring/rte_ring_hts.h  create mode 100644
> lib/librte_ring/rte_ring_hts_c11_mem.h
>  create mode 100644 lib/librte_ring/rte_ring_peek.h  create mode 100644
> lib/librte_ring/rte_ring_rts.h  create mode 100644
> lib/librte_ring/rte_ring_rts_c11_mem.h
> 
> --
> 2.17.1
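
To illustrate the cover letter's point that the sync mode is fixed at
_init_ time, a short sketch (flag names assumed from this series; error
handling omitted):

	/* MP/MC ring, as before - nothing changes for existing users */
	struct rte_ring *mpmc = rte_ring_create("classic", 1024,
			SOCKET_ID_ANY, 0);

	/* fully serialized HTS ring - same create call, different flags */
	struct rte_ring *hts = rte_ring_create("hts", 1024, SOCKET_ID_ANY,
			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);

	/* the generic API is assumed to dispatch on the sync type stored
	 * in the ring, so existing call sites keep working unmodified */
	void *obj = NULL;	/* stand-in object pointer */
	rte_ring_enqueue(hts, obj);
	rte_ring_dequeue(hts, &obj);

	/* what is NOT allowed: calling a mismatched flavour directly,
	 * e.g. rte_ring_mp_enqueue_bulk() on the HTS ring above */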


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test
  2020-04-19  2:30             ` Honnappa Nagarahalli
@ 2020-04-19  8:03               ` David Marchand
  2020-04-19 11:47                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: David Marchand @ 2020-04-19  8:03 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Konstantin Ananyev, dev, jielong.zjl, nd,
	Jerin Jacob Kollanukkaran, Pavan Nikhilesh

On Sun, Apr 19, 2020 at 4:31 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > Introduce stress test for ring enqueue/dequeue operations.
> > Performs the following pattern on each slave worker:
> > dequeue/read-write data from the dequeued objects/enqueue.
> > Serves as both functional and performance test of ring enqueue/dequeue
> > operations under high contention (for both over committed and non-over
> > committed scenarios).
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ci/intel-compilation fails for meson due to clang+32b. I believe it is solved by [1] (as you indicated). Can you make this patch dependent on [1]?
> Otherwise,
> Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>
> [1] http://patches.dpdk.org/patch/68280/

I will take this patch as part of the first series that makes it into
master (between ring or traces series).


-- 
David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test
  2020-04-19  8:03               ` David Marchand
@ 2020-04-19 11:47                 ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-19 11:47 UTC (permalink / raw)
  To: David Marchand, Honnappa Nagarahalli
  Cc: dev, jielong.zjl, nd, Jerin Jacob Kollanukkaran, Pavan Nikhilesh

> 
> On Sun, Apr 19, 2020 at 4:31 AM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> > > Introduce stress test for ring enqueue/dequeue operations.
> > > Performs the following pattern on each slave worker:
> > > dequeue/read-write data from the dequeued objects/enqueue.
> > > Serves as both functional and performance test of ring enqueue/dequeue
> > > operations under high contention (for both over committed and non-over
> > > committed scenarios).
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ci/intel-compilation fails for meson due to clang+32b. I believe it is solved by [1] (as you indicated). Can you make this patch dependent on [1]?
> > Otherwise,
> > Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >
> > [1] http://patches.dpdk.org/patch/68280/
> 
> I will take this patch as part of the first series that makes it into
> master (between ring or traces series).

Thanks David.
BTW, do we have a special keyword for that (Depends-on: ... or so)?


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API
  2020-04-19  2:31             ` Honnappa Nagarahalli
@ 2020-04-19 18:32               ` Ananyev, Konstantin
  2020-04-19 19:12                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-19 18:32 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd


> > diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> > index 953cdbbd5..9f5fdf13b 100644
> > --- a/lib/librte_ring/rte_ring_generic.h
> > +++ b/lib/librte_ring/rte_ring_generic.h
> Changes in this file are not required as we agreed to implement only C11 for new features.

Right, will remove.

> 
> > @@ -10,6 +10,54 @@
> >  #ifndef _RTE_RING_GENERIC_H_
> >  #define _RTE_RING_GENERIC_H_
> >
> > +/**
> > + * @internal get current tail value.
> > + * This function should be used only for single thread producer/consumer.
> > + * Check that user didn't request to move tail above the head.
> > + * In that situation:
> > + * - return zero, that will cause abort any pending changes and
> > + *   return head to its previous position.
> > + * - throw an assert in debug mode.
> > + */
> > +static __rte_always_inline uint32_t
> > +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> > +	uint32_t num)
> > +{
> > +	uint32_t h, n, t;
> > +
> > +	h = ht->head;
> > +	t = ht->tail;
> > +	n = h - t;
> > +
> > +	RTE_ASSERT(n >= num);
> > +	num = (n >= num) ? num : 0;
> > +
> > +	*tail = h;
> > +	return num;
> > +}
> > +
> > +/**
> > + * @internal set new values for head and tail.
> > + * This function should be used only for single thread producer/consumer.
> > + * Should be used only in conjunction with __rte_ring_st_get_tail.
> > + */
> > +static __rte_always_inline void
> > +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> > +	uint32_t num, uint32_t enqueue)
> > +{
> > +	uint32_t pos;
> > +
> > +	pos = tail + num;
> > +
> > +	if (enqueue)
> > +		rte_smp_wmb();
> > +	else
> > +		rte_smp_rmb();
> > +
> > +	ht->head = pos;
> > +	ht->tail = pos;
> > +}
> > +
> >  static __rte_always_inline void
> >  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
> >  		uint32_t single, uint32_t enqueue)
> > diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
> > new file mode 100644
> > index 000000000..2d06888b6
> > --- /dev/null

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API
  2020-04-19 18:32               ` Ananyev, Konstantin
@ 2020-04-19 19:12                 ` Ananyev, Konstantin
  2020-04-19 21:14                   ` Honnappa Nagarahalli
  0 siblings, 1 reply; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-19 19:12 UTC (permalink / raw)
  To: Ananyev, Konstantin, Honnappa Nagarahalli, dev
  Cc: david.marchand, jielong.zjl, nd


> 
> > > diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> > > index 953cdbbd5..9f5fdf13b 100644
> > > --- a/lib/librte_ring/rte_ring_generic.h
> > > +++ b/lib/librte_ring/rte_ring_generic.h
> > Changes in this file are not required as we agreed to implement only C11 for new features.
> 
> Right, will remove.

Actually no, spoke too early before thinking properly.
We do need these functions in rte_ring_generic.h for SP/SC _start_/_finish_.
Konstantin

> 
> >
> > > @@ -10,6 +10,54 @@
> > >  #ifndef _RTE_RING_GENERIC_H_
> > >  #define _RTE_RING_GENERIC_H_
> > >
> > > +/**
> > > + * @internal get current tail value.
> > > + * This function should be used only for single thread producer/consumer.
> > > + * Check that user didn't request to move tail above the head.
> > > + * In that situation:
> > > + * - return zero, that will cause abort any pending changes and
> > > + *   return head to its previous position.
> > > + * - throw an assert in debug mode.
> > > + */
> > > +static __rte_always_inline uint32_t
> > > +__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
> > > +	uint32_t num)
> > > +{
> > > +	uint32_t h, n, t;
> > > +
> > > +	h = ht->head;
> > > +	t = ht->tail;
> > > +	n = h - t;
> > > +
> > > +	RTE_ASSERT(n >= num);
> > > +	num = (n >= num) ? num : 0;
> > > +
> > > +	*tail = h;
> > > +	return num;
> > > +}
> > > +
> > > +/**
> > > + * @internal set new values for head and tail.
> > > + * This function should be used only for single thread producer/consumer.
> > > + * Should be used only in conjunction with __rte_ring_st_get_tail.
> > > + */
> > > +static __rte_always_inline void
> > > +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> > > +	uint32_t num, uint32_t enqueue)
> > > +{
> > > +	uint32_t pos;
> > > +
> > > +	pos = tail + num;
> > > +
> > > +	if (enqueue)
> > > +		rte_smp_wmb();
> > > +	else
> > > +		rte_smp_rmb();
> > > +
> > > +	ht->head = pos;
> > > +	ht->tail = pos;
> > > +}
> > > +
> > >  static __rte_always_inline void
> > >  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
> > >  		uint32_t single, uint32_t enqueue)
> > > diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
> > > new file mode 100644
> > > index 000000000..2d06888b6
> > > --- /dev/null

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API
  2020-04-19 19:12                 ` Ananyev, Konstantin
@ 2020-04-19 21:14                   ` Honnappa Nagarahalli
  2020-04-19 22:41                     ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: Honnappa Nagarahalli @ 2020-04-19 21:14 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: david.marchand, jielong.zjl, nd, Honnappa Nagarahalli, nd

<snip>
> 
> 
> >
> > > > diff --git a/lib/librte_ring/rte_ring_generic.h
> > > > b/lib/librte_ring/rte_ring_generic.h
> > > > index 953cdbbd5..9f5fdf13b 100644
> > > > --- a/lib/librte_ring/rte_ring_generic.h
> > > > +++ b/lib/librte_ring/rte_ring_generic.h
> > > Changes in this file are not required as we agreed to implement only C11
> > > for new features.
> >
> > Right, will remove.
> 
> Actually no, spoke too early before thinking properly. We do need these
> functions in rte_ring_generic.h for SP/SC _start_/_finish_.
> Konstantin
The peek APIs are new functionality. So the peek APIs in legacy format should be wrappers around _elem_ APIs. That is what I see in the code as well:
rte_ring_peek.h has this:
static __rte_always_inline void
rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
{
        rte_ring_dequeue_elem_finish(r, n);
}

I think I gave you incomplete feedback earlier.
Actually, __rte_ring_st_get_tail and __rte_ring_st_set_head_tail should be in a new file named rte_ring_peek_c11_mem.h. This file should be included in rte_ring_peek.h (same way you have done for RTS and HTS). Then remove both these functions from rte_ring_generic.h and rte_ring_c11_mem.h.

> 
> >
> > >
> > > > @@ -10,6 +10,54 @@
> > > >  #ifndef _RTE_RING_GENERIC_H_
> > > >  #define _RTE_RING_GENERIC_H_
> > > >
> > > > +/**
> > > > + * @internal get current tail value.
> > > > + * This function should be used only for single thread
> > > > + * producer/consumer.
> > > > + * Check that the user didn't request to move the tail above the head.
> > > > + * In that situation:
> > > > + * - return zero, which will abort any pending changes and
> > > > + *   return the head to its previous position.
> > > > + * - trigger an assert in debug mode.
> > > > + */
> > > > +static __rte_always_inline uint32_t __rte_ring_st_get_tail(struct
> > > > +rte_ring_headtail *ht, uint32_t *tail,
> > > > +	uint32_t num)
> > > > +{
> > > > +	uint32_t h, n, t;
> > > > +
> > > > +	h = ht->head;
> > > > +	t = ht->tail;
> > > > +	n = h - t;
> > > > +
> > > > +	RTE_ASSERT(n >= num);
> > > > +	num = (n >= num) ? num : 0;
> > > > +
> > > > +	*tail = h;
> > > > +	return num;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @internal set new values for head and tail.
> > > > + * This function should be used only for single thread
> > > > + * producer/consumer.
> > > > + * Should be used only in conjunction with __rte_ring_st_get_tail.
> > > > + */
> > > > +static __rte_always_inline void
> > > > +__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
> > > > +	uint32_t num, uint32_t enqueue)
> > > > +{
> > > > +	uint32_t pos;
> > > > +
> > > > +	pos = tail + num;
> > > > +
> > > > +	if (enqueue)
> > > > +		rte_smp_wmb();
> > > > +	else
> > > > +		rte_smp_rmb();
> > > > +
> > > > +	ht->head = pos;
> > > > +	ht->tail = pos;
> > > > +}
> > > > +
> > > >  static __rte_always_inline void
> > > >  update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t
> new_val,
> > > >  		uint32_t single, uint32_t enqueue) diff --git
> > > > a/lib/librte_ring/rte_ring_peek.h
> > > > b/lib/librte_ring/rte_ring_peek.h new file mode 100644 index
> > > > 000000000..2d06888b6
> > > > --- /dev/null

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API
  2020-04-19 21:14                   ` Honnappa Nagarahalli
@ 2020-04-19 22:41                     ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-19 22:41 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev; +Cc: david.marchand, jielong.zjl, nd, nd

> <snip>
> >
> >
> > >
> > > > > diff --git a/lib/librte_ring/rte_ring_generic.h
> > > > > b/lib/librte_ring/rte_ring_generic.h
> > > > > index 953cdbbd5..9f5fdf13b 100644
> > > > > --- a/lib/librte_ring/rte_ring_generic.h
> > > > > +++ b/lib/librte_ring/rte_ring_generic.h
> > > > Changes in this file are not required as we agreed to implement only C11
> > > > for new features.
> > >
> > > Right, will remove.
> >
> > Actually no, spoke too early before thinking properly. We do need these
> > functions in rte_ring_generic.h for SP/SC _start_/_finish_.
> > Konstantin
> The peek APIs are new functionality. So the peek APIs in legacy format should be wrappers around _elem_ APIs. That is what I see in the
> code as well:
> rte_ring_peek.h has this:
> static __rte_always_inline void
> rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
> {
>         rte_ring_dequeue_elem_finish(r, n);
> }
> 
> I think I gave you incomplete feedback earlier.
> Actually, __rte_ring_st_get_tail and __rte_ring_st_set_head_tail should be in a new file named rte_ring_peek_c11_mem.h. This file should
> be included in rte_ring_peek.h (same way you have done for RTS and HTS). Then remove both these functions from rte_ring_generic.h and
> rte_ring_c11_mem.h.

Good idea, yes it should work.
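
For reference, a minimal sketch of the layout agreed on here (the file
name, helper names and include scheme come from this discussion; the
guard macro and comment text are my assumptions, not the final patch):

/* rte_ring_peek_c11_mem.h */
#ifndef _RTE_RING_PEEK_C11_MEM_H_
#define _RTE_RING_PEEK_C11_MEM_H_

/*
 * __rte_ring_st_get_tail() and __rte_ring_st_set_head_tail() move here,
 * and are removed from rte_ring_generic.h and rte_ring_c11_mem.h.
 */

#endif /* _RTE_RING_PEEK_C11_MEM_H_ */

/* rte_ring_peek.h then pulls it in, the same way RTS/HTS do: */
#include <rte_ring_peek_c11_mem.h>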


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 00/10] New sync modes for ring
  2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
                             ` (9 preceding siblings ...)
  2020-04-19  2:32           ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Honnappa Nagarahalli
@ 2020-04-20 12:11           ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 01/10] test/ring: add contention stress test Konstantin Ananyev
                               ` (10 more replies)
  10 siblings, 11 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

This patch series depends on the following patch:
"meson: add libatomic as a global dependency for i686 clang"
(http://patches.dpdk.org/patch/68876/)

V5 - V6:
1. add dependency on the external patch (-latomic for i686 clang)
2. remove unneeded code from rte_ring_generic (Honnappa)
3. extra comments for ring init/create API (Honnappa)
4. __rte prefix for internal structs  (Honnappa)
5. update docs (rel notes and prog guide)

V4 - V5:
1. fix i686 clang build problem
2. fix formal API comments

V3 - V4 changes:
Address comments from Honnappa:
1. for new sync modes make legacy API wrappers around _elem_ calls
2. remove rte_ring_(hts|rts)_generic.h
3. few changes in C11 version
4. peek API - add missing functions for _elem_
5. remove _IS_SP/_IS_MP, etc. internal macros
6. fix param types (obj_table) for _elem_functions
7. fix formal API comments
8. deduplicate code for test_ring_stress
9. added functional tests for new sync modes

V2 - V3 changes:
1. few more compilation fixes (for gcc 4.8.X)
2. extra update devtools/libabigail.abignore (workaround) 

V1 - V2 changes:
1. fix compilation issues
2. add C11 atomics support
3. updates devtools/libabigail.abignore (workaround)

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. rework peek related API a bit
4. rework test to make it less verbose and unite all test-cases
   in one command
5. add new test-case for MT peek API

These days more and more customers use(/try to use) DPDK based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP is quite a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.).
As now not only lock-owner but also lock-waiters scheduling
order matters a lot (LWP).
These two problems are well-known for kernel within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
unfair locking, waiting on the tail is very similar to a ticket-lock schema -
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
It is probably not possible to completely resolve the LHP problem in
userspace only (without some kernel communication/intervention).
But removing fairness at tail update helps to avoid LWP and
can mitigate the situation significantly.
This patch proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform given op.
    As another enhancement provide ability to split enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows user to inspect objects in the ring without removing
    them from it (aka MT safe peek).
2) Relaxed Tail Sync (RTS)
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
The existing sync modes MP/MC and SP/SC are kept untouched, set up in the
same way (via flags and _init_), and MP/MC remains the default one.
The only thing that changed:
the format of prod/cons now could differ depending on the mode selected at _init_.
So the user has to stick with one sync model through the whole ring lifetime.
In other words, the user can't create a ring for, let's say, SP mode and then
in the middle of the data path change his mind and start using MP_RTS mode.
For the existing modes (SP/MP, SC/MC) the format remains the same and
the user can still use them interchangeably, though of course that is an
error-prone practice.
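
As an illustration, a minimal usage sketch (the RTS flag names appear in
patch 03/10 below; rte_ring_dequeue_bulk_start() is an assumed name for
the peek-start counterpart of the _finish_ wrapper discussed later in
this thread, and hts_ring is assumed to be created with the HTS flags
from patch 05/10):

#include <rte_ring.h>

static int
sync_mode_demo(struct rte_ring *hts_ring, void **objs, unsigned int num)
{
	unsigned int n;

	/* sync mode is fixed at create/init time: MP-RTS producer,
	 * MC-RTS consumer; all later enq/deq calls must match it */
	struct rte_ring *r = rte_ring_create("rts_ring", 1024,
		rte_socket_id(), RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
	if (r == NULL)
		return -1;

	/* the generic call dispatches via prod.sync_type */
	rte_ring_enqueue_bulk(r, objs, num, NULL);

	/* HTS two-phase dequeue, aka MT safe peek: inspect objects
	 * in place, then commit (or finish with 0 to leave them) */
	n = rte_ring_dequeue_bulk_start(hts_ring, objs, num, NULL);
	/* ... examine objs[0..n-1] ... */
	rte_ring_dequeue_finish(hts_ring, n);

	return 0;
}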

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread@32core(--lcores='6-22,55-70')          982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7')           20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8') 153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 643176.59  1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8') 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8)')       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10)')     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10)')     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread@32core(--lcores='6-22,55-70')          813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7')           16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8)')       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10)')     5319896.86 2836.44 3028.87

Konstantin Ananyev (10):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API
  test/ring: add functional tests for new sync modes
  doc: update ring guide

 app/test/Makefile                       |   5 +
 app/test/meson.build                    |   5 +
 app/test/test_pdump.c                   |   6 +-
 app/test/test_ring.c                    |  93 +++--
 app/test/test_ring_hts_stress.c         |  32 ++
 app/test/test_ring_mpmc_stress.c        |  31 ++
 app/test/test_ring_peek_stress.c        |  43 +++
 app/test/test_ring_rts_stress.c         |  32 ++
 app/test/test_ring_stress.c             |  57 +++
 app/test/test_ring_stress.h             |  38 ++
 app/test/test_ring_stress_impl.h        | 396 +++++++++++++++++++++
 devtools/libabigail.abignore            |   7 +
 doc/guides/prog_guide/ring_lib.rst      |  95 +++++
 doc/guides/rel_notes/release_20_05.rst  |  16 +
 lib/librte_pdump/rte_pdump.c            |   2 +-
 lib/librte_port/rte_port_ring.c         |  12 +-
 lib/librte_ring/Makefile                |   9 +-
 lib/librte_ring/meson.build             |  12 +-
 lib/librte_ring/rte_ring.c              | 114 +++++-
 lib/librte_ring/rte_ring.h              | 306 ++++++++++------
 lib/librte_ring/rte_ring_core.h         | 184 ++++++++++
 lib/librte_ring/rte_ring_elem.h         | 171 +++++++--
 lib/librte_ring/rte_ring_hts.h          | 332 ++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h  | 164 +++++++++
 lib/librte_ring/rte_ring_peek.h         | 444 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_peek_c11_mem.h | 110 ++++++
 lib/librte_ring/rte_ring_rts.h          | 439 +++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h  | 179 ++++++++++
 28 files changed, 3142 insertions(+), 192 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_core.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_peek_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 01/10] test/ring: add contention stress test
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 02/10] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                               ` (9 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a stress test for ring enqueue/dequeue operations.
It performs the following pattern on each slave worker:
dequeue, read/write data in the dequeued objects, enqueue.
Serves as both a functional and a performance test of ring
enqueue/dequeue operations under high contention
(for both overcommitted and non-overcommitted scenarios).
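
In outline, each worker below runs the following loop (a simplified
sketch of the code in test_ring_stress_impl.h; timing, stats and error
reporting are stripped, all names come from the diff):

do {
	/* burst size in [7/8 .. 11/8) of BULK_NUM */
	num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);

	/* pull objects; verify each still carries the neutral fill
	 * pattern, then stamp it with this lcore's id */
	n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
	rc = check_updt_elem(obj, num, &def_elm, &loc_elm);

	/* restore the neutral pattern and push the objects back */
	rc |= check_updt_elem(obj, num, &loc_elm, &def_elm);
	n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
} while (rc == 0 && wrk_cmd == WRK_CMD_RUN);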

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  35 +++
 app/test/test_ring_stress_impl.h | 396 +++++++++++++++++++++++++++++++
 6 files changed, 514 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index be53d33c3..a23a011df 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 04b59cffa..8824f366c 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..60eac6216
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+#include <rte_spinlock.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..222d62bc4
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/**
+ * Stress test for ring enqueue/dequeue operations.
+ * Performs the following pattern on each slave worker:
+ * dequeue, read/write data in the dequeued objects, enqueue.
+ * Serves as both a functional and a performance test of ring
+ * enqueue/dequeue operations under high contention
+ * (for both overcommitted and non-overcommitted scenarios).
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lcore=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker(void *arg, const char *fname, int32_t prcs)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8] of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() - tm0 : 0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() - tm1 : 0;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, prcs);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	cl = rte_rdtsc_precise() - cl;
+	if (prcs == 0)
+		lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+	return rc;
+}
+static int
+test_worker_prcs(void *arg)
+{
+	return test_worker(arg, __func__, 1);
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	return test_worker(arg, __func__, 0);
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems data failed",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, __alignof__(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal worker to start test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 02/10] ring: prepare ring to allow new sync schemes
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 01/10] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 03/10] ring: introduce RTS ring mode Konstantin Ananyev
                               ` (8 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

To prepare the ring for new sync schemes, two main things are done:
- Change from *single* to *sync_type* to allow different
  synchronisation schemes to be applied.
  Mark *single* as deprecated in comments.
  Add new functions to allow the user to query ring sync types.
  Replace direct access to *single* with the appropriate function call.
- Move the actual rte_ring and related structure definitions into a
  separate file: <rte_ring_core.h>. That allows referring to the
  contents of <rte_ring_elem.h> from <rte_ring.h> without introducing a
  circular dependency.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
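
For callers, the migration boils down to replacing direct pokes at the
internal *single* field with the new query helpers; a hedged sketch
based on the rte_pdump hunk below (the wrapper function is mine):

#include <errno.h>
#include <rte_ring.h>

static int
validate_mp_mc(const struct rte_ring *ring)
{
	/* before: direct field access, e.g.
	 *     if (ring->prod.single || ring->cons.single) ...
	 * after: layout-independent helpers */
	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring))
		return -EINVAL;	/* pdump requires MP and MC settings */
	return 0;
}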
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 +--
 lib/librte_ring/Makefile        |   1 +
 lib/librte_ring/meson.build     |   1 +
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 170 ++++++++++++++------------------
 lib/librte_ring/rte_ring_core.h | 132 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_elem.h |  42 +++-----
 9 files changed, 234 insertions(+), 138 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_core.h

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..f96709f95 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..52b2d8e55 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_is_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 28368e6d1..6572768c9 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 05402e4f0..c656781da 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,6 +3,7 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..35ee4491c 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -36,91 +36,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#define RTE_TAILQ_RING_NAME "RTE_RING"
-
-enum rte_ring_queue_behavior {
-	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
-	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
-};
-
-#define RTE_RING_MZ_PREFIX "RG_"
-/** The maximum length of a ring name. */
-#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
-			   sizeof(RTE_RING_MZ_PREFIX) + 1)
-
-/* structure to hold a pair of head/tail values and other metadata */
-struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
-};
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct rte_ring {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned; /**< Name of the ring. */
-	int flags;               /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t capacity;       /**< Usable size of ring */
-
-	char pad0 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
-	char pad1 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
-	char pad2 __rte_cache_aligned; /**< empty cache line */
-};
-
-#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-/**
- * Ring is to hold exactly requested number of entries.
- * Without this flag set, the ring size requested must be a power of 2, and the
- * usable space will be that size - 1. With the flag, the requested size will
- * be rounded up to the next power of two, but the usable space will be exactly
- * that requested. Worst case, if a power-of-2 size is requested, half the
- * ring space will be wasted.
- */
-#define RING_F_EXACT_SZ 0x0004
-#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
-
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#include <rte_ring_core.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +336,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,9 +359,13 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_elem.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -470,7 +390,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +474,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +498,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +525,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +697,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   true if ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_is_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   true if ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_is_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +796,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +819,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +846,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +874,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +899,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +927,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
new file mode 100644
index 000000000..d9cef763f
--- /dev/null
+++ b/lib/librte_ring/rte_ring_core.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_CORE_H_
+#define _RTE_RING_CORE_H_
+
+/**
+ * @file
+ * This file contains the definition of the RTE ring structure itself,
+ * init flags and some related macros.
+ * For the majority of DPDK entities, it is not recommended to include
+ * this file directly; use <rte_ring.h> or <rte_ring_elem.h>
+ * instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+#define RTE_TAILQ_RING_NAME "RTE_RING"
+
+/** enqueue/dequeue behavior types */
+enum rte_ring_queue_behavior {
+	/** Enq/Deq a fixed number of items from a ring */
+	RTE_RING_QUEUE_FIXED = 0,
+	/** Enq/Deq as many items as possible from ring */
+	RTE_RING_QUEUE_VARIABLE
+};
+
+#define RTE_RING_MZ_PREFIX "RG_"
+/** The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
+
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * Structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type, the format of that structure might differ,
+ * but the offsets of the *sync_type* and *tail* values should remain the same.
+ */
+struct rte_ring_headtail {
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated -  True if single prod/cons */
+		uint32_t single;
+	};
+};
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these index is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between 2 index
+ * values in a modulo-32bit base: that's why the overflow of the indexes is not
+ * a problem.
+ */
+struct rte_ring {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned;
+	/**< Name of the ring. */
+	int flags;               /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t capacity;       /**< Usable size of ring */
+
+	char pad0 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring producer status. */
+	struct rte_ring_headtail prod __rte_cache_aligned;
+	char pad1 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring consumer status. */
+	struct rte_ring_headtail cons __rte_cache_aligned;
+	char pad2 __rte_cache_aligned; /**< empty cache line */
+};
+
+#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
+#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
+/**
+ * Ring is to hold exactly requested number of entries.
+ * Without this flag set, the ring size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag, the requested size will
+ * be rounded up to the next power of two, but the usable space will be exactly
+ * that requested. Worst case, if a power-of-2 size is requested, half the
+ * ring space will be wasted.
+ */
+#define RING_F_EXACT_SZ 0x0004
+#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_CORE_H_ */
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..7406c0b0f 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -20,21 +20,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <string.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#include "rte_ring.h"
+#include <rte_ring_core.h>
 
 /**
  * @warning
@@ -510,7 +496,7 @@ rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -539,7 +525,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -570,7 +556,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -675,7 +661,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -703,7 +689,7 @@ rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -734,7 +720,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -842,7 +828,7 @@ rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -871,7 +857,7 @@ rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -902,7 +888,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -934,7 +920,7 @@ rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -963,7 +949,7 @@ rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -995,9 +981,11 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 03/10] ring: introduce RTS ring mode
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 01/10] test/ring: add contention stress test Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 02/10] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 04/10] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                               ` (7 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
The aim is to reduce stall times in cases when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.
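
A minimal sketch of the idea (type and function names here are
illustrative assumptions; the real implementation is in
rte_ring_rts_c11_mem.h below and additionally enforces a maximum
head/tail distance):

#include <stdint.h>

/* each head/tail is a 64-bit {position, counter} pair updated via CAS */
union poscnt {
	uint64_t raw;
	struct {
		uint32_t cnt;	/* count of finished enq/deq ops */
		uint32_t pos;	/* head/tail position */
	} val;
};

static void
rts_update_tail(volatile uint64_t *tail_raw, const volatile uint64_t *head_raw)
{
	union poscnt h, ot, nt;

	ot.raw = __atomic_load_n(tail_raw, __ATOMIC_ACQUIRE);
	do {
		h.raw = __atomic_load_n(head_raw, __ATOMIC_RELAXED);
		nt.raw = ot.raw;
		/* every finishing thread bumps the counter, but only
		 * the one that brings it level with the head counter
		 * publishes the new tail position; nobody spins
		 * waiting for slower threads ahead of it */
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;
	} while (__atomic_compare_exchange_n(tail_raw, &ot.raw, nt.raw,
			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
}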

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---

check-abi.sh reports what I believe is a false-positive about
ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
updated to suppress *struct ring* related errors.

This patch depends on the following patch:
"meson: add libatomic as a global dependency for i686 clang"
(http://patches.dpdk.org/patch/68876/)

 devtools/libabigail.abignore           |   7 +
 doc/guides/rel_notes/release_20_05.rst |   7 +
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   7 +-
 lib/librte_ring/rte_ring.c             | 100 +++++-
 lib/librte_ring/rte_ring.h             | 118 +++++--
 lib/librte_ring/rte_ring_core.h        |  36 +-
 lib/librte_ring/rte_ring_elem.h        | 114 ++++++-
 lib/librte_ring/rte_ring_rts.h         | 439 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
 10 files changed, 963 insertions(+), 48 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..cd86d89ca 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,10 @@
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore updates of ring prod/cons
+[suppress_type]
+        type_kind = struct
+        name = rte_ring
+[suppress_type]
+        type_kind = struct
+        name = rte_event_ring
diff --git a/doc/guides/rel_notes/release_20_05.rst b/doc/guides/rel_notes/release_20_05.rst
index 184967844..eedf960d0 100644
--- a/doc/guides/rel_notes/release_20_05.rst
+++ b/doc/guides/rel_notes/release_20_05.rst
@@ -81,6 +81,13 @@ New Features
   by making use of the event device capabilities. The event mode currently supports
   only inline IPsec protocol offload.
 
+* **New synchronization modes for rte_ring.**
+
+  Introduced new optional MT synchronization mode for rte_ring:
+  Relaxed Tail Sync (RTS). With this mode selected, rte_ring shows
+  significant improvements for average enqueue/dequeue times on
+  overcommitted systems.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6572768c9..04e446e37 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_c11_mem.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index c656781da..a95598032 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,4 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_c11_mem.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero at success or negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 35ee4491c..c42e1cfc4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  *
- * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2010-2020 Intel Corporation
  * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
  * All rights reserved.
  * Derived from FreeBSD's bufring.h
@@ -79,12 +79,24 @@ ssize_t rte_ring_get_memsize(unsigned count);
  *   The number of elements in the ring (must be a power of 2).
  * @param flags
  *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *   - One of mutually exclusive flags that define producer behavior:
+ *      - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "single-producer".
+ *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer RTS mode".
+ *     If none of these flags is set, then default "multi-producer"
+ *     behavior is selected.
+ *   - One of mutually exclusive flags that define consumer behavior:
+ *      - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "single-consumer". Otherwise, it is "multi-consumers".
+ *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer RTS mode".
+ *     If none of these flags is set, then default "multi-consumer"
+ *     behavior is selected.
  * @return
  *   0 on success, or a negative value on error.
  */
@@ -114,12 +126,24 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *   constraint for the reserved zone.
  * @param flags
  *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *   - One of mutually exclusive flags that define producer behavior:
+ *      - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "single-producer".
+ *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer RTS mode".
+ *     If none of these flags is set, then default "multi-producer"
+ *     behavior is selected.
+ *   - One of mutually exclusive flags that define consumer behavior:
+ *      - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "single-consumer". Otherwise, it is "multi-consumers".
+ *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer RTS mode".
+ *     If none of these flags is set, then default "multi-consumer"
+ *     behavior is selected.
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
  *    rte_errno set appropriately. Possible errno values include:
@@ -389,8 +413,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -524,8 +561,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -845,8 +894,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -925,9 +987,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index d9cef763f..bd21fa535 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -57,6 +57,9 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
@@ -76,6 +79,22 @@ struct rte_ring_headtail {
 	};
 };
 
+union __rte_ring_rts_poscnt {
+	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
+	uint64_t raw __rte_aligned(8);
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union __rte_ring_rts_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union __rte_ring_rts_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -104,11 +123,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -125,6 +154,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 7406c0b0f..4030753b6 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -74,12 +74,24 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *   constraint for the reserved zone.
  * @param flags
  *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *   - One of mutually exclusive flags that define producer behavior:
+ *      - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "single-producer".
+ *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer RTS mode".
+ *     If none of these flags is set, then default "multi-producer"
+ *     behavior is selected.
+ *   - One of mutually exclusive flags that define consumer behavior:
+ *      - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "single-consumer". Otherwise, it is "multi-consumers".
+ *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer RTS mode".
+ *     If none of these flags is set, then default "multi-consumer"
+ *     behavior is selected.
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
  *    rte_errno set appropriately. Possible errno values include:
@@ -528,6 +540,10 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -557,6 +573,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -661,7 +697,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -719,8 +755,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -887,8 +940,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -979,9 +1049,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #include <rte_ring.h>
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..8ced07096
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,439 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that the tail value is increased not
+ * by every thread that finishes an enqueue/dequeue,
+ * but only by the last one currently doing an enqueue/dequeue.
+ * That allows threads to skip spinning on the tail value,
+ * leaving the actual tail update to the last thread in the update queue.
+ * RTS requires two 64-bit CAS operations per enqueue/dequeue:
+ * one for the head update, a second for the tail update.
+ * As a gain, threads avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update plus waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce update counter (cnt) for both head and tail.
+ *  - increment head.cnt for each head.value update
+ *  - write head.value and head.cnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.cnt + 1 == head.cnt
+ *    (indicating that this is the last thread updating the tail)
+ *  - increment tail.cnt when each enqueue/dequeue op finishes
+ *    (no matter whether tail.value is going to change or not)
+ *  - write tail.value and tail.cnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit max allowed distance between head and tail value (HTD_MAX).
+ *    i.e. a thread is allowed to proceed with changing head.value
+ *    only when:  head.value - tail.value <= HTD_MAX
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have a fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity there is no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if producer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to set.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if consumer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to set.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
new file mode 100644
index 000000000..327f22796
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_C11_MEM_H_
+#define _RTE_RING_RTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_rts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union __rte_ring_rts_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might have preceded us, then don't update the tail with a new value.
+	 */
+
+	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance no longer
+ * exceeds the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union __rte_ring_rts_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union __rte_ring_rts_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for prod head/tail distance,
+		 * make sure that we read prod head *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/*
+		 *  The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems to the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union __rte_ring_rts_poscnt nh, oh;
+
+	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for cons head/tail distance,
+		 * make sure that we read cons head *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+	/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 04/10] test/ring: add contention stress test for RTS ring
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (2 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 03/10] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 05/10] ring: introduce HTS ring mode Konstantin Ananyev
                               ` (6 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to exercise the RTS ring mode under contention.
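
For reference, the new case is picked up by the existing ring stress harness;
assuming the ring_stress_autotest command registered by this series and a
meson-built test binary (binary path and lcore list below are illustrative),
it can be run as:

  echo ring_stress_autotest | ./app/test/dpdk-test --lcores=6-14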

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index a23a011df..00b74b5c9 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 8824f366c..97ad822c1 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 60eac6216..32aae2072 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 05/10] ring: introduce HTS ring mode
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (3 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 04/10] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 06/10] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                               ` (5 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce head/tail sync (HTS) mode for MT ring synchronization.
In that mode each enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform a given op.
This is expected to reduce stall times when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
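
A minimal usage sketch (ring name and size are illustrative; error handling
is elided and, as the HTS API is experimental, ALLOW_EXPERIMENTAL_API has to
be defined) - HTS is selected purely via the creation flags:

/* fully serialized producer and consumer */
struct rte_ring *r = rte_ring_create("hts_ring", 1024, rte_socket_id(),
		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);

/* generic calls now dispatch to rte_ring_mp_hts_enqueue_bulk() and
 * rte_ring_mc_hts_dequeue_bulk() based on the stored sync_type */
void *burst[8] = { NULL };
unsigned int n = rte_ring_enqueue_bulk(r, burst, RTE_DIM(burst), NULL);
n = rte_ring_dequeue_bulk(r, burst, n, NULL);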

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/rel_notes/release_20_05.rst |   8 +-
 lib/librte_ring/Makefile               |   2 +
 lib/librte_ring/meson.build            |   2 +
 lib/librte_ring/rte_ring.c             |  20 +-
 lib/librte_ring/rte_ring.h             |  23 ++
 lib/librte_ring/rte_ring_core.h        |  20 ++
 lib/librte_ring/rte_ring_elem.h        |  19 ++
 lib/librte_ring/rte_ring_hts.h         | 332 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 164 ++++++++++++
 9 files changed, 584 insertions(+), 6 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h

diff --git a/doc/guides/rel_notes/release_20_05.rst b/doc/guides/rel_notes/release_20_05.rst
index eedf960d0..db8a281db 100644
--- a/doc/guides/rel_notes/release_20_05.rst
+++ b/doc/guides/rel_notes/release_20_05.rst
@@ -83,10 +83,10 @@ New Features
 
 * **New synchronization modes for rte_ring.**
 
-  Introduced new optional MT synchronization mode for rte_ring:
-  Relaxed Tail Sync (RTS). With this mode selected, rte_ring shows
-  significant improvements for average enqueue/dequeue times on
-  overcommitted systems.
+  Introduced new optional MT synchronization modes for rte_ring:
+  Relaxed Tail Sync (RTS) mode and Head/Tail Sync (HTS) mode.
+  With these modes selected, rte_ring shows significant improvements for
+  average enqueue/dequeue times on overcommitted systems.
 
 
 Removed Items
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 04e446e37..f75d8e530 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -20,6 +20,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_c11_mem.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index a95598032..ca37cb8cc 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -7,6 +7,8 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_c11_mem.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index c42e1cfc4..7cf046528 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -86,6 +86,9 @@ ssize_t rte_ring_get_memsize(unsigned count);
  *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
  *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
  *        is "multi-producer RTS mode".
+ *      - RING_F_MP_HTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer HTS mode".
  *     If none of these flags is set, then default "multi-producer"
  *     behavior is selected.
  *   - One of mutually exclusive flags that define consumer behavior:
@@ -95,6 +98,9 @@ ssize_t rte_ring_get_memsize(unsigned count);
  *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
  *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *        is "multi-consumer RTS mode".
+ *      - RING_F_MC_HTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer HTS mode".
  *     If none of these flags is set, then default "multi-consumer"
  *     behavior is selected.
  * @return
@@ -133,6 +139,9 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
  *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
  *        is "multi-producer RTS mode".
+ *      - RING_F_MP_HTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer HTS mode".
  *     If none of these flags is set, then default "multi-producer"
  *     behavior is selected.
  *   - One of mutually exclusive flags that define consumer behavior:
@@ -142,6 +151,9 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
  *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *        is "multi-consumer RTS mode".
+ *      - RING_F_MC_HTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer HTS mode".
  *     If none of these flags is set, then default "multi-consumer"
  *     behavior is selected.
  * @return
@@ -422,6 +434,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -569,6 +584,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -903,6 +920,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -996,6 +1016,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index bd21fa535..16718ca7f 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -59,6 +59,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -95,6 +96,20 @@ struct rte_ring_rts_headtail {
 	volatile union __rte_ring_rts_poscnt head;
 };
 
+union __rte_ring_hts_pos {
+	/** raw 8B value to read/write *head* and *tail* as one atomic op */
+	uint64_t raw __rte_aligned(8);
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union __rte_ring_hts_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -126,6 +141,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -135,6 +151,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -157,6 +174,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 4030753b6..492eef936 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -81,6 +81,9 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
  *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
  *        is "multi-producer RTS mode".
+ *      - RING_F_MP_HTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer HTS mode".
  *     If none of these flags is set, then default "multi-producer"
  *     behavior is selected.
  *   - One of mutually exclusive flags that define consumer behavior:
@@ -90,6 +93,9 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
  *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *        is "multi-consumer RTS mode".
+ *      - RING_F_MC_HTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer HTS mode".
  *     If none of these flags is set, then default "multi-consumer"
  *     behavior is selected.
  * @return
@@ -541,6 +547,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -585,6 +592,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -766,6 +776,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -951,6 +964,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1060,6 +1076,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..c7701defc
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,332 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for the serialized, aka Head/Tail Sync (HTS), ring mode.
+ * In that mode each enqueue/dequeue operation is fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to proceed with changing head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve that, a 64-bit CAS is used by the head update routine.
+ */
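+
+/*
+ * A minimal usage sketch, assuming the ring was created with both
+ * RING_F_MP_HTS_ENQ and RING_F_MC_HTS_DEQ set ("objs" and "num" are
+ * placeholder names; error handling elided):
+ *
+ *	struct rte_ring *r = rte_ring_create("hts_ring", 1024,
+ *		SOCKET_ID_ANY, RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+ *	unsigned int n = rte_ring_mp_hts_enqueue_burst(r, objs, num, NULL);
+ *	n = rte_ring_mc_hts_dequeue_burst(r, objs, n, NULL);
+ */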
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, head, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, head, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h b/lib/librte_ring/rte_ring_hts_c11_mem.h
new file mode 100644
index 000000000..16e54b6ff
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_C11_MEM_H_
+#define _RTE_RING_HTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_hts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+/**
+ * @internal update tail with new value.
+ */
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t tail;
+
+	RTE_SET_USED(enqueue);
+
+	tail = old_tail + num;
+	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal waits until the tail becomes equal to the head,
+ * which means that no writer/reader is active for that ring.
+ * Supposed to work as a serialization point.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union __rte_ring_hts_pos *p)
+{
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union __rte_ring_hts_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read prod head/tail *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union __rte_ring_hts_pos np, op;
+
+	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read cons head/tail *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 06/10] test/ring: add contention stress test for HTS ring
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (4 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 05/10] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 07/10] ring: introduce peek style API Konstantin Ananyev
                               ` (4 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to exercise HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 00b74b5c9..28f0b9ac2 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 97ad822c1..20c4978c2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 32aae2072..9a87c7f7b 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 07/10] ring: introduce peek style API
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (5 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 06/10] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 08/10] test/ring: add stress test for MT peek API Konstantin Ananyev
                               ` (3 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
mode, provide the ability to split an enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows the user to inspect objects in the ring without removing
them from it (aka MT safe peek).
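
To illustrate, a reserve-then-write enqueue under this API could look like
the sketch below ("ring" and "obj" are placeholder names; it assumes the
ring was created with an HTS or single-thread producer):

    /* reserve space for 1 object in the ring */
    uint32_t n = rte_ring_enqueue_bulk_start(ring, 1, NULL);
    if (n != 0) {
        /* copy the object into the reserved slot and commit */
        rte_ring_enqueue_finish(ring, &obj, n);
    }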

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/rel_notes/release_20_05.rst  |  11 +-
 lib/librte_ring/Makefile                |   2 +
 lib/librte_ring/meson.build             |   2 +
 lib/librte_ring/rte_ring.h              |   3 +
 lib/librte_ring/rte_ring_elem.h         |   4 +
 lib/librte_ring/rte_ring_peek.h         | 444 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_peek_c11_mem.h | 110 ++++++
 7 files changed, 575 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_peek_c11_mem.h

diff --git a/doc/guides/rel_notes/release_20_05.rst b/doc/guides/rel_notes/release_20_05.rst
index db8a281db..ec558c64f 100644
--- a/doc/guides/rel_notes/release_20_05.rst
+++ b/doc/guides/rel_notes/release_20_05.rst
@@ -81,13 +81,22 @@ New Features
   by making use of the event device capabilities. The event mode currently supports
   only inline IPsec protocol offload.
 
-* **New synchronization modes for rte_ring.**
+* **Added new API for rte_ring.**
+
+  * New synchronization modes for rte_ring.
 
   Introduced new optional MT synchronization modes for rte_ring:
   Relaxed Tail Sync (RTS) mode and Head/Tail Sync (HTS) mode.
   With these mode selected, rte_ring shows significant improvements for
   average enqueue/dequeue times on overcommitted systems.
 
+  * Added peek style API for rte_ring.
+
+  For rings with producer/consumer in RTE_RING_SYNC_ST, RTE_RING_SYNC_MT_HTS
+  mode, provide an ability to split enqueue/dequeue operation into two phases
+  (enqueue/dequeue start; enqueue/dequeue finish). That allows user to inspect
+  objects in the ring without removing them from it (aka MT safe peek).
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index f75d8e530..83a9d0840 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_c11_mem.h \
 					rte_ring_hts.h \
 					rte_ring_hts_c11_mem.h \
+					rte_ring_peek.h \
+					rte_ring_peek_c11_mem.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca37cb8cc..4f77647cd 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,8 @@ headers = files('rte_ring.h',
 		'rte_ring_generic.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_c11_mem.h',
+		'rte_ring_peek.h',
+		'rte_ring_peek_c11_mem.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7cf046528..86faede81 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -25,6 +25,9 @@
  * - Multi- or single-producer enqueue.
  * - Bulk dequeue.
  * - Bulk enqueue.
+ * - Ability to select different sync modes for producer/consumer.
+ * - Dequeue start/finish (depending on consumer sync mode).
+ * - Enqueue start/finish (depending on producer sync mode).
  *
  * Note: the ring implementation is not preemptible. Refer to Programmer's
  * guide/Environment Abstraction Layer/Multiple pthread/Known Issues/rte_ring
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 492eef936..a5a4c46f9 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1089,6 +1089,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #include <rte_ring.h>
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..1ad8bba22
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,444 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * Introduction of rte_ring with a serialized producer/consumer (HTS sync mode)
+ * makes it possible to split the public enqueue/dequeue API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows the user to inspect objects in the ring without removing them
+ * from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ no other thread can proceed
+ * with an enqueue (/dequeue) operation until _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_peek_c11_mem.h>
+
+/**
+ * @internal This function moves prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy objects
+ * into the queue and complete the given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy objects
+ * into the queue and complete the given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_bulk_elem_start(r, n, free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy objects
+ * into the queue and complete the given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy objects
+ * into the queue and complete the given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_burst_elem_start(r, n, free_space);
+}
+
+/**
+ * Complete the enqueue of several objects on the ring.
+ * Note that the number of objects to enqueue should not exceed the previous
+ * enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete the enqueue of several objects on the ring.
+ * Note that the number of objects to enqueue should not exceed the previous
+ * enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves cons head value and copies up to *n*
+ * objects from the ring to the user provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the given dequeue operation and actually remove objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the given dequeue operation and actually remove objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   Actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the given dequeue operation and actually remove objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the given dequeue operation and actually remove objects
+ * from the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete the dequeue of several objects from the ring.
+ * Note that the number of objects to dequeue should not exceed the previous
+ * dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_elem_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_cons, &tail, n);
+		__rte_ring_hts_set_head_tail(&r->hts_cons, tail, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete the dequeue of several objects from the ring.
+ * Note that the number of objects to dequeue should not exceed the previous
+ * dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	rte_ring_dequeue_elem_finish(r, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
diff --git a/lib/librte_ring/rte_ring_peek_c11_mem.h b/lib/librte_ring/rte_ring_peek_c11_mem.h
new file mode 100644
index 000000000..99321f124
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek_c11_mem.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_C11_MEM_H_
+#define _RTE_RING_PEEK_C11_MEM_H_
+
+/**
+ * @file rte_ring_peek_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for rte_ring peek API.
+ * For more information please refer to <rte_ring_peek.h>.
+ */
+
+/**
+ * @internal get current tail value.
+ * This function should be used only for single thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which will cause any pending changes to be aborted and
+ *   the head to be returned to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal get current tail value.
+ * This function should be used only for producer/consumer in MT_HTS mode.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which will cause any pending changes to be aborted and
+ *   the head to be returned to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union __rte_ring_hts_pos p;
+
+	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64 bit operation.
+ * This function should be used only for producer/consumer in MT_HTS mode.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	union __rte_ring_hts_pos p;
+
+	RTE_SET_USED(enqueue);
+
+	p.pos.head = tail + num;
+	p.pos.tail = p.pos.head;
+
+	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+}
+
+#endif /* _RTE_RING_PEEK_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 08/10] test/ring: add stress test for MT peek API
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (6 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 07/10] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 09/10] test/ring: add functional tests for new sync modes Konstantin Ananyev
                               ` (2 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to exercise the MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 28f0b9ac2..631a21028 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 20c4978c2..d15278cf9 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 9a87c7f7b..60953ce47 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -35,3 +35,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 09/10] test/ring: add functional tests for new sync modes
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (7 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 08/10] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 10/10] doc: update ring guide Konstantin Ananyev
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Extend test_ring_autotest with new test-cases for RTS/HTS sync modes.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring.c | 93 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 20 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fbcd109b1..e21557cd9 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -203,7 +203,8 @@ test_ring_negative_tests(void)
  * Random number of elements are enqueued and dequeued.
  */
 static int
-test_ring_burst_bulk_tests1(unsigned int api_type)
+test_ring_burst_bulk_tests1(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -213,12 +214,11 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
 	const unsigned int rsz = RING_SIZE - 1;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -294,7 +294,8 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
  * dequeued data.
  */
 static int
-test_ring_burst_bulk_tests2(unsigned int api_type)
+test_ring_burst_bulk_tests2(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -302,12 +303,11 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -390,7 +390,8 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
  * Enqueue and dequeue to cover the entire ring length.
  */
 static int
-test_ring_burst_bulk_tests3(unsigned int api_type)
+test_ring_burst_bulk_tests3(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -398,12 +399,11 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
 	unsigned int i, j;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -465,7 +465,8 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
  * Enqueue till the ring is full and dequeue till the ring becomes empty.
  */
 static int
-test_ring_burst_bulk_tests4(unsigned int api_type)
+test_ring_burst_bulk_tests4(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -474,12 +475,11 @@ test_ring_burst_bulk_tests4(unsigned int api_type)
 	unsigned int num_elems;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -815,7 +815,23 @@ test_ring_with_exact_size(void)
 static int
 test_ring(void)
 {
+	int32_t rc;
 	unsigned int i, j;
+	const char *tname;
+
+	static const struct {
+		uint32_t create_flags;
+		const char *name;
+	} test_sync_modes[] = {
+		{
+			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ,
+			"Test MT_RTS ring",
+		},
+		{
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ,
+			"Test MT_HTS ring",
+		},
+	};
 
 	/* Negative test cases */
 	if (test_ring_negative_tests() < 0)
@@ -832,30 +848,67 @@ test_ring(void)
 	 * The test cases are split into smaller test cases to
 	 * help clang compile faster.
 	 */
+	tname = "Test standard ring";
+
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests1(i | j) < 0)
+			if (test_ring_burst_bulk_tests1(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests2(i | j) < 0)
+			if (test_ring_burst_bulk_tests2(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests3(i | j) < 0)
+			if (test_ring_burst_bulk_tests3(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests4(i | j) < 0)
+			if (test_ring_burst_bulk_tests4(i | j, 0, tname) < 0)
+				goto test_fail;
+
+	/* Burst and bulk operations with MT_RTS and MT_HTS sync modes */
+	for (i = 0; i != RTE_DIM(test_sync_modes); i++) {
+		for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST;
+				j <<= 1) {
+
+			rc = test_ring_burst_bulk_tests1(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests2(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
 				goto test_fail;
 
+			rc = test_ring_burst_bulk_tests3(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests4(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+		}
+	}
+
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v6 10/10] doc: update ring guide
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (8 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 09/10] test/ring: add functional tests for new sync modes Konstantin Ananyev
@ 2020-04-20 12:11             ` Konstantin Ananyev
  2020-04-20 13:47               ` David Marchand
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
  10 siblings, 1 reply; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:11 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Changed the rte_ring chapter in the programmer's guide to reflect
the addition of new sync modes and peek style API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 doc/guides/prog_guide/ring_lib.rst | 95 ++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 8cb2b2dd4..668e67ecb 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -349,6 +349,101 @@ even if only the first term of subtraction has overflowed:
     uint32_t entries = (prod_tail - cons_head);
     uint32_t free_entries = (mask + cons_tail -prod_head);
 
+Producer/consumer synchronization modes
+---------------------------------------
+
+rte_ring supports different synchronization modes for producers and consumers.
+These modes can be specified at ring creation/init time via the ``flags``
+parameter.
+That should help the user to configure the ring in the way most suitable
+for their specific usage scenarios.
+Currently supported modes:
+
+MP/MC (default one)
+~~~~~~~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) mode. This is the default enqueue (/dequeue)
+mode for the ring. In this mode multiple threads can enqueue (/dequeue)
+objects to (/from) the ring. For 'classic' DPDK deployments (with one thread
+per core) this is usually the most suitable and fastest synchronization mode.
+As a well-known limitation, it can perform quite poorly in some overcommitted
+scenarios.
+
+SP/SC
+~~~~~
+Single-producer (/single-consumer) mode. In this mode only one thread at a time
+is allowed to enqueue (/dequeue) objects to (/from) the ring.
+
+MP_RTS/MC_RTS
+~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
+The main difference from the original MP/MC algorithm is that
+the tail value is increased not by every thread that finished enqueue/dequeue,
+but only by the last one.
+That allows threads to avoid spinning on the ring tail value,
+leaving the actual tail value change to the last thread at a given instance.
+That technique helps to avoid the Lock-Waiter-Preemption (LWP) problem on tail
+update and improves average enqueue/dequeue times on overcommitted systems.
+To achieve that, RTS requires two 64-bit CAS operations for each
+enqueue (/dequeue) operation: one for the head update, a second for the tail
+update.
+In comparison, the original MP/MC algorithm requires one 32-bit CAS
+for the head update plus waiting/spinning on the tail value.
+
+MP_HTS/MC_HTS
+~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
+In that mode the enqueue/dequeue operation is fully serialized:
+at any given moment only one enqueue/dequeue operation can proceed.
+This is achieved by allowing a thread to proceed with changing ``head.value``
+only when ``head.value == tail.value``.
+Both head and tail values are updated atomically (as one 64-bit value).
+To achieve that, a 64-bit CAS is used by the head update routine.
+That technique also avoids the Lock-Waiter-Preemption (LWP) problem on tail
+update and helps to improve ring enqueue/dequeue behavior in overcommitted
+scenarios. Another advantage of a fully serialized producer/consumer is that
+it provides the ability to implement an MT-safe peek API for rte_ring.
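+
+As an illustration only (simplified pseudo-C, not the exact implementation;
+``atomic_load()``/``atomic_cas()`` stand in for the C11 atomic builtins),
+the serialized head update conceptually looks like:
+
+.. code-block:: c
+
+    /* load packed {head, tail} as one 64-bit value */
+    oh.raw = atomic_load(&ht->ht.raw);
+    do {
+        /* wait till no other enqueue (/dequeue) is in progress */
+        while (oh.pos.head != oh.pos.tail) {
+            rte_pause();
+            oh.raw = atomic_load(&ht->ht.raw);
+        }
+        nh.pos.tail = oh.pos.tail;
+        nh.pos.head = oh.pos.head + n;
+        /* a single 64-bit CAS moves the head forward */
+    } while (!atomic_cas(&ht->ht.raw, &oh.raw, nh.raw));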
+
+
+Ring Peek API
+-------------
+
+For rings with a serialized producer/consumer (HTS sync mode) it is possible
+to split the public enqueue/dequeue API into two phases:
+
+*   enqueue/dequeue start
+
+*   enqueue/dequeue finish
+
+That allows the user to inspect objects in the ring without removing them
+from it (aka MT-safe peek) and to reserve space for objects in the ring
+before the actual enqueue.
+Note that this API is available only for two sync modes:
+
+*   Single Producer/Single Consumer (SP/SC)
+
+*   Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
+
+It is the user's responsibility to create/init the ring with the appropriate
+sync mode selected. As an example of usage:
+
+.. code-block:: c
+
+    /* read 1 elem from the ring: */
+    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+    if (n != 0) {
+        /* examine object */
+        if (object_examine(obj) == KEEP)
+            /* decided to keep it in the ring. */
+            rte_ring_dequeue_finish(ring, 0);
+        else
+            /* decided to remove it from the ring. */
+            rte_ring_dequeue_finish(ring, n);
+    }
+
+Note that between ``_start_`` and ``_finish_`` no other thread can proceed
+with an enqueue (/dequeue) operation till ``_finish_`` completes.
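+
+The enqueue side can be used in a similar two-phase fashion, e.g. to reserve
+space before the objects are fully prepared. A sketch using the peek API
+introduced together with these sync modes (error handling omitted):
+
+.. code-block:: c
+
+    /* reserve space for 1 elem in the ring */
+    uint32_t n = rte_ring_enqueue_bulk_start(ring, 1, NULL);
+    if (n != 0) {
+        /* prepare the object while the slot stays reserved */
+        obj = create_object();
+        /* copy the object in and make it visible to consumers */
+        rte_ring_enqueue_finish(ring, &obj, n);
+    }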
+
 References
 ----------
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 00/10] New sync modes for ring
  2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
                               ` (9 preceding siblings ...)
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 10/10] doc: update ring guide Konstantin Ananyev
@ 2020-04-20 12:28             ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 01/10] test/ring: add contention stress test Konstantin Ananyev
                                 ` (10 more replies)
  10 siblings, 11 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

This patch series depends on following patch:
"meson: add libatomic as a global dependency for i686 clang"
(http://patches.dpdk.org/patch/68876/)

V6 - V7:
1. fix checkpatch issues

V5 - V6:
1. add dependency on the external patch (-latomic for i686 clang)
2. remove unneeded code from rte_ring_generic (Honnappa)
3. extra comments for ring init/create API (Honnappa)
4. __rte prefix for internal structs  (Honnappa)
5. update docs (rel notes and prog guide)

V4 - V5:
1. fix i686 clang build problem
2. fix formal API comments

V3 - V4 changes:
Address comments from Honnappa:
1. for new sync modes make legacy API wrappers around _elem_ calls
2. remove rte_ring_(hts|rts)_generic.h
3. few changes in C11 version
4. peek API - add missing functions for _elem_
5. remove _IS_SP/_IS_MP, etc. internal macros
6. fix param types (obj_table) for _elem_functions
7. fix formal API comments
8. deduplicate code for test_ring_stress
9. added functional tests for new sync modes

V2 - V3 changes:
1. few more compilation fixes (for gcc 4.8.X)
2. extra update devtools/libabigail.abignore (workaround) 

V1 - V2 changes:
1. fix compilation issues
2. add C11 atomics support
3. updates devtools/libabigail.abignore (workaround)

RFC - V1 changes:
1. remove ABI breakage (at least I hope I did)
2. Add support for ring_elem
3. rework peek related API a bit
4. rework test to make it less verbose and unite all test-cases
   in one command
5. add new test-case for MT peek API

These days more and more customers use (or try to use) DPDK-based apps within
overcommitted systems (multiple active threads over the same physical cores):
VM, container deployments, etc.
One quite common problem they hit:
Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
LHP is quite a common problem for spin-based sync primitives
(spin-locks, etc.) on overcommitted systems.
The situation gets much worse when some sort of
fair-locking technique is used (ticket-lock, etc.).
As now not only the lock-owner's but also the lock-waiters' scheduling
order matters a lot (LWP).
These two problems are well-known for kernel within VMs:
http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
The problem with rte_ring is that while head acquisition is a sort of
unfair locking, waiting on the tail is very similar to the ticket-lock schema -
the tail has to be updated in a particular order.
That makes the current rte_ring implementation perform
really poorly in some overcommitted scenarios.
It is probably not possible to completely resolve the LHP problem in
userspace only (without some kernel communication/intervention).
But removing fairness in the tail update helps to avoid LWP and
can mitigate the situation significantly.
This patch proposes two new optional ring synchronization modes:
1) Head/Tail Sync (HTS) mode
In that mode enqueue/dequeue operation is fully serialized:
    only one thread at a time is allowed to perform given op.
    As another enhancement provide ability to split enqueue/dequeue
    operation into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
    That allows user to inspect objects in the ring without removing
    them from it (aka MT safe peek).
2) Relaxed Tail Sync (RTS)
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.

Note that these new sync modes are optional.
For current rte_ring users nothing should change
(both in terms of API/ABI and performance).
Existing sync modes MP/MC and SP/SC are kept untouched, set up in the same
way (via flags and _init_), and MP/MC remains the default one.
The only thing that changed:
the format of prod/cons now could differ depending on the mode selected at
_init_. So the user has to stick with one sync model through the whole ring
lifetime.
In other words, a user can't create a ring for, let's say, SP mode and then
in the middle of the data-path change their mind and start using MP_RTS mode.
For existing modes (SP/MP, SC/MC) the format remains the same and
the user can still use them interchangeably, though of course it is an
error-prone practice.
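
For example, a ring that uses the new RTS mode on both sides would be
created as below (the RTS flags come from patch 03 of this series; the
analogous HTS flags from patch 05):

struct rte_ring *r;

r = rte_ring_create("rts_ring", 1024, rte_socket_id(),
	RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);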

Test results on IA (see below) show significant improvements
for average enqueue/dequeue op times on overcommitted systems.
For 'classic' DPDK deployments (one thread per core) original MP/MC
algorithm still shows best numbers, though for 64-bit target
RTS numbers are not that far away.
Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'

X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
32thread@32core(--lcores='6-22,55-70')          982.90     1972.38 1160.51

2thread@1core(--lcores='6,(10-11)@7')           20140.50   23.58   25.14
4thread@2core(--lcores='6,(10-11)@7,(20-21)@8') 153680.60  76.88   80.05
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 280314.32  294.72  318.79
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 643176.59 1144.02 1175.14
32thread@2core(--lcores='6,(10-25)@7,(30-45)@8') 4264238.80 4627.48 4892.68

8thread@2core(--lcores='6,(10-17)@(7,8)')       321085.98  298.59  307.47
16thread@4core(--lcores='6,(20-35)@(7-10)')     1900705.61 575.35  678.29
32thread@4core(--lcores='6,(20-51)@(7-10)')     5510445.85 2164.36 2714.12

i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
DEQ+ENQ average cycles/obj
                                                MP/MC      HTS     RTS
1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
32thread@32core(--lcores='6-22,55-70')          813.77     6072.41 2169.91

2thread@1core(--lcores='6,(10-11)@7')           16095.00   36.06   34.74
8thread@2core(--lcores='6,(10-13)@7,(20-23)@8') 1140354.54 346.61  361.57
16thread@2core(--lcores='6,(10-17)@7,(20-27)@8') 1920417.86 1314.90 1416.65

8thread@2core(--lcores='6,(10-17)@(7,8)')       594358.61  332.70  357.74
32thread@4core(--lcores='6,(20-51)@(7-10)')     5319896.86 2836.44 3028.87

Konstantin Ananyev (10):
  test/ring: add contention stress test
  ring: prepare ring to allow new sync schemes
  ring: introduce RTS ring mode
  test/ring: add contention stress test for RTS ring
  ring: introduce HTS ring mode
  test/ring: add contention stress test for HTS ring
  ring: introduce peek style API
  test/ring: add stress test for MT peek API
  test/ring: add functional tests for new sync modes
  doc: update ring guide

 app/test/Makefile                       |   5 +
 app/test/meson.build                    |   5 +
 app/test/test_pdump.c                   |   6 +-
 app/test/test_ring.c                    |  93 +++--
 app/test/test_ring_hts_stress.c         |  32 ++
 app/test/test_ring_mpmc_stress.c        |  31 ++
 app/test/test_ring_peek_stress.c        |  43 +++
 app/test/test_ring_rts_stress.c         |  32 ++
 app/test/test_ring_stress.c             |  57 +++
 app/test/test_ring_stress.h             |  38 ++
 app/test/test_ring_stress_impl.h        | 396 +++++++++++++++++++++
 devtools/libabigail.abignore            |   7 +
 doc/guides/prog_guide/ring_lib.rst      |  95 +++++
 doc/guides/rel_notes/release_20_05.rst  |  16 +
 lib/librte_pdump/rte_pdump.c            |   2 +-
 lib/librte_port/rte_port_ring.c         |  12 +-
 lib/librte_ring/Makefile                |   9 +-
 lib/librte_ring/meson.build             |  12 +-
 lib/librte_ring/rte_ring.c              | 114 +++++-
 lib/librte_ring/rte_ring.h              | 306 ++++++++++------
 lib/librte_ring/rte_ring_core.h         | 184 ++++++++++
 lib/librte_ring/rte_ring_elem.h         | 171 +++++++--
 lib/librte_ring/rte_ring_hts.h          | 332 ++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h  | 164 +++++++++
 lib/librte_ring/rte_ring_peek.h         | 444 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_peek_c11_mem.h | 110 ++++++
 lib/librte_ring/rte_ring_rts.h          | 439 +++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h  | 179 ++++++++++
 28 files changed, 3142 insertions(+), 192 deletions(-)
 create mode 100644 app/test/test_ring_hts_stress.c
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_peek_stress.c
 create mode 100644 app/test/test_ring_rts_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h
 create mode 100644 lib/librte_ring/rte_ring_core.h
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_peek_c11_mem.h
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 01/10] test/ring: add contention stress test
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 02/10] ring: prepare ring to allow new sync schemes Konstantin Ananyev
                                 ` (9 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce stress test for ring enqueue/dequeue operations.
Performs the following pattern on each slave worker:
dequeue/read-write data from the dequeued objects/enqueue.
Serves as both functional and performance test of ring
enqueue/dequeue operations under high contention
(for both overcommitted and non-overcommitted scenarios).
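
For instance, an overcommitted run on two physical cores (command form as in
the cover letter):

echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='6,(10-13)@7,(20-23)@8'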

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile                |   2 +
 app/test/meson.build             |   2 +
 app/test/test_ring_mpmc_stress.c |  31 +++
 app/test/test_ring_stress.c      |  48 ++++
 app/test/test_ring_stress.h      |  35 +++
 app/test/test_ring_stress_impl.h | 396 +++++++++++++++++++++++++++++++
 6 files changed, 514 insertions(+)
 create mode 100644 app/test/test_ring_mpmc_stress.c
 create mode 100644 app/test/test_ring_stress.c
 create mode 100644 app/test/test_ring_stress.h
 create mode 100644 app/test/test_ring_stress_impl.h

diff --git a/app/test/Makefile b/app/test/Makefile
index be53d33c3..a23a011df 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,7 +77,9 @@ SRCS-y += test_external_mem.c
 SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
+SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index 04b59cffa..8824f366c 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -100,7 +100,9 @@ test_sources = files('commands.c',
 	'test_rib.c',
 	'test_rib6.c',
 	'test_ring.c',
+	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
 	'test_service_cores.c',
diff --git a/app/test/test_ring_mpmc_stress.c b/app/test/test_ring_mpmc_stress.c
new file mode 100644
index 000000000..1524b0248
--- /dev/null
+++ b/app/test/test_ring_mpmc_stress.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num, 0);
+}
+
+const struct test test_ring_mpmc_stress = {
+	.name = "MP/MC",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
new file mode 100644
index 000000000..60706f799
--- /dev/null
+++ b/app/test/test_ring_stress.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+static int
+run_test(const struct test *test)
+{
+	int32_t rc;
+	uint32_t i, k;
+
+	for (i = 0, k = 0; i != test->nb_case; i++) {
+
+		printf("TEST-CASE %s %s START\n",
+			test->name, test->cases[i].name);
+
+		rc = test->cases[i].func(test->cases[i].wfunc);
+		k += (rc == 0);
+
+		if (rc != 0)
+			printf("TEST-CASE %s %s FAILED\n",
+				test->name, test->cases[i].name);
+		else
+			printf("TEST-CASE %s %s OK\n",
+				test->name, test->cases[i].name);
+	}
+
+	return k;
+}
+
+static int
+test_ring_stress(void)
+{
+	uint32_t n, k;
+
+	n = 0;
+	k = 0;
+
+	n += test_ring_mpmc_stress.nb_case;
+	k += run_test(&test_ring_mpmc_stress);
+
+	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+		n, k, n - k);
+	return (k != n);
+}
+
+REGISTER_TEST_COMMAND(ring_stress_autotest, test_ring_stress);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
new file mode 100644
index 000000000..60eac6216
--- /dev/null
+++ b/app/test/test_ring_stress.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <stdalign.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <rte_ring.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_pause.h>
+#include <rte_random.h>
+#include <rte_malloc.h>
+#include <rte_spinlock.h>
+
+#include "test.h"
+
+struct test_case {
+	const char *name;
+	int (*func)(int (*)(void *));
+	int (*wfunc)(void *arg);
+};
+
+struct test {
+	const char *name;
+	uint32_t nb_case;
+	const struct test_case *cases;
+};
+
+extern const struct test test_ring_mpmc_stress;
diff --git a/app/test/test_ring_stress_impl.h b/app/test/test_ring_stress_impl.h
new file mode 100644
index 000000000..222d62bc4
--- /dev/null
+++ b/app/test/test_ring_stress_impl.h
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress.h"
+
+/**
+ * Stress test for ring enqueue/dequeue operations.
+ * Performs the following pattern on each slave worker:
+ * dequeue/read-write data from the dequeued objects/enqueue.
+ * Serves as both functional and performance test of ring
+ * enqueue/dequeue operations under high contention
+ * (for both overcommitted and non-overcommitted scenarios).
+ */
+
+#define RING_NAME	"RING_STRESS"
+#define BULK_NUM	32
+#define RING_SIZE	(2 * BULK_NUM * RTE_MAX_LCORE)
+
+enum {
+	WRK_CMD_STOP,
+	WRK_CMD_RUN,
+};
+
+static volatile uint32_t wrk_cmd __rte_cache_aligned;
+
+/* test run-time in seconds */
+static const uint32_t run_time = 60;
+static const uint32_t verbose;
+
+struct lcore_stat {
+	uint64_t nb_cycle;
+	struct {
+		uint64_t nb_call;
+		uint64_t nb_obj;
+		uint64_t nb_cycle;
+		uint64_t max_cycle;
+		uint64_t min_cycle;
+	} op;
+};
+
+struct lcore_arg {
+	struct rte_ring *rng;
+	struct lcore_stat stats;
+} __rte_cache_aligned;
+
+struct ring_elem {
+	uint32_t cnt[RTE_CACHE_LINE_SIZE / sizeof(uint32_t)];
+} __rte_cache_aligned;
+
+/*
+ * redefinable functions
+ */
+static uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail);
+
+static uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free);
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num);
+
+
+static void
+lcore_stat_update(struct lcore_stat *ls, uint64_t call, uint64_t obj,
+	uint64_t tm, int32_t prcs)
+{
+	ls->op.nb_call += call;
+	ls->op.nb_obj += obj;
+	ls->op.nb_cycle += tm;
+	if (prcs) {
+		ls->op.max_cycle = RTE_MAX(ls->op.max_cycle, tm);
+		ls->op.min_cycle = RTE_MIN(ls->op.min_cycle, tm);
+	}
+}
+
+static void
+lcore_op_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+
+	ms->op.nb_call += ls->op.nb_call;
+	ms->op.nb_obj += ls->op.nb_obj;
+	ms->op.nb_cycle += ls->op.nb_cycle;
+	ms->op.max_cycle = RTE_MAX(ms->op.max_cycle, ls->op.max_cycle);
+	ms->op.min_cycle = RTE_MIN(ms->op.min_cycle, ls->op.min_cycle);
+}
+
+static void
+lcore_stat_aggr(struct lcore_stat *ms, const struct lcore_stat *ls)
+{
+	ms->nb_cycle = RTE_MAX(ms->nb_cycle, ls->nb_cycle);
+	lcore_op_stat_aggr(ms, ls);
+}
+
+static void
+lcore_stat_dump(FILE *f, uint32_t lc, const struct lcore_stat *ls)
+{
+	long double st;
+
+	st = (long double)rte_get_timer_hz() / US_PER_S;
+
+	if (lc == UINT32_MAX)
+		fprintf(f, "%s(AGGREGATE)={\n", __func__);
+	else
+		fprintf(f, "%s(lcore=%u)={\n", __func__, lc);
+
+	fprintf(f, "\tnb_cycle=%" PRIu64 "(%.2Lf usec),\n",
+		ls->nb_cycle, (long double)ls->nb_cycle / st);
+
+	fprintf(f, "\tDEQ+ENQ={\n");
+
+	fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ls->op.nb_call);
+	fprintf(f, "\t\tnb_obj=%" PRIu64 ",\n", ls->op.nb_obj);
+	fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ls->op.nb_cycle);
+	fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_obj / ls->op.nb_call);
+	fprintf(f, "\t\tcycles/obj(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_obj);
+	fprintf(f, "\t\tcycles/call(avg): %.2Lf\n",
+		(long double)ls->op.nb_cycle / ls->op.nb_call);
+
+	/* if min/max cycles per call stats was collected */
+	if (ls->op.min_cycle != UINT64_MAX) {
+		fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.max_cycle,
+			(long double)ls->op.max_cycle / st);
+		fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+			ls->op.min_cycle,
+			(long double)ls->op.min_cycle / st);
+	}
+
+	fprintf(f, "\t},\n");
+	fprintf(f, "};\n");
+}
+
+static void
+fill_ring_elm(struct ring_elem *elm, uint32_t fill)
+{
+	uint32_t i;
+
+	for (i = 0; i != RTE_DIM(elm->cnt); i++)
+		elm->cnt[i] = fill;
+}
+
+static int32_t
+check_updt_elem(struct ring_elem *elm[], uint32_t num,
+	const struct ring_elem *check, const struct ring_elem *fill)
+{
+	uint32_t i;
+
+	static rte_spinlock_t dump_lock;
+
+	for (i = 0; i != num; i++) {
+		if (memcmp(check, elm[i], sizeof(*check)) != 0) {
+			rte_spinlock_lock(&dump_lock);
+			printf("%s(lc=%u, num=%u) failed at %u-th iter, "
+				"offending object: %p\n",
+				__func__, rte_lcore_id(), num, i, elm[i]);
+			rte_memdump(stdout, "expected", check, sizeof(*check));
+			rte_memdump(stdout, "result", elm[i], sizeof(*elm[i]));
+			rte_spinlock_unlock(&dump_lock);
+			return -EINVAL;
+		}
+		memcpy(elm[i], fill, sizeof(*elm[i]));
+	}
+
+	return 0;
+}
+
+static int
+check_ring_op(uint32_t exp, uint32_t res, uint32_t lc,
+	const char *fname, const char *opname)
+{
+	if (exp != res) {
+		printf("%s(lc=%u) failure: %s expected: %u, returned %u\n",
+			fname, lc, opname, exp, res);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static int
+test_worker(void *arg, const char *fname, int32_t prcs)
+{
+	int32_t rc;
+	uint32_t lc, n, num;
+	uint64_t cl, tm0, tm1;
+	struct lcore_arg *la;
+	struct ring_elem def_elm, loc_elm;
+	struct ring_elem *obj[2 * BULK_NUM];
+
+	la = arg;
+	lc = rte_lcore_id();
+
+	fill_ring_elm(&def_elm, UINT32_MAX);
+	fill_ring_elm(&loc_elm, lc);
+
+	while (wrk_cmd != WRK_CMD_RUN) {
+		rte_smp_rmb();
+		rte_pause();
+	}
+
+	cl = rte_rdtsc_precise();
+
+	do {
+		/* num in interval [7/8, 11/8) of BULK_NUM */
+		num = 7 * BULK_NUM / 8 + rte_rand() % (BULK_NUM / 2);
+
+		/* reset all pointer values */
+		memset(obj, 0, sizeof(obj));
+
+		/* dequeue num elems */
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_dequeue_bulk(la->rng, (void **)obj, num, NULL);
+		tm0 = (prcs != 0) ? rte_rdtsc_precise() - tm0 : 0;
+
+		/* check return value and objects */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_dequeue_bulk));
+		if (rc == 0)
+			rc = check_updt_elem(obj, num, &def_elm, &loc_elm);
+		if (rc != 0)
+			break;
+
+		/* enqueue num elems */
+		rte_compiler_barrier();
+		rc = check_updt_elem(obj, num, &loc_elm, &def_elm);
+		if (rc != 0)
+			break;
+
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() : 0;
+		n = _st_ring_enqueue_bulk(la->rng, (void **)obj, num, NULL);
+		tm1 = (prcs != 0) ? rte_rdtsc_precise() - tm1 : 0;
+
+		/* check return value */
+		rc = check_ring_op(num, n, lc, fname,
+			RTE_STR(_st_ring_enqueue_bulk));
+		if (rc != 0)
+			break;
+
+		lcore_stat_update(&la->stats, 1, num, tm0 + tm1, prcs);
+
+	} while (wrk_cmd == WRK_CMD_RUN);
+
+	cl = rte_rdtsc_precise() - cl;
+	if (prcs == 0)
+		lcore_stat_update(&la->stats, 0, 0, cl, 0);
+	la->stats.nb_cycle = cl;
+	return rc;
+}
+
+static int
+test_worker_prcs(void *arg)
+{
+	return test_worker(arg, __func__, 1);
+}
+
+static int
+test_worker_avg(void *arg)
+{
+	return test_worker(arg, __func__, 0);
+}
+
+static void
+mt1_fini(struct rte_ring *rng, void *data)
+{
+	rte_free(rng);
+	rte_free(data);
+}
+
+static int
+mt1_init(struct rte_ring **rng, void **data, uint32_t num)
+{
+	int32_t rc;
+	size_t sz;
+	uint32_t i, nr;
+	struct rte_ring *r;
+	struct ring_elem *elm;
+	void *p;
+
+	*rng = NULL;
+	*data = NULL;
+
+	sz = num * sizeof(*elm);
+	elm = rte_zmalloc(NULL, sz, __alignof__(*elm));
+	if (elm == NULL) {
+		printf("%s: alloc(%zu) for %u elems data failed",
+			__func__, sz, num);
+		return -ENOMEM;
+	}
+
+	*data = elm;
+
+	/* alloc ring */
+	nr = 2 * num;
+	sz = rte_ring_get_memsize(nr);
+	r = rte_zmalloc(NULL, sz, __alignof__(*r));
+	if (r == NULL) {
+		printf("%s: alloc(%zu) for FIFO with %u elems failed",
+			__func__, sz, nr);
+		return -ENOMEM;
+	}
+
+	*rng = r;
+
+	rc = _st_ring_init(r, RING_NAME, nr);
+	if (rc != 0) {
+		printf("%s: _st_ring_init(%p, %u) failed, error: %d(%s)\n",
+			__func__, r, nr, rc, strerror(-rc));
+		return rc;
+	}
+
+	for (i = 0; i != num; i++) {
+		fill_ring_elm(elm + i, UINT32_MAX);
+		p = elm + i;
+		if (_st_ring_enqueue_bulk(r, &p, 1, NULL) != 1)
+			break;
+	}
+
+	if (i != num) {
+		printf("%s: _st_ring_enqueue_bulk(%p, %u) returned %u\n",
+			__func__, r, num, i);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
+static int
+test_mt1(int (*test)(void *))
+{
+	int32_t rc;
+	uint32_t lc, mc;
+	struct rte_ring *r;
+	void *data;
+	struct lcore_arg arg[RTE_MAX_LCORE];
+
+	static const struct lcore_stat init_stat = {
+		.op.min_cycle = UINT64_MAX,
+	};
+
+	rc = mt1_init(&r, &data, RING_SIZE);
+	if (rc != 0) {
+		mt1_fini(r, data);
+		return rc;
+	}
+
+	memset(arg, 0, sizeof(arg));
+
+	/* launch on all slaves */
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		arg[lc].rng = r;
+		arg[lc].stats = init_stat;
+		rte_eal_remote_launch(test, &arg[lc], lc);
+	}
+
+	/* signal workers to start the test */
+	wrk_cmd = WRK_CMD_RUN;
+	rte_smp_wmb();
+
+	usleep(run_time * US_PER_S);
+
+	/* signal workers to stop the test */
+	wrk_cmd = WRK_CMD_STOP;
+	rte_smp_wmb();
+
+	/* wait for slaves and collect stats. */
+	mc = rte_lcore_id();
+	arg[mc].stats = init_stat;
+
+	rc = 0;
+	RTE_LCORE_FOREACH_SLAVE(lc) {
+		rc |= rte_eal_wait_lcore(lc);
+		lcore_stat_aggr(&arg[mc].stats, &arg[lc].stats);
+		if (verbose != 0)
+			lcore_stat_dump(stdout, lc, &arg[lc].stats);
+	}
+
+	lcore_stat_dump(stdout, UINT32_MAX, &arg[mc].stats);
+	mt1_fini(r, data);
+	return rc;
+}
+
+static const struct test_case tests[] = {
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-PRCS",
+		.func = test_mt1,
+		.wfunc = test_worker_prcs,
+	},
+	{
+		.name = "MT-WRK_ENQ_DEQ-MST_NONE-AVG",
+		.func = test_mt1,
+		.wfunc = test_worker_avg,
+	},
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 02/10] ring: prepare ring to allow new sync schemes
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 01/10] test/ring: add contention stress test Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 03/10] ring: introduce RTS ring mode Konstantin Ananyev
                                 ` (8 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

To make these preparations two main things are done:
- Change from *single* to *sync_type* to allow different
  synchronisation schemes to be applied.
  Mark *single* as deprecated in comments.
  Add new functions to allow the user to query ring sync types.
  Replace direct access to *single* with the appropriate function call
  (see the sketch after this list).
- Move the actual rte_ring and related structure definitions into a
  separate file: <rte_ring_core.h>. That allows the contents of
  <rte_ring_elem.h> to be referred to from <rte_ring.h> without
  introducing a circular dependency.
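
As an illustration, code that used to peek at the internal *single* field
switches to the new accessors (sketch mirroring the pdump change below):

	/* before: direct access to the internal layout */
	if (ring->prod.single || ring->cons.single)
		return -1;

	/* after: query sync types through the new API */
	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring))
		return -1;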

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_pdump.c           |   6 +-
 lib/librte_pdump/rte_pdump.c    |   2 +-
 lib/librte_port/rte_port_ring.c |  12 +--
 lib/librte_ring/Makefile        |   1 +
 lib/librte_ring/meson.build     |   1 +
 lib/librte_ring/rte_ring.c      |   6 +-
 lib/librte_ring/rte_ring.h      | 170 ++++++++++++++------------------
 lib/librte_ring/rte_ring_core.h | 132 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_elem.h |  42 +++-----
 9 files changed, 234 insertions(+), 138 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_core.h

diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index ad183184c..6a1180bcb 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -57,8 +57,7 @@ run_pdump_client_tests(void)
 	if (ret < 0)
 		return -1;
 	mp->flags = 0x0000;
-	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(),
-				      RING_F_SP_ENQ | RING_F_SC_DEQ);
+	ring_client = rte_ring_create("SR0", RING_SIZE, rte_socket_id(), 0);
 	if (ring_client == NULL) {
 		printf("rte_ring_create SR0 failed");
 		return -1;
@@ -71,9 +70,6 @@ run_pdump_client_tests(void)
 	}
 	rte_eth_dev_probing_finish(eth_dev);
 
-	ring_client->prod.single = 0;
-	ring_client->cons.single = 0;
-
 	printf("\n***** flags = RTE_PDUMP_FLAG_TX *****\n");
 
 	for (itr = 0; itr < NUM_ITR; itr++) {
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 8a01ac510..f96709f95 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -380,7 +380,7 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 		rte_errno = EINVAL;
 		return -1;
 	}
-	if (ring->prod.single || ring->cons.single) {
+	if (rte_ring_is_prod_single(ring) || rte_ring_is_cons_single(ring)) {
 		PDUMP_LOG(ERR, "ring with either SP or SC settings"
 		" is not valid for pdump, should have MP and MC settings\n");
 		rte_errno = EINVAL;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 47fcdd06a..52b2d8e55 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -44,8 +44,8 @@ rte_port_ring_reader_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->cons.single && is_multi) ||
-		(!(conf->ring->cons.single) && !is_multi)) {
+		(rte_ring_is_cons_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_cons_single(conf->ring) && !is_multi)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
 	}
@@ -171,8 +171,8 @@ rte_port_ring_writer_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
@@ -440,8 +440,8 @@ rte_port_ring_writer_nodrop_create_internal(void *params, int socket_id,
 	/* Check input parameters */
 	if ((conf == NULL) ||
 		(conf->ring == NULL) ||
-		(conf->ring->prod.single && is_multi) ||
-		(!(conf->ring->prod.single) && !is_multi) ||
+		(rte_ring_is_prod_single(conf->ring) && is_multi) ||
+		(!rte_ring_is_prod_single(conf->ring) && !is_multi) ||
 		(conf->tx_burst_sz > RTE_PORT_IN_BURST_SIZE_MAX)) {
 		RTE_LOG(ERR, PORT, "%s: Invalid Parameters\n", __func__);
 		return NULL;
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 28368e6d1..6572768c9 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -16,6 +16,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
+					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index 05402e4f0..c656781da 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -3,6 +3,7 @@
 
 sources = files('rte_ring.c')
 headers = files('rte_ring.h',
+		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h')
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 77e5de099..fa5733907 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -106,8 +106,10 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
+		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 18fc5d845..35ee4491c 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -36,91 +36,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#define RTE_TAILQ_RING_NAME "RTE_RING"
-
-enum rte_ring_queue_behavior {
-	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
-	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
-};
-
-#define RTE_RING_MZ_PREFIX "RG_"
-/** The maximum length of a ring name. */
-#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
-			   sizeof(RTE_RING_MZ_PREFIX) + 1)
-
-/* structure to hold a pair of head/tail values and other metadata */
-struct rte_ring_headtail {
-	volatile uint32_t head;  /**< Prod/consumer head. */
-	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t single;         /**< True if single prod/cons */
-};
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct rte_ring {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned; /**< Name of the ring. */
-	int flags;               /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t capacity;       /**< Usable size of ring */
-
-	char pad0 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
-	char pad1 __rte_cache_aligned; /**< empty cache line */
-
-	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
-	char pad2 __rte_cache_aligned; /**< empty cache line */
-};
-
-#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-/**
- * Ring is to hold exactly requested number of entries.
- * Without this flag set, the ring size requested must be a power of 2, and the
- * usable space will be that size - 1. With the flag, the requested size will
- * be rounded up to the next power of two, but the usable space will be exactly
- * that requested. Worst case, if a power-of-2 size is requested, half the
- * ring space will be wasted.
- */
-#define RING_F_EXACT_SZ 0x0004
-#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
-
-/* @internal defines for passing to the enqueue dequeue worker functions */
-#define __IS_SP 1
-#define __IS_MP 0
-#define __IS_SC 1
-#define __IS_MC 0
+#include <rte_ring_core.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -420,7 +336,7 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+			RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -443,9 +359,13 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+			RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_elem.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -470,7 +390,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -554,7 +474,7 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+			RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -578,7 +498,7 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+			RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -605,7 +525,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 /**
@@ -777,6 +697,62 @@ rte_ring_get_capacity(const struct rte_ring *r)
 	return r->capacity;
 }
 
+/**
+ * Return sync type used by producer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_prod_sync_type(const struct rte_ring *r)
+{
+	return r->prod.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single-producer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if ring is SP, zero otherwise.
+ */
+static inline int
+rte_ring_is_prod_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_prod_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
+/**
+ * Return sync type used by consumer in the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer sync type value.
+ */
+static inline enum rte_ring_sync_type
+rte_ring_get_cons_sync_type(const struct rte_ring *r)
+{
+	return r->cons.sync_type;
+}
+
+/**
+ * Check whether the ring is configured as single-consumer.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Non-zero if ring is SC, zero otherwise.
+ */
+static inline int
+rte_ring_is_cons_single(const struct rte_ring *r)
+{
+	return (rte_ring_get_cons_sync_type(r) == RTE_RING_SYNC_ST);
+}
+
 /**
  * Dump the status of all rings on the console
  *
@@ -820,7 +796,7 @@ rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -843,7 +819,7 @@ rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -870,7 +846,7 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+			r->prod.sync_type, free_space);
 }
 
 /**
@@ -898,7 +874,7 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -923,7 +899,7 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -951,7 +927,7 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 {
 	return __rte_ring_do_dequeue(r, obj_table, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
new file mode 100644
index 000000000..d9cef763f
--- /dev/null
+++ b/lib/librte_ring/rte_ring_core.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_CORE_H_
+#define _RTE_RING_CORE_H_
+
+/**
+ * @file
+ * This file contains the definition of the RTE ring structure itself,
+ * init flags and some related macros.
+ * For the majority of DPDK entities, it is not recommended to include
+ * this file directly; use <rte_ring.h> or <rte_ring_elem.h>
+ * instead.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_config.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+#include <rte_pause.h>
+#include <rte_debug.h>
+
+#define RTE_TAILQ_RING_NAME "RTE_RING"
+
+/** enqueue/dequeue behavior types */
+enum rte_ring_queue_behavior {
+	/** Enq/Deq a fixed number of items from a ring */
+	RTE_RING_QUEUE_FIXED = 0,
+	/** Enq/Deq as many items as possible from ring */
+	RTE_RING_QUEUE_VARIABLE
+};
+
+#define RTE_RING_MZ_PREFIX "RG_"
+/** The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
+
+/** prod/cons sync types */
+enum rte_ring_sync_type {
+	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
+	RTE_RING_SYNC_ST,     /**< single thread only */
+};
+
+/**
+ * Structure to hold a pair of head/tail values and other metadata.
+ * Depending on sync_type, the format of that structure might differ,
+ * but the offsets of the *sync_type* and *tail* values should remain the same.
+ */
+struct rte_ring_headtail {
+	volatile uint32_t head;      /**< prod/consumer head. */
+	volatile uint32_t tail;      /**< prod/consumer tail. */
+	RTE_STD_C11
+	union {
+		/** sync type of prod/cons */
+		enum rte_ring_sync_type sync_type;
+		/** deprecated - True if single prod/cons */
+		uint32_t single;
+	};
+};
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these index is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between 2 index
+ * values in a modulo-32bit base: that's why the overflow of the indexes is not
+ * a problem.
+ */
+struct rte_ring {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE] __rte_cache_aligned;
+	/**< Name of the ring. */
+	int flags;               /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t capacity;       /**< Usable size of ring */
+
+	char pad0 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring producer status. */
+	struct rte_ring_headtail prod __rte_cache_aligned;
+	char pad1 __rte_cache_aligned; /**< empty cache line */
+
+	/** Ring consumer status. */
+	struct rte_ring_headtail cons __rte_cache_aligned;
+	char pad2 __rte_cache_aligned; /**< empty cache line */
+};
+
+#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
+#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
+/**
+ * Ring is to hold exactly requested number of entries.
+ * Without this flag set, the ring size requested must be a power of 2, and the
+ * usable space will be that size - 1. With the flag, the requested size will
+ * be rounded up to the next power of two, but the usable space will be exactly
+ * that requested. Worst case, if a power-of-2 size is requested, half the
+ * ring space will be wasted.
+ */
+#define RING_F_EXACT_SZ 0x0004
+#define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_CORE_H_ */
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 663addc73..7406c0b0f 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -20,21 +20,7 @@
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <stdint.h>
-#include <string.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_config.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-#include <rte_pause.h>
-
-#include "rte_ring.h"
+#include <rte_ring_core.h>
 
 /**
  * @warning
@@ -510,7 +496,7 @@ rte_ring_mp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_MP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -539,7 +525,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SP, free_space);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -570,7 +556,7 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.single, free_space);
+			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
 }
 
 /**
@@ -675,7 +661,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, __IS_MC, available);
+				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -703,7 +689,7 @@ rte_ring_sc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, __IS_SC, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -734,7 +720,7 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.single, available);
+			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
 }
 
 /**
@@ -842,7 +828,7 @@ rte_ring_mp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
 }
 
 /**
@@ -871,7 +857,7 @@ rte_ring_sp_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
 }
 
 /**
@@ -902,7 +888,7 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
 	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.single, free_space);
+			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
 }
 
 /**
@@ -934,7 +920,7 @@ rte_ring_mc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -963,7 +949,7 @@ rte_ring_sc_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
 }
 
 /**
@@ -995,9 +981,11 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
 				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+				r->cons.sync_type, available);
 }
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 03/10] ring: introduce RTS ring mode
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 01/10] test/ring: add contention stress test Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 02/10] ring: prepare ring to allow new sync schemes Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 04/10] test/ring: add contention stress test for RTS ring Konstantin Ananyev
                                 ` (7 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce relaxed tail sync (RTS) mode for MT ring synchronization.
Aim to reduce stall times in cases when the ring is used on
overcommitted CPUs (multiple active threads on the same CPU).
The main difference from original MP/MC algorithm is that
tail value is increased not by every thread that finished enqueue/dequeue,
but only by the last one.
That allows threads to avoid spinning on ring tail value,
leaving actual tail value change to the last thread in the update queue.
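
Conceptually, the tail update in RTS mode looks like the pseudo-C below
(simplified from the rte_ring_rts_c11_mem.h code added by this patch;
atomic_load()/atomic_cas() stand in for the C11 atomic builtins):

	/* head/tail each pack {position, operation counter} in 64 bits */
	ot.raw = atomic_load(&ht->tail.raw);
	do {
		h.raw = atomic_load(&ht->head.raw);
		nt.raw = ot.raw;
		/* only the thread whose completion makes the counters
		 * meet moves the tail position up to the head position */
		if (++nt.val.cnt == h.val.cnt)
			nt.val.pos = h.val.pos;
	} while (!atomic_cas(&ht->tail.raw, &ot.raw, nt.raw));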

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---

This patch depends on following patch:
"meson: add libatomic as a global dependency for i686 clang"
(http://patches.dpdk.org/patch/68876/)

check-abi.sh reports what I believe is a false-positive about
ring cons/prod changes. As a workaround, devtools/libabigail.abignore is
updated to suppress *struct ring* related errors.

 devtools/libabigail.abignore           |   7 +
 doc/guides/rel_notes/release_20_05.rst |   7 +
 lib/librte_ring/Makefile               |   4 +-
 lib/librte_ring/meson.build            |   7 +-
 lib/librte_ring/rte_ring.c             | 100 +++++-
 lib/librte_ring/rte_ring.h             | 118 +++++--
 lib/librte_ring/rte_ring_core.h        |  36 +-
 lib/librte_ring/rte_ring_elem.h        | 114 ++++++-
 lib/librte_ring/rte_ring_rts.h         | 439 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_rts_c11_mem.h | 179 ++++++++++
 10 files changed, 963 insertions(+), 48 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_rts.h
 create mode 100644 lib/librte_ring/rte_ring_rts_c11_mem.h

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index a59df8f13..cd86d89ca 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -11,3 +11,10 @@
         type_kind = enum
         name = rte_crypto_asym_xform_type
         changed_enumerators = RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+; Ignore updates of ring prod/cons
+[suppress_type]
+        type_kind = struct
+        name = rte_ring
+[suppress_type]
+        type_kind = struct
+        name = rte_event_ring
diff --git a/doc/guides/rel_notes/release_20_05.rst b/doc/guides/rel_notes/release_20_05.rst
index 184967844..eedf960d0 100644
--- a/doc/guides/rel_notes/release_20_05.rst
+++ b/doc/guides/rel_notes/release_20_05.rst
@@ -81,6 +81,13 @@ New Features
   by making use of the event device capabilities. The event mode currently supports
   only inline IPsec protocol offload.
 
+* **New synchronization modes for rte_ring.**
+
+  Introduced new optional MT synchronization mode for rte_ring:
+  Relaxed Tail Sync (RTS). With this mode selected, rte_ring shows
+  significant improvements for average enqueue/dequeue times on
+  overcommitted systems.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 6572768c9..04e446e37 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -19,6 +19,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_core.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
-					rte_ring_c11_mem.h
+					rte_ring_c11_mem.h \
+					rte_ring_rts.h \
+					rte_ring_rts_c11_mem.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index c656781da..a95598032 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -6,4 +6,9 @@ headers = files('rte_ring.h',
 		'rte_ring_core.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
-		'rte_ring_generic.h')
+		'rte_ring_generic.h',
+		'rte_ring_rts.h',
+		'rte_ring_rts_c11_mem.h')
+
+# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
+allow_experimental_apis = true
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index fa5733907..222eec0fb 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,6 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
+/* by default set head/tail distance as 1/8 of ring capacity */
+#define HTD_MAX_DEF	8
+
 /* return the size of memory occupied by a ring */
 ssize_t
 rte_ring_get_memsize_elem(unsigned int esize, unsigned int count)
@@ -79,11 +82,84 @@ rte_ring_get_memsize(unsigned int count)
 	return rte_ring_get_memsize_elem(sizeof(void *), count);
 }
 
+/*
+ * internal helper function to reset prod/cons head-tail values.
+ */
+static void
+reset_headtail(void *p)
+{
+	struct rte_ring_headtail *ht;
+	struct rte_ring_rts_headtail *ht_rts;
+
+	ht = p;
+	ht_rts = p;
+
+	switch (ht->sync_type) {
+	case RTE_RING_SYNC_MT:
+	case RTE_RING_SYNC_ST:
+		ht->head = 0;
+		ht->tail = 0;
+		break;
+	case RTE_RING_SYNC_MT_RTS:
+		ht_rts->head.raw = 0;
+		ht_rts->tail.raw = 0;
+		break;
+	default:
+		/* unknown sync mode */
+		RTE_ASSERT(0);
+	}
+}
+
 void
 rte_ring_reset(struct rte_ring *r)
 {
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+	reset_headtail(&r->prod);
+	reset_headtail(&r->cons);
+}
+
+/*
+ * helper function, calculates sync_type values for prod and cons
+ * based on input flags. Returns zero at success or negative
+ * errno value otherwise.
+ */
+static int
+get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
+	enum rte_ring_sync_type *cons_st)
+{
+	static const uint32_t prod_st_flags =
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+	static const uint32_t cons_st_flags =
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+
+	switch (flags & prod_st_flags) {
+	case 0:
+		*prod_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SP_ENQ:
+		*prod_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MP_RTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (flags & cons_st_flags) {
+	case 0:
+		*cons_st = RTE_RING_SYNC_MT;
+		break;
+	case RING_F_SC_DEQ:
+		*cons_st = RTE_RING_SYNC_ST;
+		break;
+	case RING_F_MC_RTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_RTS;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 int
@@ -100,16 +176,20 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_rts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_rts_headtail, tail.val.pos));
+
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
 	ret = strlcpy(r->name, name, sizeof(r->name));
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.sync_type = (flags & RING_F_SP_ENQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
-	r->cons.sync_type = (flags & RING_F_SC_DEQ) ?
-		RTE_RING_SYNC_ST : RTE_RING_SYNC_MT;
+	ret = get_sync_type(flags, &r->prod.sync_type, &r->cons.sync_type);
+	if (ret != 0)
+		return ret;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -126,8 +206,12 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	/* set default values for head-tail distance */
+	if (flags & RING_F_MP_RTS_ENQ)
+		rte_ring_set_prod_htd_max(r, r->capacity / HTD_MAX_DEF);
+	if (flags & RING_F_MC_RTS_DEQ)
+		rte_ring_set_cons_htd_max(r, r->capacity / HTD_MAX_DEF);
 
 	return 0;
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 35ee4491c..c42e1cfc4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  *
- * Copyright (c) 2010-2017 Intel Corporation
+ * Copyright (c) 2010-2020 Intel Corporation
  * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
  * All rights reserved.
  * Derived from FreeBSD's bufring.h
@@ -79,12 +79,24 @@ ssize_t rte_ring_get_memsize(unsigned count);
  *   The number of elements in the ring (must be a power of 2).
  * @param flags
  *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *   - One of mutually exclusive flags that define producer behavior:
+ *      - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "single-producer".
+ *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer RTS mode".
+ *     If none of these flags is set, then default "multi-producer"
+ *     behavior is selected.
+ *   - One of mutually exclusive flags that define consumer behavior:
+ *      - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "single-consumer". Otherwise, it is "multi-consumers".
+ *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer RTS mode".
+ *     If none of these flags is set, then default "multi-consumer"
+ *     behavior is selected.
  * @return
  *   0 on success, or a negative value on error.
  */
@@ -114,12 +126,24 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *   constraint for the reserved zone.
  * @param flags
  *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *   - One of mutually exclusive flags that define producer behavior:
+ *      - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "single-producer".
+ *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer RTS mode".
+ *     If none of these flags is set, then default "multi-producer"
+ *     behavior is selected.
+ *   - One of mutually exclusive flags that define consumer behavior:
+ *      - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "single-consumer". Otherwise, it is "multi-consumers".
+ *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer RTS mode".
+ *     If none of these flags is set, then default "multi-consumer"
+ *     behavior is selected.
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
  *    rte_errno set appropriately. Possible errno values include:
@@ -389,8 +413,21 @@ static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -524,8 +561,20 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -845,8 +894,21 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 /**
@@ -925,9 +987,21 @@ static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
+			available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	return 0;
 }
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index d9cef763f..bd21fa535 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -57,6 +57,9 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
 	RTE_RING_SYNC_MT,     /**< multi-thread safe (default mode) */
 	RTE_RING_SYNC_ST,     /**< single thread only */
+#ifdef ALLOW_EXPERIMENTAL_API
+	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+#endif
 };
 
 /**
@@ -76,6 +79,22 @@ struct rte_ring_headtail {
 	};
 };
 
+union __rte_ring_rts_poscnt {
+	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
+	uint64_t raw __rte_aligned(8);
+	struct {
+		uint32_t cnt; /**< head/tail reference counter */
+		uint32_t pos; /**< head/tail position */
+	} val;
+};
+
+struct rte_ring_rts_headtail {
+	volatile union __rte_ring_rts_poscnt tail;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+	uint32_t htd_max;   /**< max allowed distance between head/tail */
+	volatile union __rte_ring_rts_poscnt head;
+};
+
 /**
  * An RTE ring structure.
  *
@@ -104,11 +123,21 @@ struct rte_ring {
 	char pad0 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail prod;
+		struct rte_ring_rts_headtail rts_prod;
+	}  __rte_cache_aligned;
+
 	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_cache_aligned;
+	RTE_STD_C11
+	union {
+		struct rte_ring_headtail cons;
+		struct rte_ring_rts_headtail rts_cons;
+	}  __rte_cache_aligned;
+
 	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
@@ -125,6 +154,9 @@ struct rte_ring {
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
 
+#define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
+#define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 7406c0b0f..4030753b6 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -74,12 +74,24 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *   constraint for the reserved zone.
  * @param flags
  *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *   - One of mutually exclusive flags that define producer behavior:
+ *      - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "single-producer".
+ *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer RTS mode".
+ *     If none of these flags is set, then default "multi-producer"
+ *     behavior is selected.
+ *   - One of mutually exclusive flags that define consumer behavior:
+ *      - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "single-consumer". Otherwise, it is "multi-consumers".
+ *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer RTS mode".
+ *     If none of these flags is set, then default "multi-consumer"
+ *     behavior is selected.
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
  *    rte_errno set appropriately. Possible errno values include:
@@ -528,6 +540,10 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_rts.h>
+#endif
+
 /**
  * Enqueue several objects on a ring.
  *
@@ -557,6 +573,23 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -661,7 +697,7 @@ rte_ring_mc_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
 	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
+			RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_MT, available);
 }
 
 /**
@@ -719,8 +755,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_FIXED, r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 /**
@@ -887,8 +940,25 @@ static __rte_always_inline unsigned
 rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue_elem(r, obj_table, esize, n,
-			RTE_RING_QUEUE_VARIABLE, r->prod.sync_type, free_space);
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
+			free_space);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (free_space != NULL)
+		*free_space = 0;
+	return 0;
 }
 
 /**
@@ -979,9 +1049,25 @@ static __rte_always_inline unsigned int
 rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 		unsigned int esize, unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue_elem(r, obj_table, esize, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.sync_type, available);
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_MT:
+		return rte_ring_mc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+	case RTE_RING_SYNC_ST:
+		return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
+			available);
+#ifdef ALLOW_EXPERIMENTAL_API
+	case RTE_RING_SYNC_MT_RTS:
+		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
+#endif
+	}
+
+	/* valid ring should never reach this point */
+	RTE_ASSERT(0);
+	if (available != NULL)
+		*available = 0;
+	return 0;
 }
 
 #include <rte_ring.h>
diff --git a/lib/librte_ring/rte_ring_rts.h b/lib/librte_ring/rte_ring_rts.h
new file mode 100644
index 000000000..8ced07096
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts.h
@@ -0,0 +1,439 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_H_
+#define _RTE_RING_RTS_H_
+
+/**
+ * @file rte_ring_rts.h
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for Relaxed Tail Sync (RTS) ring mode.
+ * The main idea remains the same as for our original MP/MC synchronization
+ * mechanism.
+ * The main difference is that the tail value is increased not
+ * by every thread that finishes enqueue/dequeue,
+ * but only by the current last one doing enqueue/dequeue.
+ * That allows threads to skip spinning on the tail value,
+ * leaving the actual tail value change to the last thread at a given instance.
+ * RTS requires two 64-bit CAS operations for each enqueue/dequeue:
+ * one for the head update, a second for the tail update.
+ * As a gain, it allows threads to avoid spinning/waiting on the tail value.
+ * In comparison, the original MP/MC algorithm requires one 32-bit CAS
+ * for the head update plus waiting/spinning on the tail value.
+ *
+ * Brief outline:
+ *  - introduce update counter (cnt) for both head and tail.
+ *  - increment head.cnt for each head.value update
+ *  - write head.value and head.cnt atomically (64-bit CAS)
+ *  - move tail.value ahead only when tail.cnt + 1 == head.cnt
+ *    (indicating that this is the last thread updating the tail)
+ *  - increment tail.cnt when each enqueue/dequeue op finishes
+ *    (no matter if tail.value going to change or not)
+ *  - write tail.value and tail.cnt atomically (64-bit CAS)
+ *
+ * To avoid producer/consumer starvation:
+ *  - limit max allowed distance between head and tail value (HTD_MAX).
+ *    i.e. a thread is allowed to proceed with changing head.value
+ *    only when:  head.value - tail.value <= HTD_MAX
+ * HTD_MAX is an optional parameter.
+ * With HTD_MAX == 0 we'll have fully serialized ring -
+ * i.e. only one thread at a time will be able to enqueue/dequeue
+ * to/from the ring.
+ * With HTD_MAX >= ring.capacity - no limitation.
+ * By default HTD_MAX == ring.capacity / 8.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_rts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n =  __rte_ring_rts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_prod);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the RTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_rts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_rts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_rts_update_tail(&r->rts_cons);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_rts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_rts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_rts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_rts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the RTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mp_rts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an RTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned
+rte_ring_mc_rts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Return producer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Producer HTD value, if producer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_prod_htd_max(const struct rte_ring *r)
+{
+	if (r->prod.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_prod.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set producer max Head-Tail-Distance (HTD).
+ * Note that producer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_prod_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->prod.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_prod.htd_max = v;
+	return 0;
+}
+
+/**
+ * Return consumer max Head-Tail-Distance (HTD).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   Consumer HTD value, if consumer is set in appropriate sync mode,
+ *   or UINT32_MAX otherwise.
+ */
+__rte_experimental
+static inline uint32_t
+rte_ring_get_cons_htd_max(const struct rte_ring *r)
+{
+	if (r->cons.sync_type == RTE_RING_SYNC_MT_RTS)
+		return r->rts_cons.htd_max;
+	return UINT32_MAX;
+}
+
+/**
+ * Set consumer max Head-Tail-Distance (HTD).
+ * Note that consumer has to use appropriate sync mode (RTS).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param v
+ *   new HTD value to setup.
+ * @return
+ *   Zero on success, or negative error code otherwise.
+ */
+__rte_experimental
+static inline int
+rte_ring_set_cons_htd_max(struct rte_ring *r, uint32_t v)
+{
+	if (r->cons.sync_type != RTE_RING_SYNC_MT_RTS)
+		return -ENOTSUP;
+
+	r->rts_cons.htd_max = v;
+	return 0;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_RTS_H_ */
diff --git a/lib/librte_ring/rte_ring_rts_c11_mem.h b/lib/librte_ring/rte_ring_rts_c11_mem.h
new file mode 100644
index 000000000..327f22796
--- /dev/null
+++ b/lib/librte_ring/rte_ring_rts_c11_mem.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_RTS_C11_MEM_H_
+#define _RTE_RING_RTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_rts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for Relaxed Tail Sync (RTS) ring mode.
+ * For more information please refer to <rte_ring_rts.h>.
+ */
+
+/**
+ * @internal This function updates tail values.
+ */
+static __rte_always_inline void
+__rte_ring_rts_update_tail(struct rte_ring_rts_headtail *ht)
+{
+	union __rte_ring_rts_poscnt h, ot, nt;
+
+	/*
+	 * If there are other enqueues/dequeues in progress that
+	 * might precede us, then don't update the tail with a new value.
+	 */
+
+	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* on 32-bit systems we have to do atomic read here */
+		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+
+		nt.raw = ot.raw;
+		if (++nt.val.cnt == h.val.cnt)
+			nt.val.pos = h.val.pos;
+
+	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
+			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+}
+
+/**
+ * @internal This function waits until the head/tail distance no longer
+ * exceeds the pre-defined max value.
+ */
+static __rte_always_inline void
+__rte_ring_rts_head_wait(const struct rte_ring_rts_headtail *ht,
+	union __rte_ring_rts_poscnt *h)
+{
+	uint32_t max;
+
+	max = ht->htd_max;
+
+	while (h->val.pos - ht->tail.val.pos > max) {
+		rte_pause();
+		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_rts_move_prod_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union __rte_ring_rts_poscnt nh, oh;
+
+	const uint32_t capacity = r->capacity;
+
+	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for prod head/tail distance,
+		 * make sure that we read prod head *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_prod, &oh);
+
+		/*
+		 * The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - oh.val.pos;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems to the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_rts_move_cons_head(struct rte_ring *r, uint32_t num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union __rte_ring_rts_poscnt nh, oh;
+
+	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for cons head/tail distance,
+		 * make sure that we read cons head *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_rts_head_wait(&r->rts_cons, &oh);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - oh.val.pos;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		nh.val.pos = oh.val.pos + n;
+		nh.val.cnt = oh.val.cnt + 1;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
+			&oh.raw, nh.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = oh.val.pos;
+	return n;
+}
+
+#endif /* _RTE_RING_RTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 04/10] test/ring: add contention stress test for RTS ring
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (2 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 03/10] ring: introduce RTS ring mode Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 05/10] ring: introduce HTS ring mode Konstantin Ananyev
                                 ` (6 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce a new test case to exercise RTS ring mode under contention.
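
The case plugs into the existing ring stress harness via the _st_ring_*
wrappers below and is registered in test_ring_stress.c, so it runs
automatically as part of the ring stress autotest alongside the MP/MC
case.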

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_rts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_rts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index a23a011df..00b74b5c9 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -79,6 +79,7 @@ SRCS-y += test_rand_perf.c
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 8824f366c..97ad822c1 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_perf.c',
+	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
 	'test_rwlock.c',
 	'test_sched.c',
diff --git a/app/test/test_ring_rts_stress.c b/app/test/test_ring_rts_stress.c
new file mode 100644
index 000000000..f5255f24c
--- /dev/null
+++ b/app/test/test_ring_rts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_rts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_rts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+}
+
+const struct test test_ring_rts_stress = {
+	.name = "MT_RTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 60706f799..eab395e30 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -40,6 +40,9 @@ test_ring_stress(void)
 	n += test_ring_mpmc_stress.nb_case;
 	k += run_test(&test_ring_mpmc_stress);
 
+	n += test_ring_rts_stress.nb_case;
+	k += run_test(&test_ring_rts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 60eac6216..32aae2072 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -33,3 +33,4 @@ struct test {
 };
 
 extern const struct test test_ring_mpmc_stress;
+extern const struct test test_ring_rts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 05/10] ring: introduce HTS ring mode
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (3 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 04/10] test/ring: add contention stress test for RTS ring Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 06/10] test/ring: add contention stress test for HTS ring Konstantin Ananyev
                                 ` (5 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce head/tail sync mode for MT ring synchronization.
In that mode enqueue/dequeue operation is fully serialized:
only one thread at a time is allowed to perform given op.
Suppose to reduce stall times in case when ring is used on
overcommitted cpus (multiple active threads on the same cpu).
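
A minimal usage sketch (ring name, size and object array are
hypothetical, error handling elided). As with RTS, HTS is selected only
via the creation flags and the regular enqueue/dequeue API is reused:

    void *objs[16];  /* assumed to hold object pointers to transfer */

    /* create a fully serialized (HTS) ring */
    struct rte_ring *r = rte_ring_create("hts_test", 1024, rte_socket_id(),
            RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);

    /* standard calls now dispatch to the HTS implementations */
    rte_ring_enqueue_bulk(r, objs, RTE_DIM(objs), NULL);
    rte_ring_dequeue_bulk(r, objs, RTE_DIM(objs), NULL);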

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/rel_notes/release_20_05.rst |   8 +-
 lib/librte_ring/Makefile               |   2 +
 lib/librte_ring/meson.build            |   2 +
 lib/librte_ring/rte_ring.c             |  20 +-
 lib/librte_ring/rte_ring.h             |  23 ++
 lib/librte_ring/rte_ring_core.h        |  20 ++
 lib/librte_ring/rte_ring_elem.h        |  19 ++
 lib/librte_ring/rte_ring_hts.h         | 332 +++++++++++++++++++++++++
 lib/librte_ring/rte_ring_hts_c11_mem.h | 164 ++++++++++++
 9 files changed, 584 insertions(+), 6 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring_hts.h
 create mode 100644 lib/librte_ring/rte_ring_hts_c11_mem.h

diff --git a/doc/guides/rel_notes/release_20_05.rst b/doc/guides/rel_notes/release_20_05.rst
index eedf960d0..db8a281db 100644
--- a/doc/guides/rel_notes/release_20_05.rst
+++ b/doc/guides/rel_notes/release_20_05.rst
@@ -83,10 +83,10 @@ New Features
 
 * **New synchronization modes for rte_ring.**
 
-  Introduced new optional MT synchronization mode for rte_ring:
-  Relaxed Tail Sync (RTS). With this mode selected, rte_ring shows
-  significant improvements for average enqueue/dequeue times on
-  overcommitted systems.
+  Introduced new optional MT synchronization modes for rte_ring:
+  Relaxed Tail Sync (RTS) mode and Head/Tail Sync (HTS) mode.
+  With these modes selected, rte_ring shows significant improvements for
+  average enqueue/dequeue times on overcommitted systems.
 
 
 Removed Items
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 04e446e37..f75d8e530 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -20,6 +20,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_elem.h \
 					rte_ring_generic.h \
 					rte_ring_c11_mem.h \
+					rte_ring_hts.h \
+					rte_ring_hts_c11_mem.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index a95598032..ca37cb8cc 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -7,6 +7,8 @@ headers = files('rte_ring.h',
 		'rte_ring_elem.h',
 		'rte_ring_c11_mem.h',
 		'rte_ring_generic.h',
+		'rte_ring_hts.h',
+		'rte_ring_hts_c11_mem.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 222eec0fb..ebe5ccf0d 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -89,9 +89,11 @@ static void
 reset_headtail(void *p)
 {
 	struct rte_ring_headtail *ht;
+	struct rte_ring_hts_headtail *ht_hts;
 	struct rte_ring_rts_headtail *ht_rts;
 
 	ht = p;
+	ht_hts = p;
 	ht_rts = p;
 
 	switch (ht->sync_type) {
@@ -104,6 +106,9 @@ reset_headtail(void *p)
 		ht_rts->head.raw = 0;
 		ht_rts->tail.raw = 0;
 		break;
+	case RTE_RING_SYNC_MT_HTS:
+		ht_hts->ht.raw = 0;
+		break;
 	default:
 		/* unknown sync mode */
 		RTE_ASSERT(0);
@@ -127,9 +132,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	enum rte_ring_sync_type *cons_st)
 {
 	static const uint32_t prod_st_flags =
-		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ);
+		(RING_F_SP_ENQ | RING_F_MP_RTS_ENQ | RING_F_MP_HTS_ENQ);
 	static const uint32_t cons_st_flags =
-		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ);
+		(RING_F_SC_DEQ | RING_F_MC_RTS_DEQ | RING_F_MC_HTS_DEQ);
 
 	switch (flags & prod_st_flags) {
 	case 0:
@@ -141,6 +146,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MP_RTS_ENQ:
 		*prod_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MP_HTS_ENQ:
+		*prod_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -155,6 +163,9 @@ get_sync_type(uint32_t flags, enum rte_ring_sync_type *prod_st,
 	case RING_F_MC_RTS_DEQ:
 		*cons_st = RTE_RING_SYNC_MT_RTS;
 		break;
+	case RING_F_MC_HTS_DEQ:
+		*cons_st = RTE_RING_SYNC_MT_HTS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -176,6 +187,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
+		offsetof(struct rte_ring_hts_headtail, sync_type));
+	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
+		offsetof(struct rte_ring_hts_headtail, ht.pos.tail));
+
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, sync_type) !=
 		offsetof(struct rte_ring_rts_headtail, sync_type));
 	RTE_BUILD_BUG_ON(offsetof(struct rte_ring_headtail, tail) !=
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index c42e1cfc4..7cf046528 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -86,6 +86,9 @@ ssize_t rte_ring_get_memsize(unsigned count);
  *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
  *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
  *        is "multi-producer RTS mode".
+ *      - RING_F_MP_HTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer HTS mode".
  *     If none of these flags is set, then default "multi-producer"
  *     behavior is selected.
  *   - One of mutually exclusive flags that define consumer behavior:
@@ -95,6 +98,9 @@ ssize_t rte_ring_get_memsize(unsigned count);
  *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
  *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *        is "multi-consumer RTS mode".
+ *      - RING_F_MC_HTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer HTS mode".
  *     If none of these flags is set, then default "multi-consumer"
  *     behavior is selected.
  * @return
@@ -133,6 +139,9 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
  *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
  *        is "multi-producer RTS mode".
+ *      - RING_F_MP_HTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer HTS mode".
  *     If none of these flags is set, then default "multi-producer"
  *     behavior is selected.
  *   - One of mutually exclusive flags that define consumer behavior:
@@ -142,6 +151,9 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
  *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *        is "multi-consumer RTS mode".
+ *      - RING_F_MC_HTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer HTS mode".
  *     If none of these flags is set, then default "multi-consumer"
  *     behavior is selected.
  * @return
@@ -422,6 +434,9 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -569,6 +584,8 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 #ifdef ALLOW_EXPERIMENTAL_API
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
 #endif
 	}
 
@@ -903,6 +920,9 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
+			free_space);
 #endif
 	}
 
@@ -996,6 +1016,9 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
 			available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
+			available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_core.h b/lib/librte_ring/rte_ring_core.h
index bd21fa535..16718ca7f 100644
--- a/lib/librte_ring/rte_ring_core.h
+++ b/lib/librte_ring/rte_ring_core.h
@@ -59,6 +59,7 @@ enum rte_ring_sync_type {
 	RTE_RING_SYNC_ST,     /**< single thread only */
 #ifdef ALLOW_EXPERIMENTAL_API
 	RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
+	RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
 #endif
 };
 
@@ -95,6 +96,20 @@ struct rte_ring_rts_headtail {
 	volatile union __rte_ring_rts_poscnt head;
 };
 
+union __rte_ring_hts_pos {
+	/** raw 8B value to read/write *head* and *tail* as one atomic op */
+	uint64_t raw __rte_aligned(8);
+	struct {
+		uint32_t head; /**< head position */
+		uint32_t tail; /**< tail position */
+	} pos;
+};
+
+struct rte_ring_hts_headtail {
+	volatile union __rte_ring_hts_pos ht;
+	enum rte_ring_sync_type sync_type;  /**< sync type of prod/cons */
+};
+
 /**
  * An RTE ring structure.
  *
@@ -126,6 +141,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail prod;
+		struct rte_ring_hts_headtail hts_prod;
 		struct rte_ring_rts_headtail rts_prod;
 	}  __rte_cache_aligned;
 
@@ -135,6 +151,7 @@ struct rte_ring {
 	RTE_STD_C11
 	union {
 		struct rte_ring_headtail cons;
+		struct rte_ring_hts_headtail hts_cons;
 		struct rte_ring_rts_headtail rts_cons;
 	}  __rte_cache_aligned;
 
@@ -157,6 +174,9 @@ struct rte_ring {
 #define RING_F_MP_RTS_ENQ 0x0008 /**< The default enqueue is "MP RTS". */
 #define RING_F_MC_RTS_DEQ 0x0010 /**< The default dequeue is "MC RTS". */
 
+#define RING_F_MP_HTS_ENQ 0x0020 /**< The default enqueue is "MP HTS". */
+#define RING_F_MC_HTS_DEQ 0x0040 /**< The default dequeue is "MC HTS". */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 4030753b6..492eef936 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -81,6 +81,9 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *      - RING_F_MP_RTS_ENQ: If this flag is set, the default behavior when
  *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
  *        is "multi-producer RTS mode".
+ *      - RING_F_MP_HTS_ENQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *        is "multi-producer HTS mode".
  *     If none of these flags is set, then default "multi-producer"
  *     behavior is selected.
  *   - One of mutually exclusive flags that define consumer behavior:
@@ -90,6 +93,9 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *      - RING_F_MC_RTS_DEQ: If this flag is set, the default behavior when
  *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *        is "multi-consumer RTS mode".
+ *      - RING_F_MC_HTS_DEQ: If this flag is set, the default behavior when
+ *        using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *        is "multi-consumer HTS mode".
  *     If none of these flags is set, then default "multi-consumer"
  *     behavior is selected.
  * @return
@@ -541,6 +547,7 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 }
 
 #ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_hts.h>
 #include <rte_ring_rts.h>
 #endif
 
@@ -585,6 +592,9 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
 			free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
+			free_space);
 #endif
 	}
 
@@ -766,6 +776,9 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
@@ -951,6 +964,9 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
 			n, free_space);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
+			n, free_space);
 #endif
 	}
 
@@ -1060,6 +1076,9 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	case RTE_RING_SYNC_MT_RTS:
 		return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
 			n, available);
+	case RTE_RING_SYNC_MT_HTS:
+		return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
+			n, available);
 #endif
 	}
 
diff --git a/lib/librte_ring/rte_ring_hts.h b/lib/librte_ring/rte_ring_hts.h
new file mode 100644
index 000000000..c7701defc
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts.h
@@ -0,0 +1,332 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_H_
+#define _RTE_RING_HTS_H_
+
+/**
+ * @file rte_ring_hts.h
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring.h> instead.
+ *
+ * Contains functions for serialized, aka Head-Tail Sync (HTS) ring mode.
+ * In this mode an enqueue/dequeue operation is fully serialized:
+ * at any given moment only one enqueue/dequeue operation can proceed.
+ * This is achieved by allowing a thread to proceed with changing head.value
+ * only when head.value == tail.value.
+ * Both head and tail values are updated atomically (as one 64-bit value).
+ * To achieve that, a 64-bit CAS is used by the head update routine.
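+ *
+ * A minimal usage sketch (assuming a ring created with the
+ * RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ flags; process() is a
+ * user-defined function, not part of this API):
+ *
+ *   void *obj[16];
+ *   uint32_t n;
+ *
+ *   n = rte_ring_mc_hts_dequeue_burst(r, obj, RTE_DIM(obj), NULL);
+ *   if (n != 0) {
+ *       process(obj, n);
+ *       rte_ring_mp_hts_enqueue_bulk(r, obj, n, NULL);
+ *   }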
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_hts_c11_mem.h>
+
+/**
+ * @internal Enqueue several objects on the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_enqueue_elem(struct rte_ring *r, const void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *free_space)
+{
+	uint32_t free, head;
+
+	n = __rte_ring_hts_move_prod_head(r, n, behavior, &head, &free);
+
+	if (n != 0) {
+		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_prod, head, n, 1);
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the HTS ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
+ * @param available
+ *   if non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t entries, head;
+
+	n = __rte_ring_hts_move_cons_head(r, n, behavior, &head, &entries);
+
+	if (n != 0) {
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+		__rte_ring_hts_update_tail(&r->hts_cons, head, n, 0);
+	}
+
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+		RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_burst_elem(struct rte_ring *r, const void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *free_space)
+{
+	return __rte_ring_do_hts_enqueue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
+	unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_hts_dequeue_elem(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Enqueue several objects on the HTS ring (multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mp_hts_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, unsigned int *free_space)
+{
+	return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, free_space);
+}
+
+/**
+ * Dequeue several objects from an HTS ring (multi-consumers safe).
+ * When the requested objects are more than the available objects,
+ * only dequeue the actual number of objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_mc_hts_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table,
+			sizeof(uintptr_t), n, available);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_HTS_H_ */
diff --git a/lib/librte_ring/rte_ring_hts_c11_mem.h b/lib/librte_ring/rte_ring_hts_c11_mem.h
new file mode 100644
index 000000000..16e54b6ff
--- /dev/null
+++ b/lib/librte_ring/rte_ring_hts_c11_mem.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_HTS_C11_MEM_H_
+#define _RTE_RING_HTS_C11_MEM_H_
+
+/**
+ * @file rte_ring_hts_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for head/tail sync (HTS) ring mode.
+ * For more information please refer to <rte_ring_hts.h>.
+ */
+
+/**
+ * @internal update tail with new value.
+ */
+static __rte_always_inline void
+__rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t tail;
+
+	RTE_SET_USED(enqueue);
+
+	tail = old_tail + num;
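+	/*
+	 * RELEASE store: makes the element copies done before this point
+	 * visible to any thread that ACQUIRE-loads the new tail value.
+	 */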
+	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal waits until the tail becomes equal to the head.
+ * That means no writer/reader is active on that ring.
+ * Serves as a serialization point between operations.
+ */
+static __rte_always_inline void
+__rte_ring_hts_head_wait(const struct rte_ring_hts_headtail *ht,
+		union __rte_ring_hts_pos *p)
+{
+	while (p->pos.head != p->pos.tail) {
+		rte_pause();
+		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+	}
+}
+
+/**
+ * @internal This function updates the producer head for enqueue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_prod_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *free_entries)
+{
+	uint32_t n;
+	union __rte_ring_hts_pos np, op;
+
+	const uint32_t capacity = r->capacity;
+
+	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+
+	do {
+		/* Reset n to the initial burst count */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read prod head/tail *before*
+		 * reading cons tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_prod, &op);
+
+		/*
+		 *  The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > cons_tail). So 'free_entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*free_entries = capacity + r->cons.tail - op.pos.head;
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *free_entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *free_entries;
+
+		if (n == 0)
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of cons tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+/**
+ * @internal This function updates the consumer head for dequeue
+ */
+static __rte_always_inline unsigned int
+__rte_ring_hts_move_cons_head(struct rte_ring *r, unsigned int num,
+	enum rte_ring_queue_behavior behavior, uint32_t *old_head,
+	uint32_t *entries)
+{
+	uint32_t n;
+	union __rte_ring_hts_pos np, op;
+
+	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = num;
+
+		/*
+		 * wait for tail to be equal to head,
+		 * make sure that we read cons head/tail *before*
+		 * reading prod tail.
+		 */
+		__rte_ring_hts_head_wait(&r->hts_cons, &op);
+
+		/* The subtraction is done between two unsigned 32-bit values
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1.
+		 */
+		*entries = r->prod.tail - op.pos.head;
+
+		/* Set the actual entries for dequeue */
+		if (n > *entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+		if (unlikely(n == 0))
+			break;
+
+		np.pos.tail = op.pos.tail;
+		np.pos.head = op.pos.head + n;
+
+	/*
+	 * this CAS(ACQUIRE, ACQUIRE) serves as a hoist barrier to prevent:
+	 *  - OOO reads of prod tail value
+	 *  - OOO copy of elems from the ring
+	 */
+	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
+			&op.raw, np.raw,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+
+	*old_head = op.pos.head;
+	return n;
+}
+
+#endif /* _RTE_RING_HTS_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 06/10] test/ring: add contention stress test for HTS ring
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (4 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 05/10] ring: introduce HTS ring mode Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 07/10] ring: introduce peek style API Konstantin Ananyev
                                 ` (4 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test HTS ring mode under contention.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile               |  1 +
 app/test/meson.build            |  1 +
 app/test/test_ring_hts_stress.c | 32 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c     |  3 +++
 app/test/test_ring_stress.h     |  1 +
 5 files changed, 38 insertions(+)
 create mode 100644 app/test/test_ring_hts_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 00b74b5c9..28f0b9ac2 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -78,6 +78,7 @@ SRCS-y += test_rand_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
+SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 97ad822c1..20c4978c2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -101,6 +101,7 @@ test_sources = files('commands.c',
 	'test_rib6.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
+	'test_ring_hts_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_hts_stress.c b/app/test/test_ring_hts_stress.c
new file mode 100644
index 000000000..edc9175cb
--- /dev/null
+++ b/app/test/test_ring_hts_stress.c
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	return rte_ring_mc_hts_dequeue_bulk(r, obj, n, avail);
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	return rte_ring_mp_hts_enqueue_bulk(r, obj, n, free);
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_hts_stress = {
+	.name = "MT_HTS",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index eab395e30..29a1368d7 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -43,6 +43,9 @@ test_ring_stress(void)
 	n += test_ring_rts_stress.nb_case;
 	k += run_test(&test_ring_rts_stress);
 
+	n += test_ring_hts_stress.nb_case;
+	k += run_test(&test_ring_hts_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 32aae2072..9a87c7f7b 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -34,3 +34,4 @@ struct test {
 
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
+extern const struct test test_ring_hts_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 07/10] ring: introduce peek style API
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (5 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 06/10] test/ring: add contention stress test for HTS ring Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 08/10] test/ring: add stress test for MT peek API Konstantin Ananyev
                                 ` (3 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
mode, provide an ability to split an enqueue/dequeue operation
into two phases:
      - enqueue/dequeue start
      - enqueue/dequeue finish
That allows the user to inspect objects in the ring without removing
them from it (aka MT safe peek).
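
As an illustration, the enqueue side could be used to reserve ring space
before the objects are actually produced (a sketch only; make_objects()
below is a hypothetical user function, not part of this patch):

    void *obj[8];
    uint32_t n = rte_ring_enqueue_bulk_start(ring, RTE_DIM(obj), NULL);
    if (n != 0) {
        /* space is reserved: fill the objects, then complete the enqueue */
        make_objects(obj, n);
        rte_ring_enqueue_finish(ring, obj, n);
    }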

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/rel_notes/release_20_05.rst  |  11 +-
 lib/librte_ring/Makefile                |   2 +
 lib/librte_ring/meson.build             |   2 +
 lib/librte_ring/rte_ring.h              |   3 +
 lib/librte_ring/rte_ring_elem.h         |   4 +
 lib/librte_ring/rte_ring_peek.h         | 444 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_peek_c11_mem.h | 110 ++++++
 7 files changed, 575 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ring/rte_ring_peek.h
 create mode 100644 lib/librte_ring/rte_ring_peek_c11_mem.h

diff --git a/doc/guides/rel_notes/release_20_05.rst b/doc/guides/rel_notes/release_20_05.rst
index db8a281db..ec558c64f 100644
--- a/doc/guides/rel_notes/release_20_05.rst
+++ b/doc/guides/rel_notes/release_20_05.rst
@@ -81,13 +81,22 @@ New Features
   by making use of the event device capabilities. The event mode currently supports
   only inline IPsec protocol offload.
 
-* **New synchronization modes for rte_ring.**
+* **Added new API for rte_ring.**
+
+  * New synchronization modes for rte_ring.
 
   Introduced new optional MT synchronization modes for rte_ring:
   Relaxed Tail Sync (RTS) mode and Head/Tail Sync (HTS) mode.
  With these modes selected, rte_ring shows significant improvements for
   average enqueue/dequeue times on overcommitted systems.
 
+  * Added peek style API for rte_ring.
+
+  For rings with producer/consumer in RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS
+  mode, provide an ability to split an enqueue/dequeue operation into two
+  phases (enqueue/dequeue start; enqueue/dequeue finish). That allows the user
+  to inspect objects in the ring without removing them from it (aka MT safe
+  peek).
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index f75d8e530..83a9d0840 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -22,6 +22,8 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
 					rte_ring_c11_mem.h \
 					rte_ring_hts.h \
 					rte_ring_hts_c11_mem.h \
+					rte_ring_peek.h \
+					rte_ring_peek_c11_mem.h \
 					rte_ring_rts.h \
 					rte_ring_rts_c11_mem.h
 
diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build
index ca37cb8cc..4f77647cd 100644
--- a/lib/librte_ring/meson.build
+++ b/lib/librte_ring/meson.build
@@ -9,6 +9,8 @@ headers = files('rte_ring.h',
 		'rte_ring_generic.h',
 		'rte_ring_hts.h',
 		'rte_ring_hts_c11_mem.h',
+		'rte_ring_peek.h',
+		'rte_ring_peek_c11_mem.h',
 		'rte_ring_rts.h',
 		'rte_ring_rts_c11_mem.h')
 
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7cf046528..86faede81 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -25,6 +25,9 @@
  * - Multi- or single-producer enqueue.
  * - Bulk dequeue.
  * - Bulk enqueue.
+ * - Ability to select different sync modes for producer/consumer.
+ * - Dequeue start/finish (depending on consumer sync mode).
+ * - Enqueue start/finish (depending on producer sync mode).
  *
  * Note: the ring implementation is not preemptible. Refer to Programmer's
  * guide/Environment Abstraction Layer/Multiple pthread/Known Issues/rte_ring
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 492eef936..a5a4c46f9 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -1089,6 +1089,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 	return 0;
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+#include <rte_ring_peek.h>
+#endif
+
 #include <rte_ring.h>
 
 #ifdef __cplusplus
diff --git a/lib/librte_ring/rte_ring_peek.h b/lib/librte_ring/rte_ring_peek.h
new file mode 100644
index 000000000..1ad8bba22
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek.h
@@ -0,0 +1,444 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_H_
+#define _RTE_RING_PEEK_H_
+
+/**
+ * @file
+ * @b EXPERIMENTAL: this API may change without prior notice
+ * It is not recommended to include this file directly.
+ * Please include <rte_ring_elem.h> instead.
+ *
+ * Ring Peek API
+ * Introduction of rte_ring with serialized producer/consumer (HTS sync mode)
+ * makes it possible to split the public enqueue/dequeue API into two phases:
+ * - enqueue/dequeue start
+ * - enqueue/dequeue finish
+ * That allows the user to inspect objects in the ring without removing them
+ * from it (aka MT safe peek).
+ * Note that right now this new API is available only for two sync modes:
+ * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
+ * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS).
+ * It is the user's responsibility to create/init the ring with the
+ * appropriate sync modes selected.
+ * As an example:
+ * // read 1 elem from the ring:
+ * n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+ * if (n != 0) {
+ *    //examine object
+ *    if (object_examine(obj) == KEEP)
+ *       //decided to keep it in the ring.
+ *       rte_ring_dequeue_finish(ring, 0);
+ *    else
+ *       //decided to remove it from the ring.
+ *       rte_ring_dequeue_finish(ring, n);
+ * }
+ * Note that between _start_ and _finish_ no other thread can proceed
+ * with an enqueue(/dequeue) operation until _finish_ completes.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_ring_peek_c11_mem.h>
+
+/**
+ * @internal This function moves prod head value.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_enqueue_start(struct rte_ring *r, uint32_t n,
+		enum rte_ring_queue_behavior behavior, uint32_t *free_space)
+{
+	uint32_t free, head, next;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_prod_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &free);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_prod_head(r, n, behavior,
+			&head, &free);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (free_space != NULL)
+		*free_space = free - n;
+	return n;
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function,
+ * it just reserves for user such ability.
+ * User has to call appropriate enqueue_elem_finish() to copy objects into the
+ * queue and complete given enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_FIXED,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   The number of objects that can be enqueued, either 0 or n
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_bulk_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_bulk_elem_start(r, n, free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_elem_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_elem_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return __rte_ring_do_enqueue_start(r, n, RTE_RING_QUEUE_VARIABLE,
+			free_space);
+}
+
+/**
+ * Start to enqueue several objects on the ring.
+ * Note that no actual objects are put in the queue by this function;
+ * it just reserves space for them.
+ * The user has to call the appropriate enqueue_finish() to copy the
+ * objects into the queue and complete the enqueue operation.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param free_space
+ *   if non-NULL, returns the amount of space in the ring after the
+ *   enqueue operation has finished.
+ * @return
+ *   Actual number of objects that can be enqueued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_enqueue_burst_start(struct rte_ring *r, unsigned int n,
+		unsigned int *free_space)
+{
+	return rte_ring_enqueue_burst_elem_start(r, n, free_space);
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the previous
+ * enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_elem_finish(struct rte_ring *r, const void *obj_table,
+		unsigned int esize, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->prod.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_st_set_head_tail(&r->prod, tail, n, 1);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_prod, &tail, n);
+		if (n != 0)
+			__rte_ring_enqueue_elems(r, tail, obj_table, esize, n);
+		__rte_ring_hts_set_head_tail(&r->hts_prod, tail, n, 1);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete enqueuing several objects on the ring.
+ * Note that the number of objects to enqueue must not exceed the previous
+ * enqueue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects.
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_enqueue_finish(struct rte_ring *r, void * const *obj_table,
+		unsigned int n)
+{
+	rte_ring_enqueue_elem_finish(r, obj_table, sizeof(uintptr_t), n);
+}
+
+/**
+ * @internal This function moves cons head value and copies up to *n*
+ * objects from the ring to the user provided obj_table.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_dequeue_start(struct rte_ring *r, void *obj_table,
+	uint32_t esize, uint32_t n, enum rte_ring_queue_behavior behavior,
+	uint32_t *available)
+{
+	uint32_t avail, head, next;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_move_cons_head(r, RTE_RING_SYNC_ST, n,
+			behavior, &head, &next, &avail);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_move_cons_head(r, n, behavior,
+			&head, &avail);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+		n = 0;
+	}
+
+	if (n != 0)
+		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
+
+	if (available != NULL)
+		*available = avail - n;
+	return n;
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects from
+ * the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_FIXED, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects from
+ * the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The number of objects dequeued, either 0 or n.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_bulk_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_bulk_elem_start(r, obj_table, sizeof(uintptr_t),
+		n, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects from
+ * the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of objects that will be filled.
+ * @param esize
+ *   The size of ring element, in bytes. It must be a multiple of 4.
+ *   This must be the same value used while creating the ring. Otherwise
+ *   the results are undefined.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_elem_start(struct rte_ring *r, void *obj_table,
+		unsigned int esize, unsigned int n, unsigned int *available)
+{
+	return __rte_ring_do_dequeue_start(r, obj_table, esize, n,
+			RTE_RING_QUEUE_VARIABLE, available);
+}
+
+/**
+ * Start to dequeue several objects from the ring.
+ * Note that the user has to call the appropriate dequeue_finish()
+ * to complete the dequeue operation and actually remove the objects from
+ * the ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param available
+ *   If non-NULL, returns the number of remaining ring entries after the
+ *   dequeue has finished.
+ * @return
+ *   The actual number of objects dequeued.
+ */
+__rte_experimental
+static __rte_always_inline unsigned int
+rte_ring_dequeue_burst_start(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
+{
+	return rte_ring_dequeue_burst_elem_start(r, obj_table,
+		sizeof(uintptr_t), n, available);
+}
+
+/**
+ * Complete dequeuing several objects from the ring.
+ * Note that the number of objects to dequeue must not exceed the previous
+ * dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_elem_finish(struct rte_ring *r, unsigned int n)
+{
+	uint32_t tail;
+
+	switch (r->cons.sync_type) {
+	case RTE_RING_SYNC_ST:
+		n = __rte_ring_st_get_tail(&r->cons, &tail, n);
+		__rte_ring_st_set_head_tail(&r->cons, tail, n, 0);
+		break;
+	case RTE_RING_SYNC_MT_HTS:
+		n = __rte_ring_hts_get_tail(&r->hts_cons, &tail, n);
+		__rte_ring_hts_set_head_tail(&r->hts_cons, tail, n, 0);
+		break;
+	default:
+		/* unsupported mode, shouldn't be here */
+		RTE_ASSERT(0);
+	}
+}
+
+/**
+ * Complete dequeuing several objects from the ring.
+ * Note that the number of objects to dequeue must not exceed the previous
+ * dequeue_start return value.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param n
+ *   The number of objects to remove from the ring.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_ring_dequeue_finish(struct rte_ring *r, unsigned int n)
+{
+	rte_ring_dequeue_elem_finish(r, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_PEEK_H_ */
diff --git a/lib/librte_ring/rte_ring_peek_c11_mem.h b/lib/librte_ring/rte_ring_peek_c11_mem.h
new file mode 100644
index 000000000..99321f124
--- /dev/null
+++ b/lib/librte_ring/rte_ring_peek_c11_mem.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2010-2020 Intel Corporation
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ * Derived from FreeBSD's bufring.h
+ * Used as BSD-3 Licensed with permission from Kip Macy.
+ */
+
+#ifndef _RTE_RING_PEEK_C11_MEM_H_
+#define _RTE_RING_PEEK_C11_MEM_H_
+
+/**
+ * @file rte_ring_peek_c11_mem.h
+ * It is not recommended to include this file directly,
+ * include <rte_ring.h> instead.
+ * Contains internal helper functions for rte_ring peek API.
+ * For more information please refer to <rte_ring_peek.h>.
+ */
+
+/**
+ * @internal get current tail value.
+ * This function should be used only for a single-thread producer/consumer.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_st_get_tail(struct rte_ring_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t h, n, t;
+
+	h = ht->head;
+	t = ht->tail;
+	n = h - t;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = h;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail.
+ * This function should be used only for single thread producer/consumer.
+ * Should be used only in conjunction with __rte_ring_st_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_st_set_head_tail(struct rte_ring_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	uint32_t pos;
+
+	RTE_SET_USED(enqueue);
+
+	pos = tail + num;
+	ht->head = pos;
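+	/* RELEASE store on the tail makes the preceding object copies
+	 * visible to the opposite side of the ring
+	 */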
+	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+}
+
+/**
+ * @internal get current tail value.
+ * This function should be used only for producer/consumer in MT_HTS mode.
+ * Check that the user didn't request to move the tail above the head.
+ * In that situation:
+ * - return zero, which aborts any pending changes and
+ *   returns the head to its previous position.
+ * - trigger an assert in debug mode.
+ */
+static __rte_always_inline uint32_t
+__rte_ring_hts_get_tail(struct rte_ring_hts_headtail *ht, uint32_t *tail,
+	uint32_t num)
+{
+	uint32_t n;
+	union __rte_ring_hts_pos p;
+
+	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	n = p.pos.head - p.pos.tail;
+
+	RTE_ASSERT(n >= num);
+	num = (n >= num) ? num : 0;
+
+	*tail = p.pos.tail;
+	return num;
+}
+
+/**
+ * @internal set new values for head and tail as one atomic 64 bit operation.
+ * This function should be used only for producer/consumer in MT_HTS mode.
+ * Should be used only in conjunction with __rte_ring_hts_get_tail.
+ */
+static __rte_always_inline void
+__rte_ring_hts_set_head_tail(struct rte_ring_hts_headtail *ht, uint32_t tail,
+	uint32_t num, uint32_t enqueue)
+{
+	union __rte_ring_hts_pos p;
+
+	RTE_SET_USED(enqueue);
+
+	p.pos.head = tail + num;
+	p.pos.tail = p.pos.head;
+
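+	/*
+	 * One atomic 64-bit RELEASE store publishes head and tail together,
+	 * making head == tail again so the next enqueue/dequeue can proceed.
+	 */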
+	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+}
+
+#endif /* _RTE_RING_PEEK_C11_MEM_H_ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 08/10] test/ring: add stress test for MT peek API
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (6 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 07/10] ring: introduce peek style API Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 09/10] test/ring: add functional tests for new sync modes Konstantin Ananyev
                                 ` (2 subsequent siblings)
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Introduce new test case to test MT peek API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/Makefile                |  1 +
 app/test/meson.build             |  1 +
 app/test/test_ring_peek_stress.c | 43 ++++++++++++++++++++++++++++++++
 app/test/test_ring_stress.c      |  3 +++
 app/test/test_ring_stress.h      |  1 +
 5 files changed, 49 insertions(+)
 create mode 100644 app/test/test_ring_peek_stress.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 28f0b9ac2..631a21028 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -80,6 +80,7 @@ SRCS-y += test_ring.c
 SRCS-y += test_ring_mpmc_stress.c
 SRCS-y += test_ring_hts_stress.c
 SRCS-y += test_ring_perf.c
+SRCS-y += test_ring_peek_stress.c
 SRCS-y += test_ring_rts_stress.c
 SRCS-y += test_ring_stress.c
 SRCS-y += test_pmd_perf.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 20c4978c2..d15278cf9 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -102,6 +102,7 @@ test_sources = files('commands.c',
 	'test_ring.c',
 	'test_ring_mpmc_stress.c',
 	'test_ring_hts_stress.c',
+	'test_ring_peek_stress.c',
 	'test_ring_perf.c',
 	'test_ring_rts_stress.c',
 	'test_ring_stress.c',
diff --git a/app/test/test_ring_peek_stress.c b/app/test/test_ring_peek_stress.c
new file mode 100644
index 000000000..cfc82d728
--- /dev/null
+++ b/app/test/test_ring_peek_stress.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "test_ring_stress_impl.h"
+#include <rte_ring_elem.h>
+
+static inline uint32_t
+_st_ring_dequeue_bulk(struct rte_ring *r, void **obj, uint32_t n,
+	uint32_t *avail)
+{
+	uint32_t m;
+
+	m = rte_ring_dequeue_bulk_start(r, obj, n, avail);
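+	/* bulk semantics: commit the dequeue only if all n objects were
+	 * obtained; finishing with 0 keeps them in the ring
+	 */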
+	n = (m == n) ? n : 0;
+	rte_ring_dequeue_finish(r, n);
+	return n;
+}
+
+static inline uint32_t
+_st_ring_enqueue_bulk(struct rte_ring *r, void * const *obj, uint32_t n,
+	uint32_t *free)
+{
+	uint32_t m;
+
+	m = rte_ring_enqueue_bulk_start(r, n, free);
+	n = (m == n) ? n : 0;
+	rte_ring_enqueue_finish(r, obj, n);
+	return n;
+}
+
+static int
+_st_ring_init(struct rte_ring *r, const char *name, uint32_t num)
+{
+	return rte_ring_init(r, name, num,
+		RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);
+}
+
+const struct test test_ring_peek_stress = {
+	.name = "MT_PEEK",
+	.nb_case = RTE_DIM(tests),
+	.cases = tests,
+};
diff --git a/app/test/test_ring_stress.c b/app/test/test_ring_stress.c
index 29a1368d7..853fcc190 100644
--- a/app/test/test_ring_stress.c
+++ b/app/test/test_ring_stress.c
@@ -46,6 +46,9 @@ test_ring_stress(void)
 	n += test_ring_hts_stress.nb_case;
 	k += run_test(&test_ring_hts_stress);
 
+	n += test_ring_peek_stress.nb_case;
+	k += run_test(&test_ring_peek_stress);
+
 	printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
 		n, k, n - k);
 	return (k != n);
diff --git a/app/test/test_ring_stress.h b/app/test/test_ring_stress.h
index 9a87c7f7b..60953ce47 100644
--- a/app/test/test_ring_stress.h
+++ b/app/test/test_ring_stress.h
@@ -35,3 +35,4 @@ struct test {
 extern const struct test test_ring_mpmc_stress;
 extern const struct test test_ring_rts_stress;
 extern const struct test test_ring_hts_stress;
+extern const struct test test_ring_peek_stress;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 09/10] test/ring: add functional tests for new sync modes
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (7 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 08/10] test/ring: add stress test for MT peek API Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 10/10] doc: update ring guide Konstantin Ananyev
  2020-04-21 11:31               ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring David Marchand
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Extend test_ring_autotest with new test-cases for RTS/HTS sync modes.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring.c | 93 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 73 insertions(+), 20 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fbcd109b1..e21557cd9 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -203,7 +203,8 @@ test_ring_negative_tests(void)
  * Random number of elements are enqueued and dequeued.
  */
 static int
-test_ring_burst_bulk_tests1(unsigned int api_type)
+test_ring_burst_bulk_tests1(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -213,12 +214,11 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
 	const unsigned int rsz = RING_SIZE - 1;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -294,7 +294,8 @@ test_ring_burst_bulk_tests1(unsigned int api_type)
  * dequeued data.
  */
 static int
-test_ring_burst_bulk_tests2(unsigned int api_type)
+test_ring_burst_bulk_tests2(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -302,12 +303,11 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -390,7 +390,8 @@ test_ring_burst_bulk_tests2(unsigned int api_type)
  * Enqueue and dequeue to cover the entire ring length.
  */
 static int
-test_ring_burst_bulk_tests3(unsigned int api_type)
+test_ring_burst_bulk_tests3(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -398,12 +399,11 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
 	unsigned int i, j;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -465,7 +465,8 @@ test_ring_burst_bulk_tests3(unsigned int api_type)
  * Enqueue till the ring is full and dequeue till the ring becomes empty.
  */
 static int
-test_ring_burst_bulk_tests4(unsigned int api_type)
+test_ring_burst_bulk_tests4(unsigned int api_type, unsigned int create_flags,
+	const char *tname)
 {
 	struct rte_ring *r;
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
@@ -474,12 +475,11 @@ test_ring_burst_bulk_tests4(unsigned int api_type)
 	unsigned int num_elems;
 
 	for (i = 0; i < RTE_DIM(esize); i++) {
-		test_ring_print_test_string("Test standard ring", api_type,
-						esize[i]);
+		test_ring_print_test_string(tname, api_type, esize[i]);
 
 		/* Create the ring */
 		r = test_ring_create("test_ring_burst_bulk_tests", esize[i],
-					RING_SIZE, SOCKET_ID_ANY, 0);
+					RING_SIZE, SOCKET_ID_ANY, create_flags);
 
 		/* alloc dummy object pointers */
 		src = test_ring_calloc(RING_SIZE * 2, esize[i]);
@@ -815,7 +815,23 @@ test_ring_with_exact_size(void)
 static int
 test_ring(void)
 {
+	int32_t rc;
 	unsigned int i, j;
+	const char *tname;
+
+	static const struct {
+		uint32_t create_flags;
+		const char *name;
+	} test_sync_modes[] = {
+		{
+			RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ,
+			"Test MT_RTS ring",
+		},
+		{
+			RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ,
+			"Test MT_HTS ring",
+		},
+	};
 
 	/* Negative test cases */
 	if (test_ring_negative_tests() < 0)
@@ -832,30 +848,67 @@ test_ring(void)
 	 * The test cases are split into smaller test cases to
 	 * help clang compile faster.
 	 */
+	tname = "Test standard ring";
+
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests1(i | j) < 0)
+			if (test_ring_burst_bulk_tests1(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests2(i | j) < 0)
+			if (test_ring_burst_bulk_tests2(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests3(i | j) < 0)
+			if (test_ring_burst_bulk_tests3(i | j, 0, tname) < 0)
 				goto test_fail;
 
 	for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST; j <<= 1)
 		for (i = TEST_RING_THREAD_DEF;
 					i <= TEST_RING_THREAD_MPMC; i <<= 1)
-			if (test_ring_burst_bulk_tests4(i | j) < 0)
+			if (test_ring_burst_bulk_tests4(i | j, 0, tname) < 0)
+				goto test_fail;
+
+	/* Burst and bulk operations with MT_RTS and MT_HTS sync modes */
+	for (i = 0; i != RTE_DIM(test_sync_modes); i++) {
+		for (j = TEST_RING_ELEM_BULK; j <= TEST_RING_ELEM_BURST;
+				j <<= 1) {
+
+			rc = test_ring_burst_bulk_tests1(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests2(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
 				goto test_fail;
 
+			rc = test_ring_burst_bulk_tests3(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+
+			rc = test_ring_burst_bulk_tests4(
+				TEST_RING_THREAD_DEF | j,
+				test_sync_modes[i].create_flags,
+				test_sync_modes[i].name);
+			if (rc < 0)
+				goto test_fail;
+		}
+	}
+
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* [dpdk-dev] [PATCH v7 10/10] doc: update ring guide
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (8 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 09/10] test/ring: add functional tests for new sync modes Konstantin Ananyev
@ 2020-04-20 12:28               ` Konstantin Ananyev
  2020-04-21 11:31               ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring David Marchand
  10 siblings, 0 replies; 146+ messages in thread
From: Konstantin Ananyev @ 2020-04-20 12:28 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, david.marchand, jielong.zjl, Konstantin Ananyev

Changed the rte_ring chapter in programmer's guide to reflect
the addition of new sync modes and peek style API.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 doc/guides/prog_guide/ring_lib.rst | 95 ++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 8cb2b2dd4..22c48e50d 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -349,6 +349,101 @@ even if only the first term of subtraction has overflowed:
     uint32_t entries = (prod_tail - cons_head);
     uint32_t free_entries = (mask + cons_tail -prod_head);
 
+Producer/consumer synchronization modes
+---------------------------------------
+
+rte_ring supports different synchronization modes for producers and consumers.
+These modes can be specified at ring creation/init time via the ``flags``
+parameter. That should help the user to configure the ring in the way most
+suitable for their specific usage scenarios.
+Currently supported modes:
+
+MP/MC (default one)
+~~~~~~~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) mode. This is the default enqueue (/dequeue)
+mode for the ring. In this mode multiple threads can enqueue (/dequeue)
+objects to (/from) the ring. For 'classic' DPDK deployments (with one thread
+per core) this is usually the most suitable and fastest synchronization mode.
+As a well-known limitation, it can perform quite poorly in some overcommitted
+scenarios.
+
+SP/SC
+~~~~~
+Single-producer (/single-consumer) mode. In this mode only one thread at a time
+is allowed to enqueue (/dequeue) objects to (/from) the ring.
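+
+As a minimal sketch of how a sync mode is selected at creation time (the ring
+names and sizes below are arbitrary, error handling is omitted):
+
+.. code-block:: c
+
+    #include <rte_lcore.h>
+    #include <rte_ring.h>
+
+    /* flags == 0 selects the default MP/MC mode */
+    struct rte_ring *mpmc = rte_ring_create("r_mpmc", 1024,
+        rte_socket_id(), 0);
+
+    /* single producer / single consumer */
+    struct rte_ring *spsc = rte_ring_create("r_spsc", 1024,
+        rte_socket_id(), RING_F_SP_ENQ | RING_F_SC_DEQ);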
+
+MP_RTS/MC_RTS
+~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
+The main difference from the original MP/MC algorithm is that
+the tail value is increased not by every thread that finished enqueue/dequeue,
+but only by the last one.
+That allows threads to avoid spinning on the ring tail value,
+leaving the actual tail value change to the last thread at a given instance.
+That technique helps to avoid the Lock-Waiter-Preemption (LWP) problem on tail
+update and improves average enqueue/dequeue times on overcommitted systems.
+To achieve that, RTS requires two 64-bit CAS operations for each
+enqueue (/dequeue) operation: one for the head update, a second for the tail
+update. In comparison, the original MP/MC algorithm requires one 32-bit CAS
+for the head update plus waiting/spinning on the tail value.
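+
+A ring using RTS on both sides might then be created as follows (a sketch
+using the RTS creation flags):
+
+.. code-block:: c
+
+    struct rte_ring *rts = rte_ring_create("r_rts", 1024,
+        rte_socket_id(), RING_F_MP_RTS_ENQ | RING_F_MC_RTS_DEQ);
+
+The regular enqueue/dequeue calls are then used unchanged.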
+
+MP_HTS/MC_HTS
+~~~~~~~~~~~~~
+
+Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
+In this mode the enqueue/dequeue operation is fully serialized:
+at any given moment only one enqueue/dequeue operation can proceed.
+This is achieved by allowing a thread to proceed with changing ``head.value``
+only when ``head.value == tail.value``.
+Both head and tail values are updated atomically (as one 64-bit value).
+To achieve that, a 64-bit CAS is used by the head update routine.
+That technique also avoids the Lock-Waiter-Preemption (LWP) problem on tail
+update and helps to improve ring enqueue/dequeue behavior in overcommitted
+scenarios. Another advantage of a fully serialized producer/consumer is that
+it provides the ability to implement an MT safe peek API for rte_ring.
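+
+Similarly, a ring using HTS on both sides might be created as follows
+(a sketch; this is also the multi-threaded configuration required by the
+peek API described below):
+
+.. code-block:: c
+
+    struct rte_ring *hts = rte_ring_create("r_hts", 1024,
+        rte_socket_id(), RING_F_MP_HTS_ENQ | RING_F_MC_HTS_DEQ);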
+
+
+Ring Peek API
+-------------
+
+For rings with a serialized producer/consumer (HTS sync mode) it is possible
+to split the public enqueue/dequeue API into two phases:
+
+*   enqueue/dequeue start
+
+*   enqueue/dequeue finish
+
+That allows the user to inspect objects in the ring without removing them
+from it (aka MT safe peek) and to reserve space for the objects in the ring
+before the actual enqueue.
+Note that this API is available only for two sync modes:
+
+*   Single Producer/Single Consumer (SP/SC)
+
+*   Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
+
+It is the user's responsibility to create/init the ring with the appropriate
+sync mode selected. As an example of usage:
+
+.. code-block:: c
+
+    /* read 1 elem from the ring: */
+    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
+    if (n != 0) {
+        /* examine object */
+        if (object_examine(obj) == KEEP)
+            /* decided to keep it in the ring. */
+            rte_ring_dequeue_finish(ring, 0);
+        else
+            /* decided to remove it from the ring. */
+            rte_ring_dequeue_finish(ring, n);
+    }
+
+Note that between ``_start_`` and ``_finish_`` no other thread can proceed
+with an enqueue (/dequeue) operation until ``_finish_`` completes.
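+
+The same two-phase approach applies on the enqueue side: space is reserved
+first, objects are prepared, and the enqueue is committed at ``_finish_``.
+A minimal sketch (``make_object()`` is a hypothetical helper):
+
+.. code-block:: c
+
+    /* reserve space for 1 object in the ring */
+    uint32_t n = rte_ring_enqueue_bulk_start(ring, 1, NULL);
+    if (n != 0) {
+        obj = make_object();
+        /* copy the object into the ring and make it visible to consumers */
+        rte_ring_enqueue_finish(ring, &obj, n);
+    }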
+
 References
 ----------
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v6 10/10] doc: update ring guide
  2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 10/10] doc: update ring guide Konstantin Ananyev
@ 2020-04-20 13:47               ` David Marchand
  2020-04-20 14:07                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 146+ messages in thread
From: David Marchand @ 2020-04-20 13:47 UTC (permalink / raw)
  To: Konstantin Ananyev, Honnappa Nagarahalli; +Cc: dev, jielong.zjl

On Mon, Apr 20, 2020 at 2:12 PM Konstantin Ananyev
<konstantin.ananyev@intel.com> wrote:
>
> Changed the rte_ring chapter in programmer's guide to reflect
> the addition of new sync modes and peek style API.

I'd like to split this as follows, see below.
I have a couple of typos too.


If you are fine with it, I'll proceed and squash when merging.


>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  doc/guides/prog_guide/ring_lib.rst | 95 ++++++++++++++++++++++++++++++
>  1 file changed, 95 insertions(+)
>
> diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
> index 8cb2b2dd4..668e67ecb 100644
> --- a/doc/guides/prog_guide/ring_lib.rst
> +++ b/doc/guides/prog_guide/ring_lib.rst
> @@ -349,6 +349,101 @@ even if only the first term of subtraction has overflowed:
>      uint32_t entries = (prod_tail - cons_head);
>      uint32_t free_entries = (mask + cons_tail -prod_head);
>

From here, this first part would go to patch2 "ring: prepare ring to
allow new sync schemes".

> +Producer/consumer synchronization modes
> +---------------------------------------
> +
> +rte_ring supports different synchronization modes for porducer and consumers.

producers*

> +These modes can be specified at ring creation/init time via ``flags`` parameter.
> +That should help  user to configure ring in way most suitable for his

double space to remove.
users?


> +specific usage scenarios.
> +Currently supported modes:
> +
> +MP/MC (default one)
> +~~~~~~~~~~~~~~~~~~~
> +
> +Multi-producer (/multi-consumer) mode. This is a default enqueue (/dequeue)
> +mode for the ring. In this mode multiple threads can enqueue (/dequeue)
> +objects to (/from) the ring. For 'classic' DPDK deployments (with one thread
> +per core) this is usually most suitable and fastest synchronization mode.

the most*

> +As a well known limitaion - it can perform quite pure on some overcommitted

limitation*

> +scenarios.
> +
> +SP/SC
> +~~~~~
> +Single-producer (/single-consumer) mode. In this mode only one thread at a time
> +is allowed to enqueue (/dequeue) objects to (/from) the ring.

End of first part.

Then the second part that would go to patch3 "ring: introduce RTS ring mode".

> +
> +MP_RTS/MC_RTS
> +~~~~~~~~~~~~~
> +
> +Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
> +The main difference from original MP/MC algorithm is that

from the original*

> +tail value is increased not by every thread that finished enqueue/dequeue,
> +but only by the last one.
> +That allows threads to avoid spinning on ring tail value,
> +leaving actual tail value change to the last thread at a given instance.
> +That technique helps to avoid Lock-Waiter-Preemtion (LWP) problem on tail

the Lock-Waiter-Preemption*

> +update and improves average enqueue/dequeue times on overcommitted systems.
> +To achieve that RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> +one for head update, second for tail update.
> +In comparison original MP/MC algorithm requires one 32-bit CAS

the original*

> +for head update and waiting/spinning on tail value.
> +

End of second part.

Third part that would go to patch 5 "ring: introduce HTS ring mode".


> +MP_HTS/MC_HTS
> +~~~~~~~~~~~~~
> +
> +Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
> +In that mode enqueue/dequeue operation is fully serialized:
> +at any given moment only one enqueue/dequeue operation can proceed.
> +This is achieved by allowing a thread to proceed with changing ``head.value``
> +only when ``head.value == tail.value``.
> +Both head and tail values are updated atomically (as one 64-bit value).
> +To achieve that 64-bit CAS is used by head update routine.
> +That technique also avoids Lock-Waiter-Preemtion (LWP) problem on tail

the Lock-Waiter-Preemption*


> +update and helps to improve ring enqueue/dequeue behavior in overcommitted
> +scenarios. Another advantage of fully serialized producer/consumer -
> +it provides ability to implement MT safe peek API for rte_ring.

it provides the ability*

End of 3rd part.

Last part would go to patch 7 "ring: introduce peek style API".

> +
> +
> +Ring Peek API
> +-------------
> +
> +For ring with serialized producer/consumer (HTS sync mode) it is  possible

double space.

> +to split public enqueue/dequeue API into two phases:
> +
> +*   enqueue/dequeue start
> +
> +*   enqueue/dequeue finish
> +
> +That allows user to inspect objects in the ring without removing them
> +from it (aka MT safe peek) and reserve space for the objects in the ring
> +before actual enqueue.
> +Note that this API is available only for two sync modes:
> +
> +*   Single Producer/Single Consumer (SP/SC)
> +
> +*   Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
> +
> +It is a user responsibility to create/init ring with appropriate sync modes
> +selected. As an example of usage:
> +
> +.. code-block:: c
> +
> +    /* read 1 elem from the ring: */
> +    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
> +    if (n != 0) {
> +        /* examine object */
> +        if (object_examine(obj) == KEEP)
> +            /* decided to keep it in the ring. */
> +            rte_ring_dequeue_finish(ring, 0);
> +        else
> +            /* decided to remove it from the ring. */
> +            rte_ring_dequeue_finish(ring, n);
> +    }
> +
> +Note that between ``_start_`` and ``_finish_`` none other thread can proceed
> +with enqueue(/dequeue) operation till ``_finish_`` completes.
> +



-- 
David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v6 10/10] doc: update ring guide
  2020-04-20 13:47               ` David Marchand
@ 2020-04-20 14:07                 ` Ananyev, Konstantin
  0 siblings, 0 replies; 146+ messages in thread
From: Ananyev, Konstantin @ 2020-04-20 14:07 UTC (permalink / raw)
  To: David Marchand, Honnappa Nagarahalli; +Cc: dev, jielong.zjl


> 
> On Mon, Apr 20, 2020 at 2:12 PM Konstantin Ananyev
> <konstantin.ananyev@intel.com> wrote:
> >
> > Changed the rte_ring chapter in programmer's guide to reflect
> > the addition of new sync modes and peek style API.
> 
> I'd like to split this as follows, see below.
> I have a couple of typos too.
> 
> 
> If you are fine with it, I'll proceed and squash when merging.

Yes, I am.
Thanks
Konstantin

> 
> 
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  doc/guides/prog_guide/ring_lib.rst | 95 ++++++++++++++++++++++++++++++
> >  1 file changed, 95 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
> > index 8cb2b2dd4..668e67ecb 100644
> > --- a/doc/guides/prog_guide/ring_lib.rst
> > +++ b/doc/guides/prog_guide/ring_lib.rst
> > @@ -349,6 +349,101 @@ even if only the first term of subtraction has overflowed:
> >      uint32_t entries = (prod_tail - cons_head);
> >      uint32_t free_entries = (mask + cons_tail -prod_head);
> >
> 
> From here, this first part would go to patch2 "ring: prepare ring to
> allow new sync schemes".
> 
> > +Producer/consumer synchronization modes
> > +---------------------------------------
> > +
> > +rte_ring supports different synchronization modes for porducer and consumers.
> 
> producers*
> 
> > +These modes can be specified at ring creation/init time via ``flags`` parameter.
> > +That should help  user to configure ring in way most suitable for his
> 
> double space to remove.
> users?
> 
> 
> > +specific usage scenarios.
> > +Currently supported modes:
> > +
> > +MP/MC (default one)
> > +~~~~~~~~~~~~~~~~~~~
> > +
> > +Multi-producer (/multi-consumer) mode. This is a default enqueue (/dequeue)
> > +mode for the ring. In this mode multiple threads can enqueue (/dequeue)
> > +objects to (/from) the ring. For 'classic' DPDK deployments (with one thread
> > +per core) this is usually most suitable and fastest synchronization mode.
> 
> the most*
> 
> > +As a well known limitaion - it can perform quite pure on some overcommitted
> 
> limitation*
> 
> > +scenarios.
> > +
> > +SP/SC
> > +~~~~~
> > +Single-producer (/single-consumer) mode. In this mode only one thread at a time
> > +is allowed to enqueue (/dequeue) objects to (/from) the ring.
> 
> End of first part.
> 
> Then the second part that would go to patch3 "ring: introduce RTS ring mode".
> 
> > +
> > +MP_RTS/MC_RTS
> > +~~~~~~~~~~~~~
> > +
> > +Multi-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
> > +The main difference from original MP/MC algorithm is that
> 
> from the original*
> 
> > +tail value is increased not by every thread that finished enqueue/dequeue,
> > +but only by the last one.
> > +That allows threads to avoid spinning on ring tail value,
> > +leaving actual tail value change to the last thread at a given instance.
> > +That technique helps to avoid Lock-Waiter-Preemtion (LWP) problem on tail
> 
> the Lock-Waiter-Preemption*
> 
> > +update and improves average enqueue/dequeue times on overcommitted systems.
> > +To achieve that RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
> > +one for head update, second for tail update.
> > +In comparison original MP/MC algorithm requires one 32-bit CAS
> 
> the original*
> 
> > +for head update and waiting/spinning on tail value.
> > +
> 
> End of second part.
> 
> Third part that would go to patch 5 "ring: introduce HTS ring mode".
> 
> 
> > +MP_HTS/MC_HTS
> > +~~~~~~~~~~~~~
> > +
> > +Multi-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
> > +In that mode enqueue/dequeue operation is fully serialized:
> > +at any given moment only one enqueue/dequeue operation can proceed.
> > +This is achieved by allowing a thread to proceed with changing ``head.value``
> > +only when ``head.value == tail.value``.
> > +Both head and tail values are updated atomically (as one 64-bit value).
> > +To achieve that 64-bit CAS is used by head update routine.
> > +That technique also avoids Lock-Waiter-Preemtion (LWP) problem on tail
> 
> the Lock-Waiter-Preemption*
> 
> 
> > +update and helps to improve ring enqueue/dequeue behavior in overcommitted
> > +scenarios. Another advantage of fully serialized producer/consumer -
> > +it provides ability to implement MT safe peek API for rte_ring.
> 
> it provides the ability*
> 
> End of 3rd part.
> 
> Last part would go to patch 7 "ring: introduce peek style API".
> 
> > +
> > +
> > +Ring Peek API
> > +-------------
> > +
> > +For ring with serialized producer/consumer (HTS sync mode) it is  possible
> 
> double space.
> 
> > +to split public enqueue/dequeue API into two phases:
> > +
> > +*   enqueue/dequeue start
> > +
> > +*   enqueue/dequeue finish
> > +
> > +That allows user to inspect objects in the ring without removing them
> > +from it (aka MT safe peek) and reserve space for the objects in the ring
> > +before actual enqueue.
> > +Note that this API is available only for two sync modes:
> > +
> > +*   Single Producer/Single Consumer (SP/SC)
> > +
> > +*   Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
> > +
> > +It is a user responsibility to create/init ring with appropriate sync modes
> > +selected. As an example of usage:
> > +
> > +.. code-block:: c
> > +
> > +    /* read 1 elem from the ring: */
> > +    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
> > +    if (n != 0) {
> > +        /* examine object */
> > +        if (object_examine(obj) == KEEP)
> > +            /* decided to keep it in the ring. */
> > +            rte_ring_dequeue_finish(ring, 0);
> > +        else
> > +            /* decided to remove it from the ring. */
> > +            rte_ring_dequeue_finish(ring, n);
> > +    }
> > +
> > +Note that between ``_start_`` and ``_finish_`` none other thread can proceed
> > +with enqueue(/dequeue) operation till ``_finish_`` completes.
> > +
> 
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: [dpdk-dev] [PATCH v7 00/10] New sync modes for ring
  2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
                                 ` (9 preceding siblings ...)
  2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 10/10] doc: update ring guide Konstantin Ananyev
@ 2020-04-21 11:31               ` David Marchand
  10 siblings, 0 replies; 146+ messages in thread
From: David Marchand @ 2020-04-21 11:31 UTC (permalink / raw)
  To: Konstantin Ananyev, Honnappa Nagarahalli
  Cc: dev, jielong.zjl, Pavan Nikhilesh, Jerin Jacob Kollanukkaran,
	Thomas Monjalon

On Mon, Apr 20, 2020 at 2:28 PM Konstantin Ananyev
<konstantin.ananyev@intel.com> wrote:
> These days more and more customers use(/try to use) DPDK based apps within
> overcommitted systems (multiple active threads over the same physical cores):
> VM, container deployments, etc.
> One quite common problem they hit:
> Lock-Holder-Preemption/Lock-Waiter-Preemption with rte_ring.
> LHP is quite a common problem for spin-based sync primitives
> (spin-locks, etc.) on overcommitted systems.
> The situation gets much worse when some sort of
> fair-locking technique is used (ticket-lock, etc.).
> As now not only lock-owner but also lock-waiters scheduling
> order matters a lot (LWP).
> These two problems are well-known for kernel within VMs:
> http://www-archive.xenproject.org/files/xensummitboston08/LHP.pdf
> https://www.cs.hs-rm.de/~kaiser/events/wamos2017/Slides/selcuk.pdf
> The problem with rte_ring is that while head acquisition is sort of
> un-fair locking, waiting on the tail is very similar to the ticket lock schema -
> the tail has to be updated in a particular order.
> That makes the current rte_ring implementation perform
> really poorly in some overcommitted scenarios.
> It is probably not possible to completely resolve LHP problem in
> userspace only (without some kernel communication/intervention).
> But removing fairness at tail update helps to avoid LWP and
> can mitigate the situation significantly.
> This patch proposes two new optional ring synchronization modes:
> 1) Head/Tail Sync (HTS) mode
> In that mode enqueue/dequeue operation is fully serialized:
>     only one thread at a time is allowed to perform given op.
>     As another enhancement provide ability to split enqueue/dequeue
>     operation into two phases:
>       - enqueue/dequeue start
>       - enqueue/dequeue finish
>     That allows user to inspect objects in the ring without removing
>     them from it (aka MT safe peek).
> 2) Relaxed Tail Sync (RTS)
> The main difference from original MP/MC algorithm is that
> tail value is increased not by every thread that finished enqueue/dequeue,
> but only by the last one.
> That allows threads to avoid spinning on ring tail value,
> leaving actual tail value change to the last thread in the update queue.
>
> Note that these new sync modes are optional.
> For current rte_ring users nothing should change
> (both in terms of API/ABI and performance).
> Existing sync modes MP/MC,SP/SC kept untouched, set up in the same
> way (via flags and _init_), and MP/MC remains as default one.
> The only thing that changed:
> Format of prod/cons now could differ depending on mode selected at _init_.
> So the user has to stick with one sync model through the whole ring lifetime.
> In other words, the user can't create a ring for, let's say, SP mode and then
> in the middle of the data path change his mind and start using MP_RTS mode.
> For existing modes (SP/MP, SC/MC) format remains the same and
> user can still use them interchangeably, though of course it is an
> error prone practice.
>
> Test results on IA (see below) show significant improvements
> for average enqueue/dequeue op times on overcommitted systems.
> For 'classic' DPDK deployments (one thread per core) original MP/MC
> algorithm still shows best numbers, though for 64-bit target
> RTS numbers are not that far away.
> Numbers were produced by new UT test-case: ring_stress_autotest, i.e.:
> echo ring_stress_autotest | ./dpdk-test -n 4 --lcores='...'
>
> X86_64 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> DEQ+ENQ average cycles/obj
>                                                 MP/MC      HTS     RTS
> 1thread@1core(--lcores=6-7)                     8.00       8.15    8.99
> 2thread@2core(--lcores=6-8)                     19.14      19.61   20.35
> 4thread@4core(--lcores=6-10)                    29.43      29.79   31.82
> 8thread@8core(--lcores=6-14)                    110.59     192.81  119.50
> 16thread@16core(--lcores=6-22)                  461.03     813.12  495.59
> 32thread@32core(--lcores='6-22,55-70')          982.90     1972.38 1160.51
>
> 2thread@1core(--lcores='6,(10-11)@7'            20140.50   23.58   25.14
> 4thread@2core(--lcores='6,(10-11)@7,(20-21)@8'  153680.60  76.88   80.05
> 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  280314.32  294.72  318.79
> 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 643176.59  1144.02 1175.14
> 32thread@2core(--lcores='6,(10-25)@7,(30-45)@8' 4264238.80 4627.48 4892.68
>
> 8thread@2core(--lcores='6,(10-17)@(7,8))'       321085.98  298.59  307.47
> 16thread@4core(--lcores='6,(20-35)@(7-10))'     1900705.61 575.35  678.29
> 32thread@4core(--lcores='6,(20-51)@(7-10))'     5510445.85 2164.36 2714.12
>
> i686 @ Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> DEQ+ENQ average cycles/obj
>                                                 MP/MC      HTS     RTS
> 1thread@1core(--lcores=6-7)                     7.85       12.13   11.31
> 2thread@2core(--lcores=6-8)                     17.89      24.52   21.86
> 8thread@8core(--lcores=6-14)                    32.58      354.20  54.58
> 32thread@32core(--lcores='6-22,55-70')          813.77     6072.41 2169.91
>
> 2thread@1core(--lcores='6,(10-11)@7'            16095.00   36.06   34.74
> 8thread@2core(--lcores='6,(10-13)@7,(20-23)@8'  1140354.54 346.61  361.57
> 16thread@2core(--lcores='6,(10-17)@7,(20-27)@8' 1920417.86 1314.90 1416.65
>
> 8thread@2core(--lcores='6,(10-17)@(7,8))'       594358.61  332.70  357.74
> 32thread@4core(--lcores='6,(20-51)@(7-10))'     5319896.86 2836.44 3028.87

I fixed a couple of typos and split the doc updates.

Series applied with the patch from Pavan.
Thanks for the work Konstantin, Honnappa.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 146+ messages in thread

end of thread

Thread overview: 146+ messages
2020-02-24 11:35 [dpdk-dev] [RFC 0/6] New sync modes for ring Konstantin Ananyev
2020-02-24 11:35 ` [dpdk-dev] [RFC 1/6] test/ring: add contention stress test Konstantin Ananyev
2020-02-24 11:35 ` [dpdk-dev] [RFC 2/6] ring: rework ring layout to allow new sync schemes Konstantin Ananyev
2020-02-24 11:35 ` [dpdk-dev] [RFC 3/6] ring: introduce RTS ring mode Konstantin Ananyev
2020-02-24 11:35 ` [dpdk-dev] [RFC 4/6] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-02-24 11:35 ` [dpdk-dev] [RFC 5/6] ring: introduce HTS ring mode Konstantin Ananyev
2020-03-25 20:44   ` Honnappa Nagarahalli
2020-03-26 12:26     ` Ananyev, Konstantin
2020-02-24 11:35 ` [dpdk-dev] [RFC 6/6] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-02-24 16:59 ` [dpdk-dev] [RFC 0/6] New sync modes for ring Stephen Hemminger
2020-02-24 17:59   ` Jerin Jacob
2020-02-24 19:35     ` Stephen Hemminger
2020-02-24 20:52       ` Honnappa Nagarahalli
2020-02-25 11:45         ` Ananyev, Konstantin
2020-02-25 13:41       ` Ananyev, Konstantin
2020-02-26 16:53         ` Morten Brørup
2020-02-27 10:31         ` Jerin Jacob
2020-02-28  0:17           ` David Christensen
2020-03-20 16:45             ` Ananyev, Konstantin
2020-02-25  0:58     ` Honnappa Nagarahalli
2020-02-25 15:14       ` Ananyev, Konstantin
2020-03-25 20:43 ` Honnappa Nagarahalli
2020-03-26  1:50   ` Ananyev, Konstantin
2020-03-30 21:29     ` Honnappa Nagarahalli
2020-03-30 23:37       ` Honnappa Nagarahalli
2020-03-31 17:21         ` Ananyev, Konstantin
2020-03-31 16:43 ` [dpdk-dev] [PATCH v1 0/8] " Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 1/8] test/ring: add contention stress test Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 2/8] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 3/8] ring: introduce RTS ring mode Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 4/8] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 5/8] ring: introduce HTS ring mode Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 6/8] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 7/8] ring: introduce peek style API Konstantin Ananyev
2020-03-31 16:43   ` [dpdk-dev] [PATCH v1 8/8] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-02 22:09   ` [dpdk-dev] [PATCH v2 0/9] New sync modes for ring Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 1/9] test/ring: add contention stress test Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 3/9] ring: introduce RTS ring mode Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 5/9] ring: introduce HTS ring mode Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 7/9] ring: introduce peek style API Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-02 22:09     ` [dpdk-dev] [PATCH v2 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
2020-04-03 17:42     ` [dpdk-dev] [PATCH v3 0/9] New sync modes for ring Konstantin Ananyev
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 1/9] test/ring: add contention stress test Konstantin Ananyev
2020-04-08  4:59         ` Honnappa Nagarahalli
2020-04-09 12:36           ` Ananyev, Konstantin
2020-04-09 13:00             ` Ananyev, Konstantin
2020-04-10 18:01               ` Honnappa Nagarahalli
2020-04-10 16:59             ` Honnappa Nagarahalli
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-04-08  4:59         ` Honnappa Nagarahalli
2020-04-09 13:39           ` Ananyev, Konstantin
2020-04-10 20:15             ` Honnappa Nagarahalli
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 3/9] ring: introduce RTS ring mode Konstantin Ananyev
2020-04-04 17:27         ` Wang, Haiyue
2020-04-08  5:00         ` Honnappa Nagarahalli
2020-04-09 14:52           ` Ananyev, Konstantin
2020-04-10 23:10             ` Honnappa Nagarahalli
2020-04-13 14:29               ` David Marchand
2020-04-13 16:42                 ` Honnappa Nagarahalli
2020-04-14 13:47                   ` David Marchand
2020-04-14 15:57                     ` Honnappa Nagarahalli
2020-04-14 16:21                       ` Ananyev, Konstantin
2020-04-14 13:18               ` Ananyev, Konstantin
2020-04-14 15:58                 ` Honnappa Nagarahalli
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 5/9] ring: introduce HTS ring mode Konstantin Ananyev
2020-04-13 23:27         ` Honnappa Nagarahalli
2020-04-14 16:12           ` Ananyev, Konstantin
2020-04-14 17:06             ` Honnappa Nagarahalli
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 7/9] ring: introduce peek style API Konstantin Ananyev
2020-04-14  3:45         ` Honnappa Nagarahalli
2020-04-14 16:47           ` Ananyev, Konstantin
2020-04-14 17:30             ` Honnappa Nagarahalli
2020-04-14 22:24               ` Ananyev, Konstantin
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-03 17:42       ` [dpdk-dev] [PATCH v3 9/9] ring: add C11 memory model for new sync modes Konstantin Ananyev
2020-04-04 14:16         ` [dpdk-dev] 回复:[PATCH " 周介龙
2020-04-14  4:28         ` [dpdk-dev] [PATCH " Honnappa Nagarahalli
2020-04-14 18:29           ` Ananyev, Konstantin
2020-04-15 20:28           ` Ananyev, Konstantin
2020-04-17 13:36       ` [dpdk-dev] [PATCH v4 0/9] New sync modes for ring Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 1/9] test/ring: add contention stress test Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 3/9] ring: introduce RTS ring mode Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 5/9] ring: introduce HTS ring mode Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 7/9] ring: introduce peek style API Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-17 13:36         ` [dpdk-dev] [PATCH v4 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
2020-04-18 16:32         ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Konstantin Ananyev
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 1/9] test/ring: add contention stress test Konstantin Ananyev
2020-04-19  2:30             ` Honnappa Nagarahalli
2020-04-19  8:03               ` David Marchand
2020-04-19 11:47                 ` Ananyev, Konstantin
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 2/9] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-04-19  2:31             ` Honnappa Nagarahalli
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 3/9] ring: introduce RTS ring mode Konstantin Ananyev
2020-04-19  2:31             ` Honnappa Nagarahalli
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 4/9] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-04-19  2:31             ` Honnappa Nagarahalli
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 5/9] ring: introduce HTS ring mode Konstantin Ananyev
2020-04-19  2:31             ` Honnappa Nagarahalli
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 6/9] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-04-19  2:31             ` Honnappa Nagarahalli
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 7/9] ring: introduce peek style API Konstantin Ananyev
2020-04-19  2:31             ` Honnappa Nagarahalli
2020-04-19 18:32               ` Ananyev, Konstantin
2020-04-19 19:12                 ` Ananyev, Konstantin
2020-04-19 21:14                   ` Honnappa Nagarahalli
2020-04-19 22:41                     ` Ananyev, Konstantin
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 8/9] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-19  2:32             ` Honnappa Nagarahalli
2020-04-18 16:32           ` [dpdk-dev] [PATCH v5 9/9] test/ring: add functional tests for new sync modes Konstantin Ananyev
2020-04-19  2:32             ` Honnappa Nagarahalli
2020-04-19  2:32           ` [dpdk-dev] [PATCH v5 0/9] New sync modes for ring Honnappa Nagarahalli
2020-04-20 12:11           ` [dpdk-dev] [PATCH v6 00/10] " Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 01/10] test/ring: add contention stress test Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 02/10] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 03/10] ring: introduce RTS ring mode Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 04/10] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 05/10] ring: introduce HTS ring mode Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 06/10] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 07/10] ring: introduce peek style API Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 08/10] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 09/10] test/ring: add functional tests for new sync modes Konstantin Ananyev
2020-04-20 12:11             ` [dpdk-dev] [PATCH v6 10/10] doc: update ring guide Konstantin Ananyev
2020-04-20 13:47               ` David Marchand
2020-04-20 14:07                 ` Ananyev, Konstantin
2020-04-20 12:28             ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 01/10] test/ring: add contention stress test Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 02/10] ring: prepare ring to allow new sync schemes Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 03/10] ring: introduce RTS ring mode Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 04/10] test/ring: add contention stress test for RTS ring Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 05/10] ring: introduce HTS ring mode Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 06/10] test/ring: add contention stress test for HTS ring Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 07/10] ring: introduce peek style API Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 08/10] test/ring: add stress test for MT peek API Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 09/10] test/ring: add functional tests for new sync modes Konstantin Ananyev
2020-04-20 12:28               ` [dpdk-dev] [PATCH v7 10/10] doc: update ring guide Konstantin Ananyev
2020-04-21 11:31               ` [dpdk-dev] [PATCH v7 00/10] New sync modes for ring David Marchand
