V4: Include the ring perf test case enhancement patch in the series. V3: Allow experimental API for meson build V2: Fix the coding style issue(commit message line too long) V1: To flush a ring not in use, dequeue one by one is wasting cpu cycles. The patch is to just resetting the head and tail indices to save cpu cycle. Gavin Hu (2): ring: add reset api to flush the ring when not in use hash: flush the rings instead of dequeuing one by one Joyce Kong (1): test/ring: ring perf test case enhancement lib/librte_hash/Makefile | 2 +- lib/librte_hash/meson.build | 3 ++ lib/librte_hash/rte_cuckoo_hash.c | 11 ++--- lib/librte_ring/rte_ring.h | 20 +++++++++ lib/librte_ring/rte_ring_version.map | 7 +++ test/test/test_ring_perf.c | 82 ++++++++++++++++++++++++++++++++++-- 6 files changed, 114 insertions(+), 11 deletions(-) -- 2.7.4
From: Joyce Kong <joyce.kong@arm.com> Run ring perf test on all available cores to really verify MPMC operations. The old way of running on a pair of cores is not enough for MPMC rings. We used this test case for ring optimization and it was really helpful for measuring the ring performance in multi-core environment. Suggested-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> Reviewed-by: Dharmik Thakkar <Dharmik.Thakkar@arm.com> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> --- test/test/test_ring_perf.c | 82 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 79 insertions(+), 3 deletions(-) diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c index ebb3939..01c6937 100644 --- a/test/test/test_ring_perf.c +++ b/test/test/test_ring_perf.c @@ -9,7 +9,7 @@ #include <rte_cycles.h> #include <rte_launch.h> #include <rte_pause.h> - +#include <string.h> #include "test.h" /* @@ -20,6 +20,7 @@ * * Empty ring dequeue * * Enqueue/dequeue of bursts in 1 threads * * Enqueue/dequeue of bursts in 2 threads + * * Enqueue/dequeue of bursts in all available threads */ #define RING_NAME "RING_PERF" @@ -248,9 +249,80 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, } } +static rte_atomic32_t synchro; +static uint64_t queue_count[RTE_MAX_LCORE]; + +#define TIME_MS 100 + +static int +load_loop_fn(void *p) +{ + uint64_t time_diff = 0; + uint64_t begin = 0; + uint64_t hz = rte_get_timer_hz(); + uint64_t lcount = 0; + const unsigned int lcore = rte_lcore_id(); + struct thread_params *params = p; + void *burst[MAX_BURST] = {0}; + + /* wait synchro for slaves */ + if (lcore != rte_get_master_lcore()) + while (rte_atomic32_read(&synchro) == 0) + rte_pause(); + + begin = rte_get_timer_cycles(); + while (time_diff < hz * TIME_MS / 1000) { + rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL); + rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL); + lcount++; + time_diff = rte_get_timer_cycles() - begin; + } + queue_count[lcore] = lcount; + return 0; +} + +static int +run_on_all_cores(struct rte_ring *r) +{ + uint64_t total = 0; + unsigned int i, c; + struct thread_params param; + + memset(¶m, 0, sizeof(struct thread_params)); + for (i = 0; i < RTE_DIM(bulk_sizes); i++) { + printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]); + param.size = bulk_sizes[i]; + param.r = r; + + /* clear synchro and start slaves */ + rte_atomic32_set(&synchro, 0); + if (rte_eal_mp_remote_launch(load_loop_fn, + ¶m, SKIP_MASTER) < 0) + return -1; + + /* start synchro and launch test on master */ + rte_atomic32_set(&synchro, 1); + load_loop_fn(¶m); + + rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(c) { + printf("Core [%u] count = %"PRIu64"\n", + c, queue_count[c]); + total += queue_count[c]; + } + + printf("Total count (size: %u): %"PRIu64"\n", bulk_sizes[i], + total); + } + + return 0; +} + /* - * Test function that determines how long an enqueue + dequeue of a single item - * takes on a single lcore. Result is for comparison with the bulk enq+deq. + * Test function that determines how long an enqueue + dequeue of a single + * item takes on a single lcore. Result is for comparison with the bulk + * enq+deq. */ static void test_single_enqueue_dequeue(struct rte_ring *r) @@ -394,6 +466,10 @@ test_ring_perf(void) printf("\n### Testing using two NUMA nodes ###\n"); run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk); } + + printf("\n### Testing using all slave nodes ###\n"); + run_on_all_cores(r); + rte_ring_free(r); return 0; } -- 2.7.4
Currently, the flush is done by dequeuing the ring in a while loop. It is much simpler to flush the queue by resetting the head and tail indices. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> --- lib/librte_ring/rte_ring.h | 20 ++++++++++++++++++++ lib/librte_ring/rte_ring_version.map | 7 +++++++ 2 files changed, 27 insertions(+) diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index af5444a..2830300 100644 --- a/lib/librte_ring/rte_ring.h +++ b/lib/librte_ring/rte_ring.h @@ -671,6 +671,26 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p) } /** + * Flush a ring. + * + * This function flush all the elements in a ring + * + * @b EXPERIMENTAL: this API may change without prior notice + * + * @warning + * Make sure the ring is not in use while calling this function. + * + * @param r + * A pointer to the ring structure. + */ +static inline void __rte_experimental +rte_ring_reset(struct rte_ring *r) +{ + r->prod.head = r->cons.head = 0; + r->prod.tail = r->cons.tail = 0; +} + +/** * Return the number of entries in a ring. * * @param r diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map index d935efd..581d9ca 100644 --- a/lib/librte_ring/rte_ring_version.map +++ b/lib/librte_ring/rte_ring_version.map @@ -17,3 +17,10 @@ DPDK_2.2 { rte_ring_free; } DPDK_2.0; + +EXPERIMENTAL { + global: + + rte_ring_reset; + +}; -- 2.7.4
Within rte_hash_reset, calling a while loop to dequeue one by one from the ring, while not using them at all, is wasting cycles, The patch just flush the ring by resetting the indices can save cpu cycles. Fixes: b26473ff8f4a ("hash: add reset function") Fixes: 75706568a7eb ("hash: add extendable bucket feature") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> --- lib/librte_hash/Makefile | 2 +- lib/librte_hash/meson.build | 3 +++ lib/librte_hash/rte_cuckoo_hash.c | 11 ++++------- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index c8c435d..5669d83 100644 --- a/lib/librte_hash/Makefile +++ b/lib/librte_hash/Makefile @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_hash.a -CFLAGS += -O3 +CFLAGS += -O3 -DALLOW_EXPERIMENTAL_API CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) LDLIBS += -lrte_eal -lrte_ring diff --git a/lib/librte_hash/meson.build b/lib/librte_hash/meson.build index efc06ed..ebf70de 100644 --- a/lib/librte_hash/meson.build +++ b/lib/librte_hash/meson.build @@ -14,3 +14,6 @@ headers = files('rte_cmp_arm64.h', sources = files('rte_cuckoo_hash.c', 'rte_fbk_hash.c') deps += ['ring'] + +# rte ring reset is not yet part of stable API +allow_experimental_apis = true diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c index c01489b..4b08049 100644 --- a/lib/librte_hash/rte_cuckoo_hash.c +++ b/lib/librte_hash/rte_cuckoo_hash.c @@ -559,7 +559,6 @@ __hash_rw_reader_unlock(const struct rte_hash *h) void rte_hash_reset(struct rte_hash *h) { - void *ptr; uint32_t tot_ring_cnt, i; if (h == NULL) @@ -570,16 +569,14 @@ rte_hash_reset(struct rte_hash *h) memset(h->key_store, 0, h->key_entry_size * (h->entries + 1)); *h->tbl_chng_cnt = 0; - /* clear the free ring */ - while (rte_ring_dequeue(h->free_slots, &ptr) == 0) - continue; + /* reset the free ring */ + rte_ring_reset(h->free_slots); - /* clear free extendable bucket ring and memory */ + /* flush free extendable bucket ring and memory */ if (h->ext_table_support) { memset(h->buckets_ext, 0, h->num_buckets * sizeof(struct rte_hash_bucket)); - while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0) - continue; + rte_ring_reset(h->free_ext_bkts); } /* Repopulate the free slots ring. Entry zero is reserved for key misses */ -- 2.7.4
> Suggested-by: Gavin Hu <gavin.hu@arm.com>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <Dharmik.Thakkar@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Small comment about the email addresses,
Please do not use uppercase so we can match/grep more easily.
02/01/2019 01:55, Gavin Hu:
> V4: Include the ring perf test case enhancement patch in the series.
>
> V3: Allow experimental API for meson build
>
> V2: Fix the coding style issue(commit message line too long)
>
> V1: To flush a ring not in use, dequeue one by one is wasting cpu cycles.
> The patch is to just resetting the head and tail indices to save cpu
> cycle.
It is too late for adding this API in 19.02, but we should
review and give opinion, so it will be ready to integrate
in early February.
> > 02/01/2019 01:55, Gavin Hu: > > V4: Include the ring perf test case enhancement patch in the series. This change is not related to this patch. Should be a separate patch? There were comments provided: http://mails.dpdk.org/archives/dev/2018-December/121893.html http://mails.dpdk.org/archives/dev/2018-December/122157.html Do you plan to address these? > > > > V3: Allow experimental API for meson build > > > > V2: Fix the coding style issue(commit message line too long) > > > > V1: To flush a ring not in use, dequeue one by one is wasting cpu cycles. > > The patch is to just resetting the head and tail indices to save cpu > > cycle. > > It is too late for adding this API in 19.02, but we should review and give > opinion, so it will be ready to integrate in early February. >
V5: 1. Commit message tweaking for ring test case enhancement patch 2. Upper to lower for mails to make match/grep more easily V4: 1. Include the ring perf test case enhancement patch in the series. 2. Replace ARRAY_SIZE with RTE_DIM. 3. Call memset to avoid clang compling complains. V3: Allow experimental API for meson build V2: Fix the coding style issue(commit message line too long) V1: To flush a ring not in use, dequeue one by one is wasting cpu cycles. The patch is to just resetting the head and tail indices to save cpu cycle. Gavin Hu (2): ring: add reset API to flush the ring when not in use hash: flush the rings instead of dequeuing one by one Joyce Kong (1): test/ring: ring perf test case enhancement lib/librte_hash/Makefile | 2 +- lib/librte_hash/meson.build | 3 ++ lib/librte_hash/rte_cuckoo_hash.c | 11 ++--- lib/librte_ring/rte_ring.h | 20 +++++++++ lib/librte_ring/rte_ring_version.map | 7 +++ test/test/test_ring_perf.c | 82 ++++++++++++++++++++++++++++++++++-- 6 files changed, 114 insertions(+), 11 deletions(-) -- 2.7.4
From: Joyce Kong <joyce.kong@arm.com> Run ring perf test on all available cores to really verify MPMC operations. The old way of running on a pair of cores is not enough for MPMC rings. Suggested-by: gavin hu <gavin.hu@arm.com> Signed-off-by: joyce kong <joyce.kong@arm.com> Reviewed-by: ruifeng wang <ruifeng.wang@arm.com> Reviewed-by: honnappa nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: dharmik thakkar <dharmik.thakkar@arm.com> Reviewed-by: ola liljedahl <ola.liljedahl@arm.com> Reviewed-by: gavin hu <gavin.hu@arm.com> --- test/test/test_ring_perf.c | 82 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 79 insertions(+), 3 deletions(-) diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c index ebb3939..01c6937 100644 --- a/test/test/test_ring_perf.c +++ b/test/test/test_ring_perf.c @@ -9,7 +9,7 @@ #include <rte_cycles.h> #include <rte_launch.h> #include <rte_pause.h> - +#include <string.h> #include "test.h" /* @@ -20,6 +20,7 @@ * * Empty ring dequeue * * Enqueue/dequeue of bursts in 1 threads * * Enqueue/dequeue of bursts in 2 threads + * * Enqueue/dequeue of bursts in all available threads */ #define RING_NAME "RING_PERF" @@ -248,9 +249,80 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, } } +static rte_atomic32_t synchro; +static uint64_t queue_count[RTE_MAX_LCORE]; + +#define TIME_MS 100 + +static int +load_loop_fn(void *p) +{ + uint64_t time_diff = 0; + uint64_t begin = 0; + uint64_t hz = rte_get_timer_hz(); + uint64_t lcount = 0; + const unsigned int lcore = rte_lcore_id(); + struct thread_params *params = p; + void *burst[MAX_BURST] = {0}; + + /* wait synchro for slaves */ + if (lcore != rte_get_master_lcore()) + while (rte_atomic32_read(&synchro) == 0) + rte_pause(); + + begin = rte_get_timer_cycles(); + while (time_diff < hz * TIME_MS / 1000) { + rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL); + rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL); + lcount++; + time_diff = rte_get_timer_cycles() - begin; + } + queue_count[lcore] = lcount; + return 0; +} + +static int +run_on_all_cores(struct rte_ring *r) +{ + uint64_t total = 0; + unsigned int i, c; + struct thread_params param; + + memset(¶m, 0, sizeof(struct thread_params)); + for (i = 0; i < RTE_DIM(bulk_sizes); i++) { + printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]); + param.size = bulk_sizes[i]; + param.r = r; + + /* clear synchro and start slaves */ + rte_atomic32_set(&synchro, 0); + if (rte_eal_mp_remote_launch(load_loop_fn, + ¶m, SKIP_MASTER) < 0) + return -1; + + /* start synchro and launch test on master */ + rte_atomic32_set(&synchro, 1); + load_loop_fn(¶m); + + rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(c) { + printf("Core [%u] count = %"PRIu64"\n", + c, queue_count[c]); + total += queue_count[c]; + } + + printf("Total count (size: %u): %"PRIu64"\n", bulk_sizes[i], + total); + } + + return 0; +} + /* - * Test function that determines how long an enqueue + dequeue of a single item - * takes on a single lcore. Result is for comparison with the bulk enq+deq. + * Test function that determines how long an enqueue + dequeue of a single + * item takes on a single lcore. Result is for comparison with the bulk + * enq+deq. */ static void test_single_enqueue_dequeue(struct rte_ring *r) @@ -394,6 +466,10 @@ test_ring_perf(void) printf("\n### Testing using two NUMA nodes ###\n"); run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk); } + + printf("\n### Testing using all slave nodes ###\n"); + run_on_all_cores(r); + rte_ring_free(r); return 0; } -- 2.7.4
From: Gavin Hu <gavin.hu@arm.com> Currently, the flush is done by dequeuing the ring in a while loop. It is much simpler to flush the queue by resetting the head and tail indices. Signed-off-by: gavin hu <gavin.hu@arm.com> Reviewed-by: ruifeng wang <ruifeng.wang@arm.com> Reviewed-by: honnappa nagarahalli <honnappa.nagarahalli@arm.com> --- lib/librte_ring/rte_ring.h | 20 ++++++++++++++++++++ lib/librte_ring/rte_ring_version.map | 7 +++++++ 2 files changed, 27 insertions(+) diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index af5444a..2830300 100644 --- a/lib/librte_ring/rte_ring.h +++ b/lib/librte_ring/rte_ring.h @@ -671,6 +671,26 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p) } /** + * Flush a ring. + * + * This function flush all the elements in a ring + * + * @b EXPERIMENTAL: this API may change without prior notice + * + * @warning + * Make sure the ring is not in use while calling this function. + * + * @param r + * A pointer to the ring structure. + */ +static inline void __rte_experimental +rte_ring_reset(struct rte_ring *r) +{ + r->prod.head = r->cons.head = 0; + r->prod.tail = r->cons.tail = 0; +} + +/** * Return the number of entries in a ring. * * @param r diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map index d935efd..581d9ca 100644 --- a/lib/librte_ring/rte_ring_version.map +++ b/lib/librte_ring/rte_ring_version.map @@ -17,3 +17,10 @@ DPDK_2.2 { rte_ring_free; } DPDK_2.0; + +EXPERIMENTAL { + global: + + rte_ring_reset; + +}; -- 2.7.4
From: Gavin Hu <gavin.hu@arm.com> Within rte_hash_reset, calling a while loop to dequeue one by one from the ring, while not using them at all, is wasting cycles, The patch just flush the ring by resetting the indices can save cpu cycles. Fixes: b26473ff8f4a ("hash: add reset function") Fixes: 75706568a7eb ("hash: add extendable bucket feature") Cc: stable@dpdk.org Signed-off-by: gavin hu <gavin.hu@arm.com> Reviewed-by: honnappa nagarahalli <honnappa.nagarahalli@arm.com> --- lib/librte_hash/Makefile | 2 +- lib/librte_hash/meson.build | 3 +++ lib/librte_hash/rte_cuckoo_hash.c | 11 ++++------- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index c8c435d..5669d83 100644 --- a/lib/librte_hash/Makefile +++ b/lib/librte_hash/Makefile @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_hash.a -CFLAGS += -O3 +CFLAGS += -O3 -DALLOW_EXPERIMENTAL_API CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) LDLIBS += -lrte_eal -lrte_ring diff --git a/lib/librte_hash/meson.build b/lib/librte_hash/meson.build index efc06ed..ebf70de 100644 --- a/lib/librte_hash/meson.build +++ b/lib/librte_hash/meson.build @@ -14,3 +14,6 @@ headers = files('rte_cmp_arm64.h', sources = files('rte_cuckoo_hash.c', 'rte_fbk_hash.c') deps += ['ring'] + +# rte ring reset is not yet part of stable API +allow_experimental_apis = true diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c index c01489b..4b08049 100644 --- a/lib/librte_hash/rte_cuckoo_hash.c +++ b/lib/librte_hash/rte_cuckoo_hash.c @@ -559,7 +559,6 @@ __hash_rw_reader_unlock(const struct rte_hash *h) void rte_hash_reset(struct rte_hash *h) { - void *ptr; uint32_t tot_ring_cnt, i; if (h == NULL) @@ -570,16 +569,14 @@ rte_hash_reset(struct rte_hash *h) memset(h->key_store, 0, h->key_entry_size * (h->entries + 1)); *h->tbl_chng_cnt = 0; - /* clear the free ring */ - while (rte_ring_dequeue(h->free_slots, &ptr) == 0) - continue; + /* reset the free ring */ + rte_ring_reset(h->free_slots); - /* clear free extendable bucket ring and memory */ + /* flush free extendable bucket ring and memory */ if (h->ext_table_support) { memset(h->buckets_ext, 0, h->num_buckets * sizeof(struct rte_hash_bucket)); - while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0) - continue; + rte_ring_reset(h->free_ext_bkts); } /* Repopulate the free slots ring. Entry zero is reserved for key misses */ -- 2.7.4
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, January 2, 2019 8:49 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com;
> bruce.richardson@intel.com; chaozhu@linux.vnet.ibm.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>;
> olivier.matz@6wind.com; Joyce Kong (Arm Technology China)
> <Joyce.Kong@arm.com>
> Subject: Re: [PATCH v4 1/3] test/ring: ring perf test case enhancement
>
> > Suggested-by: Gavin Hu <gavin.hu@arm.com>
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <Dharmik.Thakkar@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>
> Small comment about the email addresses,
> Please do not use uppercase so we can match/grep more easily.
>
Thanks for reminding, already fixed this in v5.
> -----Original Message----- > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Sent: Thursday, January 3, 2019 2:41 AM > To: Thomas Monjalon <thomas@monjalon.net>; Gavin Hu (Arm Technology > China) <Gavin.Hu@arm.com> > Cc: dev@dpdk.org; jerinj@marvell.com; hemant.agrawal@nxp.com; > bruce.richardson@intel.com; chaozhu@linux.vnet.ibm.com; nd > <nd@arm.com>; olivier.matz@6wind.com; nd <nd@arm.com> > Subject: RE: [PATCH v4 0/3] add rte ring reset api and use it to flush a ring by > hash > > > > > 02/01/2019 01:55, Gavin Hu: > > > V4: Include the ring perf test case enhancement patch in the series. > This change is not related to this patch. Should be a separate patch? I included it in this patch set to avoid patches scattering here and there, Anyway I updated the title of the cover letter to reflect this. > There were comments provided: > http://mails.dpdk.org/archives/dev/2018-December/121893.html > http://mails.dpdk.org/archives/dev/2018-December/122157.html > > Do you plan to address these? One was addressed in v4 and the other was address in new v5. Thanks! > > > > > > > V3: Allow experimental API for meson build > > > > > > V2: Fix the coding style issue(commit message line too long) > > > > > > V1: To flush a ring not in use, dequeue one by one is wasting cpu cycles. > > > The patch is to just resetting the head and tail indices to save cpu > > > cycle. > > > > It is too late for adding this API in 19.02, but we should review and give > > opinion, so it will be ready to integrate in early February. > >
03/01/2019 03:38, gavin hu:
> From: Joyce Kong <joyce.kong@arm.com>
>
> Run ring perf test on all available cores to really verify MPMC operations.
> The old way of running on a pair of cores is not enough for MPMC rings.
>
> Suggested-by: gavin hu <gavin.hu@arm.com>
> Signed-off-by: joyce kong <joyce.kong@arm.com>
> Reviewed-by: ruifeng wang <ruifeng.wang@arm.com>
> Reviewed-by: honnappa nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: dharmik thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: ola liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: gavin hu <gavin.hu@arm.com>
Sorry Gavin, there is a misunderstanding.
I was suggesting to not use uppercase in email addresses,
but please keep uppercase in names in front of the addresses.
Example:
Real Name <real.name@company.com>
As for many guidelines, in doubt, you can check the git history.
Thanks
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, January 3, 2019 3:40 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com;
> bruce.richardson@intel.com; ferruh.yigit@intel.com; Joyce Kong (Arm
> Technology China) <Joyce.Kong@arm.com>
> Subject: Re: [PATCH v5 1/3] test/ring: ring perf test case enhancement
>
> 03/01/2019 03:38, gavin hu:
> > From: Joyce Kong <joyce.kong@arm.com>
> >
> > Run ring perf test on all available cores to really verify MPMC operations.
> > The old way of running on a pair of cores is not enough for MPMC rings.
> >
> > Suggested-by: gavin hu <gavin.hu@arm.com>
> > Signed-off-by: joyce kong <joyce.kong@arm.com>
> > Reviewed-by: ruifeng wang <ruifeng.wang@arm.com>
> > Reviewed-by: honnappa nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: dharmik thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: ola liljedahl <ola.liljedahl@arm.com>
> > Reviewed-by: gavin hu <gavin.hu@arm.com>
>
> Sorry Gavin, there is a misunderstanding.
> I was suggesting to not use uppercase in email addresses,
> but please keep uppercase in names in front of the addresses.
> Example:
> Real Name <real.name@company.com>
>
> As for many guidelines, in doubt, you can check the git history.
> Thanks
Sorry I misunderstood and I already realized that, I will wait a bit for other comments then
Submit a new version to correct this.
V6: Made upper case for the user name to comply with the convention. V5: 1. Commit message tweaking for ring test case enhancement patch 2. Upper to lower for mails to make match/grep more easily V4: 1. Include the ring perf test case enhancement patch in the series. 2. Replace ARRAY_SIZE with RTE_DIM. 3. Call memset to avoid clang compling complains. V3: Allow experimental API for meson build V2: Fix the coding style issue(commit message line too long) V1: To flush a ring not in use, dequeue one by one is wasting cpu cycles. The patch is to just resetting the head and tail indices to save cpu cycle. Gavin Hu (2): ring: add reset API to flush the ring when not in use hash: flush the rings instead of dequeuing one by one Joyce Kong (1): test/ring: ring perf test case enhancement lib/librte_hash/Makefile | 2 +- lib/librte_hash/meson.build | 3 ++ lib/librte_hash/rte_cuckoo_hash.c | 11 ++--- lib/librte_ring/rte_ring.h | 20 +++++++++ lib/librte_ring/rte_ring_version.map | 7 +++ test/test/test_ring_perf.c | 82 ++++++++++++++++++++++++++++++++++-- 6 files changed, 114 insertions(+), 11 deletions(-) -- 2.7.4
From: Joyce Kong <joyce.kong@arm.com> Run ring perf test on all available cores to really verify MPMC operations. The old way of running on a pair of cores is not enough for MPMC rings. Suggested-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> --- test/test/test_ring_perf.c | 82 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 79 insertions(+), 3 deletions(-) diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c index ebb3939..01c6937 100644 --- a/test/test/test_ring_perf.c +++ b/test/test/test_ring_perf.c @@ -9,7 +9,7 @@ #include <rte_cycles.h> #include <rte_launch.h> #include <rte_pause.h> - +#include <string.h> #include "test.h" /* @@ -20,6 +20,7 @@ * * Empty ring dequeue * * Enqueue/dequeue of bursts in 1 threads * * Enqueue/dequeue of bursts in 2 threads + * * Enqueue/dequeue of bursts in all available threads */ #define RING_NAME "RING_PERF" @@ -248,9 +249,80 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, } } +static rte_atomic32_t synchro; +static uint64_t queue_count[RTE_MAX_LCORE]; + +#define TIME_MS 100 + +static int +load_loop_fn(void *p) +{ + uint64_t time_diff = 0; + uint64_t begin = 0; + uint64_t hz = rte_get_timer_hz(); + uint64_t lcount = 0; + const unsigned int lcore = rte_lcore_id(); + struct thread_params *params = p; + void *burst[MAX_BURST] = {0}; + + /* wait synchro for slaves */ + if (lcore != rte_get_master_lcore()) + while (rte_atomic32_read(&synchro) == 0) + rte_pause(); + + begin = rte_get_timer_cycles(); + while (time_diff < hz * TIME_MS / 1000) { + rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL); + rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL); + lcount++; + time_diff = rte_get_timer_cycles() - begin; + } + queue_count[lcore] = lcount; + return 0; +} + +static int +run_on_all_cores(struct rte_ring *r) +{ + uint64_t total = 0; + unsigned int i, c; + struct thread_params param; + + memset(¶m, 0, sizeof(struct thread_params)); + for (i = 0; i < RTE_DIM(bulk_sizes); i++) { + printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]); + param.size = bulk_sizes[i]; + param.r = r; + + /* clear synchro and start slaves */ + rte_atomic32_set(&synchro, 0); + if (rte_eal_mp_remote_launch(load_loop_fn, + ¶m, SKIP_MASTER) < 0) + return -1; + + /* start synchro and launch test on master */ + rte_atomic32_set(&synchro, 1); + load_loop_fn(¶m); + + rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(c) { + printf("Core [%u] count = %"PRIu64"\n", + c, queue_count[c]); + total += queue_count[c]; + } + + printf("Total count (size: %u): %"PRIu64"\n", bulk_sizes[i], + total); + } + + return 0; +} + /* - * Test function that determines how long an enqueue + dequeue of a single item - * takes on a single lcore. Result is for comparison with the bulk enq+deq. + * Test function that determines how long an enqueue + dequeue of a single + * item takes on a single lcore. Result is for comparison with the bulk + * enq+deq. */ static void test_single_enqueue_dequeue(struct rte_ring *r) @@ -394,6 +466,10 @@ test_ring_perf(void) printf("\n### Testing using two NUMA nodes ###\n"); run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk); } + + printf("\n### Testing using all slave nodes ###\n"); + run_on_all_cores(r); + rte_ring_free(r); return 0; } -- 2.7.4
From: Gavin Hu <gavin.hu@arm.com> Currently, the flush is done by dequeuing the ring in a while loop. It is much simpler to flush the queue by resetting the head and tail indices. Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- lib/librte_ring/rte_ring.h | 20 ++++++++++++++++++++ lib/librte_ring/rte_ring_version.map | 7 +++++++ 2 files changed, 27 insertions(+) diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index af5444a..2830300 100644 --- a/lib/librte_ring/rte_ring.h +++ b/lib/librte_ring/rte_ring.h @@ -671,6 +671,26 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p) } /** + * Flush a ring. + * + * This function flush all the elements in a ring + * + * @b EXPERIMENTAL: this API may change without prior notice + * + * @warning + * Make sure the ring is not in use while calling this function. + * + * @param r + * A pointer to the ring structure. + */ +static inline void __rte_experimental +rte_ring_reset(struct rte_ring *r) +{ + r->prod.head = r->cons.head = 0; + r->prod.tail = r->cons.tail = 0; +} + +/** * Return the number of entries in a ring. * * @param r diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map index d935efd..581d9ca 100644 --- a/lib/librte_ring/rte_ring_version.map +++ b/lib/librte_ring/rte_ring_version.map @@ -17,3 +17,10 @@ DPDK_2.2 { rte_ring_free; } DPDK_2.0; + +EXPERIMENTAL { + global: + + rte_ring_reset; + +}; -- 2.7.4
From: Gavin Hu <gavin.hu@arm.com> Within rte_hash_reset, calling a while loop to dequeue one by one from the ring, while not using them at all, is wasting cycles, The patch just flush the ring by resetting the indices can save cpu cycles. Fixes: b26473ff8f4a ("hash: add reset function") Fixes: 75706568a7eb ("hash: add extendable bucket feature") Cc: stable@dpdk.org Signed-off-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- lib/librte_hash/Makefile | 2 +- lib/librte_hash/meson.build | 3 +++ lib/librte_hash/rte_cuckoo_hash.c | 11 ++++------- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index c8c435d..5669d83 100644 --- a/lib/librte_hash/Makefile +++ b/lib/librte_hash/Makefile @@ -6,7 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_hash.a -CFLAGS += -O3 +CFLAGS += -O3 -DALLOW_EXPERIMENTAL_API CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) LDLIBS += -lrte_eal -lrte_ring diff --git a/lib/librte_hash/meson.build b/lib/librte_hash/meson.build index efc06ed..ebf70de 100644 --- a/lib/librte_hash/meson.build +++ b/lib/librte_hash/meson.build @@ -14,3 +14,6 @@ headers = files('rte_cmp_arm64.h', sources = files('rte_cuckoo_hash.c', 'rte_fbk_hash.c') deps += ['ring'] + +# rte ring reset is not yet part of stable API +allow_experimental_apis = true diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c index c01489b..4b08049 100644 --- a/lib/librte_hash/rte_cuckoo_hash.c +++ b/lib/librte_hash/rte_cuckoo_hash.c @@ -559,7 +559,6 @@ __hash_rw_reader_unlock(const struct rte_hash *h) void rte_hash_reset(struct rte_hash *h) { - void *ptr; uint32_t tot_ring_cnt, i; if (h == NULL) @@ -570,16 +569,14 @@ rte_hash_reset(struct rte_hash *h) memset(h->key_store, 0, h->key_entry_size * (h->entries + 1)); *h->tbl_chng_cnt = 0; - /* clear the free ring */ - while (rte_ring_dequeue(h->free_slots, &ptr) == 0) - continue; + /* reset the free ring */ + rte_ring_reset(h->free_slots); - /* clear free extendable bucket ring and memory */ + /* flush free extendable bucket ring and memory */ if (h->ext_table_support) { memset(h->buckets_ext, 0, h->num_buckets * sizeof(struct rte_hash_bucket)); - while (rte_ring_dequeue(h->free_ext_bkts, &ptr) == 0) - continue; + rte_ring_reset(h->free_ext_bkts); } /* Repopulate the free slots ring. Entry zero is reserved for key misses */ -- 2.7.4
>-----Original Message----- >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of gavin hu >Sent: Wednesday, January 9, 2019 3:32 AM >To: dev@dpdk.org >Cc: nd@arm.com; thomas@monjalon.net; jerinj@marvell.com; hemant.agrawal@nxp.com; Honnappa.Nagarahalli@arm.com; >gavin.hu@arm.com; olivier.matz@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; stable@dpdk.org >Subject: [dpdk-dev] [PATCH v6 3/3] hash: flush the rings instead of dequeuing one by one > >From: Gavin Hu <gavin.hu@arm.com> > >Within rte_hash_reset, calling a while loop to dequeue one by >one from the ring, while not using them at all, is wasting cycles, >The patch just flush the ring by resetting the indices can save cpu >cycles. > >Fixes: b26473ff8f4a ("hash: add reset function") >Fixes: 75706568a7eb ("hash: add extendable bucket feature") >Cc: stable@dpdk.org [Wang, Yipeng] I don't think it is a bug fix applicable to previous stable versions. > >Signed-off-by: Gavin Hu <gavin.hu@arm.com> >Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> [Wang, Yipeng] Otherwise the hash table change looks good to me and unit test passed on my setup. Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
Run ring perf test on all available cores to really verify MPMC operations. The old way of running on a pair of cores is not enough for MPMC rings. Suggested-by: Gavin Hu <gavin.hu@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> Reviewed-by: Dharmik Thakkar <Dharmik.Thakkar@arm.com> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> --- v7: This patch was separated from the v6 series as less relevant and the other patches in the series(http://patchwork.dpdk.org/cover/56549/) were already merged. app/test/test_ring_perf.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c index b6ad703..70ee46f 100644 --- a/app/test/test_ring_perf.c +++ b/app/test/test_ring_perf.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * Copyright(c) 2019 Arm Limited */ @@ -9,6 +10,7 @@ #include <rte_cycles.h> #include <rte_launch.h> #include <rte_pause.h> +#include <string.h> #include "test.h" @@ -20,6 +22,7 @@ * * Empty ring dequeue * * Enqueue/dequeue of bursts in 1 threads * * Enqueue/dequeue of bursts in 2 threads + * * Enqueue/dequeue of bursts in all available threads */ #define RING_NAME "RING_PERF" @@ -258,6 +261,76 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, } } +static rte_atomic32_t synchro; +static uint64_t queue_count[RTE_MAX_LCORE]; + +#define TIME_MS 100 + +static int +load_loop_fn(void *p) +{ + uint64_t time_diff = 0; + uint64_t begin = 0; + uint64_t hz = rte_get_timer_hz(); + uint64_t lcount = 0; + const unsigned int lcore = rte_lcore_id(); + struct thread_params *params = p; + void *burst[MAX_BURST] = {0}; + + /* wait synchro for slaves */ + if (lcore != rte_get_master_lcore()) + while (rte_atomic32_read(&synchro) == 0) + rte_pause(); + + begin = rte_get_timer_cycles(); + while (time_diff < hz * TIME_MS / 1000) { + rte_ring_mp_enqueue_bulk(params->r, burst, params->size, NULL); + rte_ring_mc_dequeue_bulk(params->r, burst, params->size, NULL); + lcount++; + time_diff = rte_get_timer_cycles() - begin; + } + queue_count[lcore] = lcount; + return 0; +} + +static int +run_on_all_cores(struct rte_ring *r) +{ + uint64_t total = 0; + struct thread_params param; + unsigned int i, c; + + memset(¶m, 0, sizeof(struct thread_params)); + for (i = 0; i < RTE_DIM(bulk_sizes); i++) { + printf("\nBulk enq/dequeue count on size %u\n", bulk_sizes[i]); + param.size = bulk_sizes[i]; + param.r = r; + + /* clear synchro and start slaves */ + rte_atomic32_set(&synchro, 0); + if (rte_eal_mp_remote_launch(load_loop_fn, ¶m, + SKIP_MASTER) < 0) + return -1; + + /* start synchro and launch test on master */ + rte_atomic32_set(&synchro, 1); + load_loop_fn(¶m); + + rte_eal_mp_wait_lcore(); + + RTE_LCORE_FOREACH(c) { + printf("Core [%u] count = %"PRIu64"\n", + c, queue_count[c]); + total += queue_count[c]; + } + + printf("Total count (size: %u): %"PRIu64"\n", + bulk_sizes[i], total); + } + + return 0; +} + /* * Test function that determines how long an enqueue + dequeue of a single item * takes on a single lcore. Result is for comparison with the bulk enq+deq. @@ -404,6 +477,10 @@ test_ring_perf(void) printf("\n### Testing using two NUMA nodes ###\n"); run_on_core_pair(&cores, r, enqueue_bulk, dequeue_bulk); } + + printf("\n### Testing using all slave nodes ###\n"); + run_on_all_cores(r); + rte_ring_free(r); return 0; } -- 2.7.4
On Mon, Sep 09, 2019 at 01:19:00PM +0800, Joyce Kong wrote:
> Run ring perf test on all available cores to really verify MPMC operations.
> The old way of running on a pair of cores is not enough for MPMC rings.
>
> Suggested-by: Gavin Hu <gavin.hu@arm.com>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Dharmik Thakkar <Dharmik.Thakkar@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
On Tue, Oct 8, 2019 at 10:07 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> On Mon, Sep 09, 2019 at 01:19:00PM +0800, Joyce Kong wrote:
> > Run ring perf test on all available cores to really verify MPMC operations.
> > The old way of running on a pair of cores is not enough for MPMC rings.
> >
> > Suggested-by: Gavin Hu <gavin.hu@arm.com>
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Dharmik Thakkar <Dharmik.Thakkar@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
Applied, thanks.
--
David Marchand