For smp barriers in app/eventdev, remove the unnecessary barriers or
replace them with thread fence.

Feifei Wang (6):
  app/eventdev: fix SMP barrier bugs for perf test
  app/eventdev: remove unnecessary barriers for perf test
  app/eventdev: replace wmb with thread fence for perf test
  app/eventdev: add release barriers for pipeline test
  app/eventdev: remove unnecessary barriers for pipeline test
  app/eventdev: remove unnecessary barriers for order test

 app/test-eventdev/test_order_common.h    |  2 -
 app/test-eventdev/test_perf_common.c     |  4 --
 app/test-eventdev/test_perf_common.h     | 14 +++++-
 app/test-eventdev/test_pipeline_common.c |  1 -
 app/test-eventdev/test_pipeline_queue.c  | 64 +++++++++++++++++++++---
 5 files changed, 68 insertions(+), 17 deletions(-)

-- 
2.17.1
This patch fixes RTE SMP barrier bugs for the perf test of eventdev.

For the "perf_process_last_stage" function, the wmb after storing
processed_pkts should be moved before the store. This is because the
worker lcore should ensure it has really finished data processing,
e.g. the event has been stored into the buffers, before the shared
variable "w->processed_pkts" is stored.

For "perf_process_last_stage_latency", on the one hand, the wmb should
be moved before storing into "w->processed_pkts", for the same reason
as above. On the other hand, the wmb for "w->latency" is unnecessary
due to data dependency.

Fixes: 2369f73329f8 ("app/testeventdev: add perf queue worker functions")
Cc: jerinj@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_perf_common.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/app/test-eventdev/test_perf_common.h b/app/test-eventdev/test_perf_common.h
index ff9705df8..e7233e5a5 100644
--- a/app/test-eventdev/test_perf_common.h
+++ b/app/test-eventdev/test_perf_common.h
@@ -97,8 +97,13 @@ perf_process_last_stage(struct rte_mempool *const pool,
 		void *bufs[], int const buf_sz, uint8_t count)
 {
 	bufs[count++] = ev->event_ptr;
-	w->processed_pkts++;
+
+	/* wmb here ensures event_ptr is stored before
+	 * updating the number of processed packets
+	 * for worker lcores
+	 */
 	rte_smp_wmb();
+	w->processed_pkts++;
 
 	if (unlikely(count == buf_sz)) {
 		count = 0;
@@ -116,6 +121,12 @@ perf_process_last_stage_latency(struct rte_mempool *const pool,
 	struct perf_elt *const m = ev->event_ptr;
 
 	bufs[count++] = ev->event_ptr;
+
+	/* wmb here ensures event_ptr is stored before
+	 * updating the number of processed packets
+	 * for worker lcores
+	 */
+	rte_smp_wmb();
 	w->processed_pkts++;
 
 	if (unlikely(count == buf_sz)) {
@@ -127,7 +138,6 @@ perf_process_last_stage_latency(struct rte_mempool *const pool,
 	}
 	w->latency += latency;
-	rte_smp_wmb();
 
 	return count;
 }
 
-- 
2.17.1
For the "processed_pkts" and "total_latency" functions, no operations
need to be ordered before the load of "worker[i].processed_pkts". Thus
the rmb before the load is unnecessary.

For the "perf_launch_lcores" function, the wmb after the main lcore
updates the variable "t->done", which signals the end of the test, is
unnecessary. After the main lcore updates this signal variable, it
jumps out of the launch function loop and waits for the other lcores
to stop or return an error in the main function (evt_main.c). During
this time there is no important store operation, and thus no need for
the wmb.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_perf_common.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/app/test-eventdev/test_perf_common.c b/app/test-eventdev/test_perf_common.c
index 955edb752..34cded373 100644
--- a/app/test-eventdev/test_perf_common.c
+++ b/app/test-eventdev/test_perf_common.c
@@ -224,7 +224,6 @@ processed_pkts(struct test_perf *t)
 	uint8_t i;
 	uint64_t total = 0;
 
-	rte_smp_rmb();
 	for (i = 0; i < t->nb_workers; i++)
 		total += t->worker[i].processed_pkts;
 
@@ -237,7 +236,6 @@ total_latency(struct test_perf *t)
 	uint8_t i;
 	uint64_t total = 0;
 
-	rte_smp_rmb();
 	for (i = 0; i < t->nb_workers; i++)
 		total += t->worker[i].latency;
 
@@ -327,7 +325,6 @@ perf_launch_lcores(struct evt_test *test, struct evt_options *opt,
 				opt->prod_type == EVT_PROD_TYPE_EVENT_TIMER_ADPTR) {
 			t->done = true;
-			rte_smp_wmb();
 			break;
 		}
 	}
 
@@ -341,7 +338,6 @@ perf_launch_lcores(struct evt_test *test, struct evt_options *opt,
 			rte_event_dev_dump(opt->dev_id, stdout);
 			evt_err("No schedules for seconds, deadlock");
 			t->done = true;
-			rte_smp_wmb();
 			break;
 		}
 		dead_lock_remaining = remaining;
-- 
2.17.1
Simply replace the rte_smp barrier with an atomic thread fence.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_perf_common.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/app/test-eventdev/test_perf_common.h b/app/test-eventdev/test_perf_common.h
index e7233e5a5..9785dc3e2 100644
--- a/app/test-eventdev/test_perf_common.h
+++ b/app/test-eventdev/test_perf_common.h
@@ -98,11 +98,11 @@ perf_process_last_stage(struct rte_mempool *const pool,
 {
 	bufs[count++] = ev->event_ptr;
 
-	/* wmb here ensures event_ptr is stored before
-	 * updating the number of processed packets
-	 * for worker lcores
+	/* release fence here ensures event_ptr is
+	 * stored before updating the number of
+	 * processed packets for worker lcores
 	 */
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 	w->processed_pkts++;
 
 	if (unlikely(count == buf_sz)) {
@@ -122,11 +122,11 @@ perf_process_last_stage_latency(struct rte_mempool *const pool,
 
 	bufs[count++] = ev->event_ptr;
 
-	/* wmb here ensures event_ptr is stored before
-	 * updating the number of processed packets
-	 * for worker lcores
+	/* release fence here ensures event_ptr is
+	 * stored before updating the number of
+	 * processed packets for worker lcores
 	 */
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 	w->processed_pkts++;
 
 	if (unlikely(count == buf_sz)) {
-- 
2.17.1
Add release barriers before updating the processed packets of worker
lcores to ensure that the worker lcore has really finished data
processing before it updates the number of processed packets.

Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions")
Cc: pbhagavatula@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
index 7bebac34f..8e499d8d5 100644
--- a/app/test-eventdev/test_pipeline_queue.c
+++ b/app/test-eventdev/test_pipeline_queue.c
@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
 
 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
 			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures the store operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 		} else {
 			ev.queue_id++;
 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
 		pipeline_event_enqueue(dev, port, &ev);
-		w->processed_pkts++;
+
+		/* release barrier here ensures the store operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), 1,
+				__ATOMIC_RELEASE);
 	}
 
 	return 0;
@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
 			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
 				pipeline_event_tx(dev, port, &ev[i]);
 				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures the store operation
+				 * of the event completes before the number of
+				 * processed pkts is visible to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 			} else {
 				ev[i].queue_id++;
 				pipeline_fwd_event(&ev[i],
@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
 		}
 
 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
-		w->processed_pkts += nb_rx;
+
+		/* release barrier here ensures the store operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
+				__ATOMIC_RELEASE);
 	}
 
 	return 0;
@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
 
 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
 			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures the store operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 			continue;
 		}
 
@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
 			ev.queue_id = tx_queue[ev.mbuf->port];
 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
-			w->processed_pkts++;
+
+			/* release barrier here ensures the store operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 		} else {
 			ev.queue_id++;
 			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
 			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
 				pipeline_event_tx(dev, port, &ev[i]);
 				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures the store operation
+				 * of the event completes before the number of
+				 * processed pkts is visible to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 				continue;
 			}
 
@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
 				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
 				pipeline_fwd_event(&ev[i],
 						RTE_SCHED_TYPE_ATOMIC);
-				w->processed_pkts++;
+
+				/* release barrier here ensures the store operation
+				 * of the event completes before the number of
+				 * processed pkts is visible to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 			} else {
 				ev[i].queue_id++;
 				pipeline_fwd_event(&ev[i],
-- 
2.17.1
For the "processed_pkts" function, no operations need to be ordered
before the load of "worker[i].processed_pkts", so the rmb before the
load is unnecessary.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_pipeline_common.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/app/test-eventdev/test_pipeline_common.c b/app/test-eventdev/test_pipeline_common.c
index c67be48e9..b47d76743 100644
--- a/app/test-eventdev/test_pipeline_common.c
+++ b/app/test-eventdev/test_pipeline_common.c
@@ -44,7 +44,6 @@ processed_pkts(struct test_pipeline *t)
 	uint8_t i;
 	uint64_t total = 0;
 
-	rte_smp_rmb();
 	for (i = 0; i < t->nb_workers; i++)
 		total += t->worker[i].processed_pkts;
 
-- 
2.17.1
The wmb in order_process_stage_1 and order_process_stage_invalid in
the order test can be removed. When the test results are wrong, the
worker core writes 'true' to t->err. The other worker cores, the
producer cores and the main core then load this error flag and stop
testing. So, for the worker cores, no other store operation needs to
be ordered after this one when errors happen.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_order_common.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index 5ef840493..cd9d6009e 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -104,7 +104,6 @@ order_process_stage_1(struct test_order *const t,
 				flow, *order_mbuf_seqn(t, ev->mbuf),
 				expected_flow_seq[flow]);
 		t->err = true;
-		rte_smp_wmb();
 	}
 	/*
 	 * Events from an atomic flow of an event queue can be scheduled only to
@@ -123,7 +122,6 @@ order_process_stage_invalid(struct test_order *const t,
 {
 	evt_err("invalid queue %d", ev->queue_id);
 	t->err = true;
-	rte_smp_wmb();
 }
 
 #define ORDER_WORKER_INIT\
-- 
2.17.1