Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test

DPDK patches and discussions
 help / color / mirror / Atom feed

* Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
@ 2020-12-22 10:33 Pavan Nikhilesh Bhagavatula
  2021-01-05  7:39 ` Feifei Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2020-12-22 10:33 UTC (permalink / raw)
  To: Feifei Wang, Jerin Jacob Kollanukkaran, Harry van Haaren,
	Pavan Nikhilesh
  Cc: dev, nd, Honnappa.Nagarahalli, stable, Phil Yang


>Add release barriers before updating the processed packets for worker
>lcores to ensure the worker lcore has really finished data processing
>and then it can update the processed packets number.
>

I believe we can live with minor inaccuracies in stats being presented
as atomics are pretty heavy when scheduler is limited to burst size as 1. 

One option is to move it before a pipeline operation (pipeline_event_tx, pipeline_fwd_event etc.)
as they imply implicit release barrier (as all the changes done to the event should be visible to the next core).

>Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>functions")
>Cc: pbhagavatula@marvell.com
>Cc: stable@dpdk.org
>
>Signed-off-by: Phil Yang <phil.yang@arm.com>
>Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> app/test-eventdev/test_pipeline_queue.c | 64
>+++++++++++++++++++++----
> 1 file changed, 56 insertions(+), 8 deletions(-)
>
>diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
>eventdev/test_pipeline_queue.c
>index 7bebac34f..0c0ec0ceb 100644
>--- a/app/test-eventdev/test_pipeline_queue.c
>+++ b/app/test-eventdev/test_pipeline_queue.c
>@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
>*arg)
>
> 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> 			pipeline_event_tx(dev, port, &ev);
>-			w->processed_pkts++;
>+
>+			/* release barrier here ensures stored operation
>+			 * of the event completes before the number of
>+			 * processed pkts is visible to the main core
>+			 */
>+			__atomic_fetch_add(&(w->processed_pkts), 1,
>+					__ATOMIC_RELEASE);
> 		} else {
> 			ev.queue_id++;
> 			pipeline_fwd_event(&ev,
>RTE_SCHED_TYPE_ATOMIC);
>@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void
>*arg)
> 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> 		pipeline_event_enqueue(dev, port, &ev);
>-		w->processed_pkts++;
>+
>+		/* release barrier here ensures stored operation
>+		 * of the event completes before the number of
>+		 * processed pkts is visible to the main core
>+		 */
>+		__atomic_fetch_add(&(w->processed_pkts), 1,
>+				__ATOMIC_RELEASE);
> 	}
>
> 	return 0;
>@@ -84,7 +96,13 @@
>pipeline_queue_worker_single_stage_burst_tx(void *arg)
> 			if (ev[i].sched_type ==
>RTE_SCHED_TYPE_ATOMIC) {
> 				pipeline_event_tx(dev, port, &ev[i]);
> 				ev[i].op = RTE_EVENT_OP_RELEASE;
>-				w->processed_pkts++;
>+
>+				/* release barrier here ensures stored
>operation
>+				 * of the event completes before the
>number of
>+				 * processed pkts is visible to the main
>core
>+				 */
>+				__atomic_fetch_add(&(w-
>>processed_pkts), 1,
>+						__ATOMIC_RELEASE);
> 			} else {
> 				ev[i].queue_id++;
> 				pipeline_fwd_event(&ev[i],
>@@ -121,7 +139,13 @@
>pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> 		}
>
> 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>-		w->processed_pkts += nb_rx;
>+
>+		/* release barrier here ensures stored operation
>+		 * of the event completes before the number of
>+		 * processed pkts is visible to the main core
>+		 */
>+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>+				__ATOMIC_RELEASE);
> 	}
>
> 	return 0;
>@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void
>*arg)
>
> 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
> 			pipeline_event_tx(dev, port, &ev);
>-			w->processed_pkts++;
>+
>+			/* release barrier here ensures stored operation
>+			 * of the event completes before the number of
>+			 * processed pkts is visible to the main core
>+			 */
>+			__atomic_fetch_add(&(w->processed_pkts), 1,
>+					__ATOMIC_RELEASE);
> 			continue;
> 		}
>
>@@ -180,7 +210,13 @@
>pipeline_queue_worker_multi_stage_fwd(void *arg)
> 			ev.queue_id = tx_queue[ev.mbuf->port];
> 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> 			pipeline_fwd_event(&ev,
>RTE_SCHED_TYPE_ATOMIC);
>-			w->processed_pkts++;
>+
>+			/* release barrier here ensures stored operation
>+			 * of the event completes before the number of
>+			 * processed pkts is visible to the main core
>+			 */
>+			__atomic_fetch_add(&(w->processed_pkts), 1,
>+					__ATOMIC_RELEASE);
> 		} else {
> 			ev.queue_id++;
> 			pipeline_fwd_event(&ev,
>sched_type_list[cq_id]);
>@@ -214,7 +250,13 @@
>pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> 			if (ev[i].queue_id == tx_queue[ev[i].mbuf-
>>port]) {
> 				pipeline_event_tx(dev, port, &ev[i]);
> 				ev[i].op = RTE_EVENT_OP_RELEASE;
>-				w->processed_pkts++;
>+
>+				/* release barrier here ensures stored
>operation
>+				 * of the event completes before the
>number of
>+				 * processed pkts is visible to the main
>core
>+				 */
>+				__atomic_fetch_add(&(w-
>>processed_pkts), 1,
>+						__ATOMIC_RELEASE);
> 				continue;
> 			}
>
>@@ -254,7 +296,13 @@
>pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>
>	rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> 				pipeline_fwd_event(&ev[i],
>
>	RTE_SCHED_TYPE_ATOMIC);
>-				w->processed_pkts++;
>+
>+				/* release barrier here ensures stored
>operation
>+				 * of the event completes before the
>number of
>+				 * processed pkts is visible to the main
>core
>+				 */
>+				__atomic_fetch_add(&(w-
>>processed_pkts), 1,
>+						__ATOMIC_RELEASE);
> 			} else {
> 				ev[i].queue_id++;
> 				pipeline_fwd_event(&ev[i],
>--
>2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2020-12-22 10:33 [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Pavan Nikhilesh Bhagavatula
@ 2021-01-05  7:39 ` Feifei Wang
  2021-01-05  9:29   ` Pavan Nikhilesh Bhagavatula
  0 siblings, 1 reply; 12+ messages in thread
From: Feifei Wang @ 2021-01-05  7:39 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, jerinj, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd

Hi, Pavan

Sorry for my late reply and thanks very much for your review.

> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: 2020年12月22日 18:33
> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry van
> Haaren <harry.van.haaren@intel.com>; Pavan Nikhilesh
> <pbhagavatula@caviumnetworks.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Phil Yang
> <Phil.Yang@arm.com>
> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> pipeline test
> 
> 
> >Add release barriers before updating the processed packets for worker
> >lcores to ensure the worker lcore has really finished data processing
> >and then it can update the processed packets number.
> >
> 
> I believe we can live with minor inaccuracies in stats being presented as
> atomics are pretty heavy when scheduler is limited to burst size as 1.
> 
> One option is to move it before a pipeline operation (pipeline_event_tx,
> pipeline_fwd_event etc.) as they imply implicit release barrier (as all the
> changes done to the event should be visible to the next core).

If I understand correctly, your meaning is that move release barriers before
pipeline_event_tx or pipeline_fwd_event. This can ensure the event has been
processed before the next core begins to tx/fwd. For example:

if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
		+	__atomic_thread_fence(__ATOMIC_RELEASE); 
			pipeline_event_tx(dev, port, &ev);
			w->processed_pkts++;
		} else {
			ev.queue_id++;
		+	__atomic_thread_fence(__ATOMIC_RELEASE);
			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
			pipeline_event_enqueue(dev, port, &ev);

However, there are two reasons to prevent this:

First, compare with other tests in app/eventdev, for example, the eventdev perf test,
the wmb is after event operation to ensure operation has been finished and then w->processed_pkts++.
So, if we move release barriers before tx/fwd, it may cause that the tests of app/eventdev
become  inconsistent.This may reduce the maintainability of the code and make it difficult to understand.

Second, it is a test case, though heavy thread may cause performance degradation, it can ensure that
the operation process and the test result are correct. And maybe for a test case, correctness is more important
than performance.

So, due to two reasons above, I'm ambivalent about how we should do in the next step. 

Best Regards
Feifei

> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
> >functions")
> >Cc: pbhagavatula@marvell.com
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Phil Yang <phil.yang@arm.com>
> >Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > app/test-eventdev/test_pipeline_queue.c | 64
> >+++++++++++++++++++++----
> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >
> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb 100644
> >--- a/app/test-eventdev/test_pipeline_queue.c
> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
> >*arg)
> >
> > 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> > 			pipeline_event_tx(dev, port, &ev);
> >-			w->processed_pkts++;
> >+
> >+			/* release barrier here ensures stored operation
> >+			 * of the event completes before the number of
> >+			 * processed pkts is visible to the main core
> >+			 */
> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >+					__ATOMIC_RELEASE);
> > 		} else {
> > 			ev.queue_id++;
> > 			pipeline_fwd_event(&ev,
> >RTE_SCHED_TYPE_ATOMIC);
> >@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void
> >*arg)
> > 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> > 		pipeline_event_enqueue(dev, port, &ev);
> >-		w->processed_pkts++;
> >+
> >+		/* release barrier here ensures stored operation
> >+		 * of the event completes before the number of
> >+		 * processed pkts is visible to the main core
> >+		 */
> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
> >+				__ATOMIC_RELEASE);
> > 	}
> >
> > 	return 0;
> >@@ -84,7 +96,13 @@
> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
> > 			if (ev[i].sched_type ==
> >RTE_SCHED_TYPE_ATOMIC) {
> > 				pipeline_event_tx(dev, port, &ev[i]);
> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
> >-				w->processed_pkts++;
> >+
> >+				/* release barrier here ensures stored
> >operation
> >+				 * of the event completes before the
> >number of
> >+				 * processed pkts is visible to the main
> >core
> >+				 */
> >+				__atomic_fetch_add(&(w-
> >>processed_pkts), 1,
> >+						__ATOMIC_RELEASE);
> > 			} else {
> > 				ev[i].queue_id++;
> > 				pipeline_fwd_event(&ev[i],
> >@@ -121,7 +139,13 @@
> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> > 		}
> >
> > 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
> >-		w->processed_pkts += nb_rx;
> >+
> >+		/* release barrier here ensures stored operation
> >+		 * of the event completes before the number of
> >+		 * processed pkts is visible to the main core
> >+		 */
> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
> >+				__ATOMIC_RELEASE);
> > 	}
> >
> > 	return 0;
> >@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void
> >*arg)
> >
> > 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
> > 			pipeline_event_tx(dev, port, &ev);
> >-			w->processed_pkts++;
> >+
> >+			/* release barrier here ensures stored operation
> >+			 * of the event completes before the number of
> >+			 * processed pkts is visible to the main core
> >+			 */
> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >+					__ATOMIC_RELEASE);
> > 			continue;
> > 		}
> >
> >@@ -180,7 +210,13 @@
> >pipeline_queue_worker_multi_stage_fwd(void *arg)
> > 			ev.queue_id = tx_queue[ev.mbuf->port];
> > 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> > 			pipeline_fwd_event(&ev,
> >RTE_SCHED_TYPE_ATOMIC);
> >-			w->processed_pkts++;
> >+
> >+			/* release barrier here ensures stored operation
> >+			 * of the event completes before the number of
> >+			 * processed pkts is visible to the main core
> >+			 */
> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >+					__ATOMIC_RELEASE);
> > 		} else {
> > 			ev.queue_id++;
> > 			pipeline_fwd_event(&ev,
> >sched_type_list[cq_id]);
> >@@ -214,7 +250,13 @@
> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> > 			if (ev[i].queue_id == tx_queue[ev[i].mbuf-
> >>port]) {
> > 				pipeline_event_tx(dev, port, &ev[i]);
> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
> >-				w->processed_pkts++;
> >+
> >+				/* release barrier here ensures stored
> >operation
> >+				 * of the event completes before the
> >number of
> >+				 * processed pkts is visible to the main
> >core
> >+				 */
> >+				__atomic_fetch_add(&(w-
> >>processed_pkts), 1,
> >+						__ATOMIC_RELEASE);
> > 				continue;
> > 			}
> >
> >@@ -254,7 +296,13 @@
> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >
> >	rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> > 				pipeline_fwd_event(&ev[i],
> >
> >	RTE_SCHED_TYPE_ATOMIC);
> >-				w->processed_pkts++;
> >+
> >+				/* release barrier here ensures stored
> >operation
> >+				 * of the event completes before the
> >number of
> >+				 * processed pkts is visible to the main
> >core
> >+				 */
> >+				__atomic_fetch_add(&(w-
> >>processed_pkts), 1,
> >+						__ATOMIC_RELEASE);
> > 			} else {
> > 				ev[i].queue_id++;
> > 				pipeline_fwd_event(&ev[i],
> >--
> >2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-05  7:39 ` Feifei Wang
@ 2021-01-05  9:29   ` Pavan Nikhilesh Bhagavatula
  2021-01-08  7:12     ` [dpdk-dev] 回复: " Feifei Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2021-01-05  9:29 UTC (permalink / raw)
  To: Feifei Wang, Jerin Jacob Kollanukkaran, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd

Hi Feifei,

>Hi, Pavan
>
>Sorry for my late reply and thanks very much for your review.
>
>> -----Original Message-----
>> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>> Sent: 2020年12月22日 18:33
>> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com;
>Harry van
>> Haaren <harry.van.haaren@intel.com>; Pavan Nikhilesh
>> <pbhagavatula@caviumnetworks.com>
>> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Phil Yang
>> <Phil.Yang@arm.com>
>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>for
>> pipeline test
>>
>>
>> >Add release barriers before updating the processed packets for
>worker
>> >lcores to ensure the worker lcore has really finished data processing
>> >and then it can update the processed packets number.
>> >
>>
>> I believe we can live with minor inaccuracies in stats being presented
>as
>> atomics are pretty heavy when scheduler is limited to burst size as 1.
>>
>> One option is to move it before a pipeline operation
>(pipeline_event_tx,
>> pipeline_fwd_event etc.) as they imply implicit release barrier (as all
>the
>> changes done to the event should be visible to the next core).
>
>If I understand correctly, your meaning is that move release barriers
>before
>pipeline_event_tx or pipeline_fwd_event. This can ensure the event has
>been
>processed before the next core begins to tx/fwd. For example:

What I meant was event APIs such as `rte_event_enqueue_burst`, `rte_event_eth_tx_adapter_enqueue`
act as an implicit release barrier and the API `rte_event_dequeue_burst` act as an implicit acquire barrier.

Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I don’t believe we need barriers in 
Between.

>
>if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>		+	__atomic_thread_fence(__ATOMIC_RELEASE);
>			pipeline_event_tx(dev, port, &ev);
>			w->processed_pkts++;
>		} else {
>			ev.queue_id++;
>		+	__atomic_thread_fence(__ATOMIC_RELEASE);
>			pipeline_fwd_event(&ev,
>RTE_SCHED_TYPE_ATOMIC);
>			pipeline_event_enqueue(dev, port, &ev);
>
>However, there are two reasons to prevent this:
>
>First, compare with other tests in app/eventdev, for example, the
>eventdev perf test,
>the wmb is after event operation to ensure operation has been finished
>and then w->processed_pkts++.

In case of perf_* tests start with a dequeue() and finally ends with a mempool_put()
should also act as implicit acquire release pairs making stats consistent?

>So, if we move release barriers before tx/fwd, it may cause that the
>tests of app/eventdev
>become  inconsistent.This may reduce the maintainability of the code
>and make it difficult to understand.
>
>Second, it is a test case, though heavy thread may cause performance
>degradation, it can ensure that
>the operation process and the test result are correct. And maybe for a
>test case, correctness is more important
>than performance.
>

Most of our internal perf test run on 24/48 core combinations and since 
Octeontx2 event device driver supports a burst size of 1, it will show up as
Huge performance degradation.

>So, due to two reasons above, I'm ambivalent about how we should do
>in the next step.
>
>Best Regards
>Feifei

Regards,
Pavan.

>
>> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>> >functions")
>> >Cc: pbhagavatula@marvell.com
>> >Cc: stable@dpdk.org
>> >
>> >Signed-off-by: Phil Yang <phil.yang@arm.com>
>> >Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>> >Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> >---
>> > app/test-eventdev/test_pipeline_queue.c | 64
>> >+++++++++++++++++++++----
>> > 1 file changed, 56 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
>> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
>100644
>> >--- a/app/test-eventdev/test_pipeline_queue.c
>> >+++ b/app/test-eventdev/test_pipeline_queue.c
>> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
>> >*arg)
>> >
>> > 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> > 			pipeline_event_tx(dev, port, &ev);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 		} else {
>> > 			ev.queue_id++;
>> > 			pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >@@ -59,7 +65,13 @@
>pipeline_queue_worker_single_stage_fwd(void
>> >*arg)
>> > 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>> > 		pipeline_event_enqueue(dev, port, &ev);
>> >-		w->processed_pkts++;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+				__ATOMIC_RELEASE);
>> > 	}
>> >
>> > 	return 0;
>> >@@ -84,7 +96,13 @@
>> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
>> > 			if (ev[i].sched_type ==
>> >RTE_SCHED_TYPE_ATOMIC) {
>> > 				pipeline_event_tx(dev, port, &ev[i]);
>> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored
>> >operation
>> >+				 * of the event completes before the
>> >number of
>> >+				 * processed pkts is visible to the main
>> >core
>> >+				 */
>> >+				__atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 			} else {
>> > 				ev[i].queue_id++;
>> > 				pipeline_fwd_event(&ev[i],
>> >@@ -121,7 +139,13 @@
>> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>> > 		}
>> >
>> > 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>> >-		w->processed_pkts += nb_rx;
>> >+
>> >+		/* release barrier here ensures stored operation
>> >+		 * of the event completes before the number of
>> >+		 * processed pkts is visible to the main core
>> >+		 */
>> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>> >+				__ATOMIC_RELEASE);
>> > 	}
>> >
>> > 	return 0;
>> >@@ -146,7 +170,13 @@
>pipeline_queue_worker_multi_stage_tx(void
>> >*arg)
>> >
>> > 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>> > 			pipeline_event_tx(dev, port, &ev);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 			continue;
>> > 		}
>> >
>> >@@ -180,7 +210,13 @@
>> >pipeline_queue_worker_multi_stage_fwd(void *arg)
>> > 			ev.queue_id = tx_queue[ev.mbuf->port];
>> > 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> > 			pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >-			w->processed_pkts++;
>> >+
>> >+			/* release barrier here ensures stored operation
>> >+			 * of the event completes before the number of
>> >+			 * processed pkts is visible to the main core
>> >+			 */
>> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>> >+					__ATOMIC_RELEASE);
>> > 		} else {
>> > 			ev.queue_id++;
>> > 			pipeline_fwd_event(&ev,
>> >sched_type_list[cq_id]);
>> >@@ -214,7 +250,13 @@
>> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>> > 			if (ev[i].queue_id == tx_queue[ev[i].mbuf-
>> >>port]) {
>> > 				pipeline_event_tx(dev, port, &ev[i]);
>> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored
>> >operation
>> >+				 * of the event completes before the
>> >number of
>> >+				 * processed pkts is visible to the main
>> >core
>> >+				 */
>> >+				__atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 				continue;
>> > 			}
>> >
>> >@@ -254,7 +296,13 @@
>> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>> >
>> >	rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>> > 				pipeline_fwd_event(&ev[i],
>> >
>> >	RTE_SCHED_TYPE_ATOMIC);
>> >-				w->processed_pkts++;
>> >+
>> >+				/* release barrier here ensures stored
>> >operation
>> >+				 * of the event completes before the
>> >number of
>> >+				 * processed pkts is visible to the main
>> >core
>> >+				 */
>> >+				__atomic_fetch_add(&(w-
>> >>processed_pkts), 1,
>> >+						__ATOMIC_RELEASE);
>> > 			} else {
>> > 				ev[i].queue_id++;
>> > 				pipeline_fwd_event(&ev[i],
>> >--
>> >2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] 回复: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-05  9:29   ` Pavan Nikhilesh Bhagavatula
@ 2021-01-08  7:12     ` Feifei Wang
  2021-01-08  9:12       ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
  0 siblings, 1 reply; 12+ messages in thread
From: Feifei Wang @ 2021-01-08  7:12 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, jerinj, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd, nd

Hi, Pavan



> -----邮件原件-----

> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>

> 发送时间: 2021年1月5日 17:29

> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry

> van Haaren <harry.van.haaren@intel.com>

> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli

> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang

> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>

> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline

> test

>

> Hi Feifei,

>

> >Hi, Pavan

> >

> >Sorry for my late reply and thanks very much for your review.

> >

> >> -----Original Message-----

> >> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>>

> >> Sent: 2020年12月22日 18:33

> >> To: Feifei Wang <Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>; jerinj@marvell.com<mailto:jerinj@marvell.com>;

> >Harry van

> >> Haaren <harry.van.haaren@intel.com<mailto:harry.van.haaren@intel.com>>; Pavan Nikhilesh

> >> <pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@caviumnetworks.com>>

> >> Cc: dev@dpdk.org<mailto:dev@dpdk.org>; nd <nd@arm.com<mailto:nd@arm.com>>; Honnappa Nagarahalli

> >> <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>; stable@dpdk.org<mailto:stable@dpdk.org>; Phil Yang

> >> <Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>

> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers

> >for

> >> pipeline test

> >>

> >>

> >> >Add release barriers before updating the processed packets for

> >worker

> >> >lcores to ensure the worker lcore has really finished data

> >> >processing and then it can update the processed packets number.

> >> >

> >>

> >> I believe we can live with minor inaccuracies in stats being

> >> presented

> >as

> >> atomics are pretty heavy when scheduler is limited to burst size as 1.

> >>

> >> One option is to move it before a pipeline operation

> >(pipeline_event_tx,

> >> pipeline_fwd_event etc.) as they imply implicit release barrier (as

> >> all

> >the

> >> changes done to the event should be visible to the next core).

> >

> >If I understand correctly, your meaning is that move release barriers

> >before pipeline_event_tx or pipeline_fwd_event. This can ensure the

> >event has been processed before the next core begins to tx/fwd. For

> >example:

>

> What I meant was event APIs such as `rte_event_enqueue_burst`,

> `rte_event_eth_tx_adapter_enqueue`

> act as an implicit release barrier and the API `rte_event_dequeue_burst` act

> as an implicit acquire barrier.



>

> Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I

> don’t believe we need barriers in Between.



Sorry for my misunderstanding this. And I agree with you that no barriers are

needed between dequeue and enqueue.



Now, let's go back to the beginning. Actually with this patch, our barrier is mainly

for the synchronous variable " w->processed_pkts ". As we all know, the event is firstly

dequeued and then enqueued, after this, the event can be treated as the processed event

and included in the statistics("w->processed_pkts++").



Thus, we add a release barrier before " w->processed_pkts++" is to prevent this operation

being executed ahead of time. For example:

dequeue  ->  w->processed_pkts++  ->  enqueue

This cause that the worker doesn't actually finish this event processing, but the event is treated

as the processed one and included in the statistics.



______________________________________________________________________________



By the way, I have two other questions about pipeline process test in "test_pipeline_queue".

1. when do we start counting processed events (w->processed_pkts)?

For the fwd mode (internal_port = false), when we choose single stage, application increments

the number events processed after "pipeline_event enqueue". However, when we choose multiple

stage, application increments the number events processed before "pipleline_event_enqueue". So,

maybe we can unify this. For example of multiple stage:



                                if (cq_id == last_queue) {

                                                ev.queue_id = tx_queue[ev.mbuf->port];

                                                rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);

                                                pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);

                                +             pipeline_event_enqueue(dev, port, &ev);

                                                w->processed_pkts++;

                                } else {

                                                ev.queue_id++;

                                                pipeline_fwd_event(&ev, sched_type_list[cq_id]);

                                +             pipeline_event_enqueue(dev, port, &ev);

                                }



                -              pipeline_event_enqueue(dev, port, &ev);



2. Whether  "pipeline_event_enqueue" is needed after "pipeline_event_tx" for tx mode?

For single_stage_burst_tx mode, after "pipeline_event_tx", the worker has to enqueue again

due to  "pipeline_event_enqueue_burst", so maybe we should jump out of the loop after

“pipeline_event_tx”, for example:



                                                if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {

                                                                pipeline_event_tx(dev, port, &ev[i]);

                                                                ev[i].op = RTE_EVENT_OP_RELEASE;

                                                                w->processed_pkts++;

                                                +             continue;

                                                } else {

                                                                ev[i].queue_id++;

                                                                pipeline_fwd_event(&ev[i],

                                                                                                RTE_SCHED_TYPE_ATOMIC);

                                                }

                                }



                                pipeline_event_enqueue_burst(dev, port, ev, nb_rx);





>

> >

> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {

> >                          +             __atomic_thread_fence(__ATOMIC_RELEASE);

> >                                          pipeline_event_tx(dev, port, &ev);

> >                                          w->processed_pkts++;

> >                          } else {

> >                                          ev.queue_id++;

> >                          +             __atomic_thread_fence(__ATOMIC_RELEASE);

> >                                          pipeline_fwd_event(&ev,

> >RTE_SCHED_TYPE_ATOMIC);

> >                                          pipeline_event_enqueue(dev, port, &ev);

> >

> >However, there are two reasons to prevent this:

> >

> >First, compare with other tests in app/eventdev, for example, the

> >eventdev perf test, the wmb is after event operation to ensure

> >operation has been finished and then w->processed_pkts++.

>

> In case of perf_* tests start with a dequeue() and finally ends with a

> mempool_put() should also act as implicit acquire release pairs making stats

> consistent?



For perf tests, this consistency refers to that there is a wmb after mempool_put().

Please refer to this link:

http://patches.dpdk.org/patch/85634/



>

> >So, if we move release barriers before tx/fwd, it may cause that the

> >tests of app/eventdev become  inconsistent.This may reduce the

> >maintainability of the code and make it difficult to understand.

> >

> >Second, it is a test case, though heavy thread may cause performance

> >degradation, it can ensure that the operation process and the test

> >result are correct. And maybe for a test case, correctness is more

> >important than performance.

> >

>

> Most of our internal perf test run on 24/48 core combinations and since

> Octeontx2 event device driver supports a burst size of 1, it will show up as

> Huge performance degradation.



For the impact on performance, I do the test using software driver, following are some test results:

------------------------------------------------------------------------------------------------------------------------------------

Architecture: aarch64

Nics: ixgbe-82599

CPU: Cortex-A72

BURST_SIZE: 1

Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a

Flow: one flow, 64bits package, TX rate: 1.4Mpps



Without this patch:

0.954 mpps avg 0.953 mpps



With this patch:

0.932 mpps avg 0.930 mpps

------------------------------------------------------------------------------------------------------------------------------------



Based on the result above, there is no significant performance degradation with this patch.

This is because the release barrier is only for  “w->processed_pkts++”. It just ensures that the worker core

increments the number events processed after enqueue, and it doesn’t affect dequeue/enqueue:



dequeue -> enqueue -> release barrier -> w->processed_pkts++



On the other hand, I infer the reason for the slight decrease in measurement performance is that the release barrier

prevent “w->processed_pkts++” before that the event has been processed (enqueue). But I think this test result is closer

to the real performance.

And sorry for that we have no octentx2 device, so there is no test result on Octeontx2 event device driver. Would you please

help us test this patch on octentx2 when you are convenient. Thanks very much.



Best Regards

Feifei



>

> >So, due to two reasons above, I'm ambivalent about how we should do in

> >the next step.

> >

> >Best Regards

> >Feifei

>

> Regards,

> Pavan.

>

> >

> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker

> >> >functions")

> >> >Cc: pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>

> >> >Cc: stable@dpdk.org<mailto:stable@dpdk.org>

> >> >

> >> >Signed-off-by: Phil Yang <phil.yang@arm.com<mailto:phil.yang@arm.com>>

> >> >Signed-off-by: Feifei Wang <feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>

> >> >Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>

> >> >---

> >> > app/test-eventdev/test_pipeline_queue.c | 64

> >> >+++++++++++++++++++++----

> >> > 1 file changed, 56 insertions(+), 8 deletions(-)

> >> >

> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-

> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb

> >100644

> >> >--- a/app/test-eventdev/test_pipeline_queue.c

> >> >+++ b/app/test-eventdev/test_pipeline_queue.c

> >> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void

> >> >*arg)

> >> >

> >> >                    if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {

> >> >                                    pipeline_event_tx(dev, port, &ev);

> >> >-                                  w->processed_pkts++;

> >> >+

> >> >+                                 /* release barrier here ensures stored operation

> >> >+                                 * of the event completes before the number of

> >> >+                                 * processed pkts is visible to the main core

> >> >+                                 */

> >> >+                                 __atomic_fetch_add(&(w->processed_pkts), 1,

> >> >+                                                                 __ATOMIC_RELEASE);

> >> >                    } else {

> >> >                                    ev.queue_id++;

> >> >                                    pipeline_fwd_event(&ev,

> >> >RTE_SCHED_TYPE_ATOMIC);

> >> >@@ -59,7 +65,13 @@

> >pipeline_queue_worker_single_stage_fwd(void

> >> >*arg)

> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);

> >> >                    pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);

> >> >                    pipeline_event_enqueue(dev, port, &ev);

> >> >-                  w->processed_pkts++;

> >> >+

> >> >+                 /* release barrier here ensures stored operation

> >> >+                 * of the event completes before the number of

> >> >+                 * processed pkts is visible to the main core

> >> >+                 */

> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,

> >> >+                                                 __ATOMIC_RELEASE);

> >> >    }

> >> >

> >> >    return 0;

> >> >@@ -84,7 +96,13 @@

> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)

> >> >                                    if (ev[i].sched_type ==

> >> >RTE_SCHED_TYPE_ATOMIC) {

> >> >                                                    pipeline_event_tx(dev, port, &ev[i]);

> >> >                                                    ev[i].op = RTE_EVENT_OP_RELEASE;

> >> >-                                                  w->processed_pkts++;

> >> >+

> >> >+                                                 /* release barrier here ensures stored

> >> >operation

> >> >+                                                 * of the event completes before the

> >> >number of

> >> >+                                                 * processed pkts is visible to the main

> >> >core

> >> >+                                                 */

> >> >+                                                 __atomic_fetch_add(&(w-

> >> >>processed_pkts), 1,

> >> >+                                                                                 __ATOMIC_RELEASE);

> >> >                                    } else {

> >> >                                                    ev[i].queue_id++;

> >> >                                                    pipeline_fwd_event(&ev[i],

> >> >@@ -121,7 +139,13 @@

> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)

> >> >                    }

> >> >

> >> >                    pipeline_event_enqueue_burst(dev, port, ev, nb_rx);

> >> >-                  w->processed_pkts += nb_rx;

> >> >+

> >> >+                 /* release barrier here ensures stored operation

> >> >+                 * of the event completes before the number of

> >> >+                 * processed pkts is visible to the main core

> >> >+                 */

> >> >+                 __atomic_fetch_add(&(w->processed_pkts), nb_rx,

> >> >+                                                 __ATOMIC_RELEASE);

> >> >    }

> >> >

> >> >    return 0;

> >> >@@ -146,7 +170,13 @@

> >pipeline_queue_worker_multi_stage_tx(void

> >> >*arg)

> >> >

> >> >                    if (ev.queue_id == tx_queue[ev.mbuf->port]) {

> >> >                                    pipeline_event_tx(dev, port, &ev);

> >> >-                                  w->processed_pkts++;

> >> >+

> >> >+                                 /* release barrier here ensures stored operation

> >> >+                                 * of the event completes before the number of

> >> >+                                 * processed pkts is visible to the main core

> >> >+                                 */

> >> >+                                 __atomic_fetch_add(&(w->processed_pkts), 1,

> >> >+                                                                 __ATOMIC_RELEASE);

> >> >                                    continue;

> >> >                    }

> >> >

> >> >@@ -180,7 +210,13 @@

> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)

> >> >                                    ev.queue_id = tx_queue[ev.mbuf->port];

> >> >                                    rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);

> >> >                                    pipeline_fwd_event(&ev,

> >> >RTE_SCHED_TYPE_ATOMIC);

> >> >-                                  w->processed_pkts++;

> >> >+

> >> >+                                 /* release barrier here ensures stored operation

> >> >+                                 * of the event completes before the number of

> >> >+                                 * processed pkts is visible to the main core

> >> >+                                 */

> >> >+                                 __atomic_fetch_add(&(w->processed_pkts), 1,

> >> >+                                                                 __ATOMIC_RELEASE);

> >> >                    } else {

> >> >                                    ev.queue_id++;

> >> >                                    pipeline_fwd_event(&ev,

> >> >sched_type_list[cq_id]);

> >> >@@ -214,7 +250,13 @@

> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)

> >> >                                    if (ev[i].queue_id == tx_queue[ev[i].mbuf-

> >> >>port]) {

> >> >                                                    pipeline_event_tx(dev, port, &ev[i]);

> >> >                                                    ev[i].op = RTE_EVENT_OP_RELEASE;

> >> >-                                                  w->processed_pkts++;

> >> >+

> >> >+                                                 /* release barrier here ensures stored

> >> >operation

> >> >+                                                 * of the event completes before the

> >> >number of

> >> >+                                                 * processed pkts is visible to the main

> >> >core

> >> >+                                                 */

> >> >+                                                 __atomic_fetch_add(&(w-

> >> >>processed_pkts), 1,

> >> >+                                                                                 __ATOMIC_RELEASE);

> >> >                                                    continue;

> >> >                                    }

> >> >

> >> >@@ -254,7 +296,13 @@

> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)

> >> >

> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);

> >> >                                                    pipeline_fwd_event(&ev[i],

> >> >

> >> >    RTE_SCHED_TYPE_ATOMIC);

> >> >-                                                  w->processed_pkts++;

> >> >+

> >> >+                                                 /* release barrier here ensures stored

> >> >operation

> >> >+                                                 * of the event completes before the

> >> >number of

> >> >+                                                 * processed pkts is visible to the main

> >> >core

> >> >+                                                 */

> >> >+                                                 __atomic_fetch_add(&(w-

> >> >>processed_pkts), 1,

> >> >+                                                                                 __ATOMIC_RELEASE);

> >> >                                    } else {

> >> >                                                    ev[i].queue_id++;

> >> >                                                    pipeline_fwd_event(&ev[i],

> >> >--

> >> >2.17.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-08  7:12     ` [dpdk-dev] 回复: " Feifei Wang
@ 2021-01-08  9:12       ` Pavan Nikhilesh Bhagavatula
  2021-01-08 10:44         ` [dpdk-dev] 回复: " Feifei Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2021-01-08  9:12 UTC (permalink / raw)
  To: Feifei Wang, Jerin Jacob Kollanukkaran, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd, nd

Hi Feifei,

>Hi, Pavan
>
>
>
>> -----邮件原件-----
>
>> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>
>> 发送时间: 2021年1月5日 17:29
>
>> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com;
>Harry
>
>> van Haaren <harry.van.haaren@intel.com>
>
>> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>
>> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
>
>> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
>
>> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>pipeline
>
>> test
>
>>
>
>> Hi Feifei,
>
>>
>
>> >Hi, Pavan
>
>> >
>
>> >Sorry for my late reply and thanks very much for your review.
>
>> >
>
>> >> -----Original Message-----
>
>> >> From: Pavan Nikhilesh Bhagavatula
><pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>>
>
>> >> Sent: 2020年12月22日 18:33
>
>> >> To: Feifei Wang
><Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>;
>jerinj@marvell.com<mailto:jerinj@marvell.com>;
>
>> >Harry van
>
>> >> Haaren
><harry.van.haaren@intel.com<mailto:harry.van.haaren@intel.com>>;
>Pavan Nikhilesh
>
>> >>
><pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@caviumn
>etworks.com>>
>
>> >> Cc: dev@dpdk.org<mailto:dev@dpdk.org>; nd
><nd@arm.com<mailto:nd@arm.com>>; Honnappa Nagarahalli
>
>> >>
><Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm
>.com>>; stable@dpdk.org<mailto:stable@dpdk.org>; Phil Yang
>
>> >> <Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
>
>> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
>barriers
>
>> >for
>
>> >> pipeline test
>
>> >>
>
>> >>
>
>> >> >Add release barriers before updating the processed packets for
>
>> >worker
>
>> >> >lcores to ensure the worker lcore has really finished data
>
>> >> >processing and then it can update the processed packets number.
>
>> >> >
>
>> >>
>
>> >> I believe we can live with minor inaccuracies in stats being
>
>> >> presented
>
>> >as
>
>> >> atomics are pretty heavy when scheduler is limited to burst size as
>1.
>
>> >>
>
>> >> One option is to move it before a pipeline operation
>
>> >(pipeline_event_tx,
>
>> >> pipeline_fwd_event etc.) as they imply implicit release barrier (as
>
>> >> all
>
>> >the
>
>> >> changes done to the event should be visible to the next core).
>
>> >
>
>> >If I understand correctly, your meaning is that move release barriers
>
>> >before pipeline_event_tx or pipeline_fwd_event. This can ensure the
>
>> >event has been processed before the next core begins to tx/fwd. For
>
>> >example:
>
>>
>
>> What I meant was event APIs such as `rte_event_enqueue_burst`,
>
>> `rte_event_eth_tx_adapter_enqueue`
>
>> act as an implicit release barrier and the API
>`rte_event_dequeue_burst` act
>
>> as an implicit acquire barrier.
>
>
>
>>
>
>> Since, pipeline_* test starts with a dequeue() and ends with an
>enqueue() I
>
>> don’t believe we need barriers in Between.
>
>
>
>Sorry for my misunderstanding this. And I agree with you that no
>barriers are
>
>needed between dequeue and enqueue.
>
>
>
>Now, let's go back to the beginning. Actually with this patch, our barrier
>is mainly
>
>for the synchronous variable " w->processed_pkts ". As we all know, the
>event is firstly
>
>dequeued and then enqueued, after this, the event can be treated as
>the processed event
>
>and included in the statistics("w->processed_pkts++").
>
>
>
>Thus, we add a release barrier before " w->processed_pkts++" is to
>prevent this operation
>
>being executed ahead of time. For example:
>
>dequeue  ->  w->processed_pkts++  ->  enqueue
>
>This cause that the worker doesn't actually finish this event processing,
>but the event is treated
>
>as the processed one and included in the statistics.
>

But the current sequence is dequeue-> enqueue-> w->processed_pkts++ and
enqueue already acts as an explicit release barrier right?

>
>
>__________________________________________________________
>____________________
>
>
>
>By the way, I have two other questions about pipeline process test in
>"test_pipeline_queue".
>
>1. when do we start counting processed events (w->processed_pkts)?
>
>For the fwd mode (internal_port = false), when we choose single stage,
>application increments
>
>the number events processed after "pipeline_event enqueue".
>However, when we choose multiple
>
>stage, application increments the number events processed before
>"pipleline_event_enqueue". 

We count an event as process when all the stages have completed and its Trasnmitted.

>So,
>
>maybe we can unify this. For example of multiple stage:
>
>
>
>                                if (cq_id == last_queue) {
>
>                                                ev.queue_id = tx_queue[ev.mbuf->port];
>
>                                                rte_event_eth_tx_adapter_txq_set(ev.mbuf,
>0);
>
>                                                pipeline_fwd_event(&ev,
>RTE_SCHED_TYPE_ATOMIC);
>
>                                +             pipeline_event_enqueue(dev, port, &ev);
>
>                                                w->processed_pkts++;
>
>                                } else {
>
>                                                ev.queue_id++;
>
>                                                pipeline_fwd_event(&ev,
>sched_type_list[cq_id]);
>
>                                +             pipeline_event_enqueue(dev, port, &ev);
>
>                                }
>
>
>
>                -              pipeline_event_enqueue(dev, port, &ev);
>
>

The above change makes sense. 

>
>2. Whether  "pipeline_event_enqueue" is needed after
>"pipeline_event_tx" for tx mode?
>
>For single_stage_burst_tx mode, after "pipeline_event_tx", the worker
>has to enqueue again
>
>due to  "pipeline_event_enqueue_burst", so maybe we should jump out
>of the loop after
>
>“pipeline_event_tx”,

We call enqueue burst to release the events i.e. enqueue events with 
RTE_EVENT_OP_RELEASE.

>
 for example:
>
>
>
>                                                if (ev[i].sched_type ==
>RTE_SCHED_TYPE_ATOMIC) {
>
>                                                                pipeline_event_tx(dev, port, &ev[i]);
>
>                                                                ev[i].op = RTE_EVENT_OP_RELEASE;
>
>                                                                w->processed_pkts++;
>
>                                                +             continue;
>
>                                                } else {
>
>                                                                ev[i].queue_id++;
>
>                                                                pipeline_fwd_event(&ev[i],
>
>
>RTE_SCHED_TYPE_ATOMIC);
>
>                                                }
>
>                                }
>
>
>
>                                pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>
>
>
>
>
>>
>
>> >
>
>> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>
>> >                          +
>__atomic_thread_fence(__ATOMIC_RELEASE);
>
>> >                                          pipeline_event_tx(dev, port, &ev);
>
>> >                                          w->processed_pkts++;
>
>> >                          } else {
>
>> >                                          ev.queue_id++;
>
>> >                          +
>__atomic_thread_fence(__ATOMIC_RELEASE);
>
>> >                                          pipeline_fwd_event(&ev,
>
>> >RTE_SCHED_TYPE_ATOMIC);
>
>> >                                          pipeline_event_enqueue(dev, port, &ev);
>
>> >
>
>> >However, there are two reasons to prevent this:
>
>> >
>
>> >First, compare with other tests in app/eventdev, for example, the
>
>> >eventdev perf test, the wmb is after event operation to ensure
>
>> >operation has been finished and then w->processed_pkts++.
>
>>
>
>> In case of perf_* tests start with a dequeue() and finally ends with a
>
>> mempool_put() should also act as implicit acquire release pairs
>making stats
>
>> consistent?
>
>
>
>For perf tests, this consistency refers to that there is a wmb after
>mempool_put().
>
>Please refer to this link:
>
>https://urldefense.proofpoint.com/v2/url?u=http-
>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0mO
>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=zg
>QHeSDiXWfI1PIIUxXBqMS6E-2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
>sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
>
>
>
>>
>
>> >So, if we move release barriers before tx/fwd, it may cause that the
>
>> >tests of app/eventdev become  inconsistent.This may reduce the
>
>> >maintainability of the code and make it difficult to understand.
>
>> >
>
>> >Second, it is a test case, though heavy thread may cause
>performance
>
>> >degradation, it can ensure that the operation process and the test
>
>> >result are correct. And maybe for a test case, correctness is more
>
>> >important than performance.
>
>> >
>
>>
>
>> Most of our internal perf test run on 24/48 core combinations and
>since
>
>> Octeontx2 event device driver supports a burst size of 1, it will show
>up as
>
>> Huge performance degradation.
>
>
>
>For the impact on performance, I do the test using software driver,
>following are some test results:
>
>---------------------------------------------------------------------------------------
>---------------------------------------------
>
>Architecture: aarch64
>
>Nics: ixgbe-82599
>
>CPU: Cortex-A72
>
>BURST_SIZE: 1
>
>Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
>test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
>
>Flow: one flow, 64bits package, TX rate: 1.4Mpps
>
>
>
>Without this patch:
>
>0.954 mpps avg 0.953 mpps
>
>
>
>With this patch:
>
>0.932 mpps avg 0.930 mpps
>
>---------------------------------------------------------------------------------------
>---------------------------------------------
>
>
>
>Based on the result above, there is no significant performance
>degradation with this patch.
>
>This is because the release barrier is only for  “w->processed_pkts++”. It
>just ensures that the worker core
>
>increments the number events processed after enqueue, and it doesn’t
>affect dequeue/enqueue:
>
>
>
>dequeue -> enqueue -> release barrier -> w->processed_pkts++
>

Here enqueue already acts as an explicit release barrier.

>
>
>On the other hand, I infer the reason for the slight decrease in
>measurement performance is that the release barrier
>
>prevent “w->processed_pkts++” before that the event has been
>processed (enqueue). But I think this test result is closer
>
>to the real performance.
>
>And sorry for that we have no octentx2 device, so there is no test result
>on Octeontx2 event device driver. Would you please
>
>help us test this patch on octentx2 when you are convenient. Thanks
>very much.
>

I will report the performance numbers on Monday.

>
>
>Best Regards
>
>Feifei

Regards,
Pavan.

>
>
>
>>
>
>> >So, due to two reasons above, I'm ambivalent about how we should
>do in
>
>> >the next step.
>
>> >
>
>> >Best Regards
>
>> >Feifei
>
>>
>
>> Regards,
>
>> Pavan.
>
>>
>
>> >
>
>> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>
>> >> >functions")
>
>> >> >Cc:
>pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>
>
>> >> >Cc: stable@dpdk.org<mailto:stable@dpdk.org>
>
>> >> >
>
>> >> >Signed-off-by: Phil Yang
><phil.yang@arm.com<mailto:phil.yang@arm.com>>
>
>> >> >Signed-off-by: Feifei Wang
><feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
>
>> >> >Reviewed-by: Ruifeng Wang
><ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
>
>> >> >---
>
>> >> > app/test-eventdev/test_pipeline_queue.c | 64
>
>> >> >+++++++++++++++++++++----
>
>> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
>
>> >> >
>
>> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
>
>> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
>
>> >100644
>
>> >> >--- a/app/test-eventdev/test_pipeline_queue.c
>
>> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
>
>> >> >@@ -30,7 +30,13 @@
>pipeline_queue_worker_single_stage_tx(void
>
>> >> >*arg)
>
>> >> >
>
>> >> >                    if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>
>> >> >                                    pipeline_event_tx(dev, port, &ev);
>
>> >> >-                                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                                 /* release barrier here ensures stored
>operation
>
>> >> >+                                 * of the event completes before the number
>of
>
>> >> >+                                 * processed pkts is visible to the main core
>
>> >> >+                                 */
>
>> >> >+                                 __atomic_fetch_add(&(w->processed_pkts),
>1,
>
>> >> >+                                                                 __ATOMIC_RELEASE);
>
>> >> >                    } else {
>
>> >> >                                    ev.queue_id++;
>
>> >> >                                    pipeline_fwd_event(&ev,
>
>> >> >RTE_SCHED_TYPE_ATOMIC);
>
>> >> >@@ -59,7 +65,13 @@
>
>> >pipeline_queue_worker_single_stage_fwd(void
>
>> >> >*arg)
>
>> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>
>> >> >                    pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>
>> >> >                    pipeline_event_enqueue(dev, port, &ev);
>
>> >> >-                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                 /* release barrier here ensures stored operation
>
>> >> >+                 * of the event completes before the number of
>
>> >> >+                 * processed pkts is visible to the main core
>
>> >> >+                 */
>
>> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
>
>> >> >+                                                 __ATOMIC_RELEASE);
>
>> >> >    }
>
>> >> >
>
>> >> >    return 0;
>
>> >> >@@ -84,7 +96,13 @@
>
>> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
>
>> >> >                                    if (ev[i].sched_type ==
>
>> >> >RTE_SCHED_TYPE_ATOMIC) {
>
>> >> >                                                    pipeline_event_tx(dev, port, &ev[i]);
>
>> >> >                                                    ev[i].op = RTE_EVENT_OP_RELEASE;
>
>> >> >-                                                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                                                 /* release barrier here ensures stored
>
>> >> >operation
>
>> >> >+                                                 * of the event completes before the
>
>> >> >number of
>
>> >> >+                                                 * processed pkts is visible to the main
>
>> >> >core
>
>> >> >+                                                 */
>
>> >> >+                                                 __atomic_fetch_add(&(w-
>
>> >> >>processed_pkts), 1,
>
>> >> >+
>__ATOMIC_RELEASE);
>
>> >> >                                    } else {
>
>> >> >                                                    ev[i].queue_id++;
>
>> >> >                                                    pipeline_fwd_event(&ev[i],
>
>> >> >@@ -121,7 +139,13 @@
>
>> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>
>> >> >                    }
>
>> >> >
>
>> >> >                    pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>
>> >> >-                  w->processed_pkts += nb_rx;
>
>> >> >+
>
>> >> >+                 /* release barrier here ensures stored operation
>
>> >> >+                 * of the event completes before the number of
>
>> >> >+                 * processed pkts is visible to the main core
>
>> >> >+                 */
>
>> >> >+                 __atomic_fetch_add(&(w->processed_pkts), nb_rx,
>
>> >> >+                                                 __ATOMIC_RELEASE);
>
>> >> >    }
>
>> >> >
>
>> >> >    return 0;
>
>> >> >@@ -146,7 +170,13 @@
>
>> >pipeline_queue_worker_multi_stage_tx(void
>
>> >> >*arg)
>
>> >> >
>
>> >> >                    if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>
>> >> >                                    pipeline_event_tx(dev, port, &ev);
>
>> >> >-                                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                                 /* release barrier here ensures stored
>operation
>
>> >> >+                                 * of the event completes before the number
>of
>
>> >> >+                                 * processed pkts is visible to the main core
>
>> >> >+                                 */
>
>> >> >+                                 __atomic_fetch_add(&(w->processed_pkts),
>1,
>
>> >> >+                                                                 __ATOMIC_RELEASE);
>
>> >> >                                    continue;
>
>> >> >                    }
>
>> >> >
>
>> >> >@@ -180,7 +210,13 @@
>
>> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
>
>> >> >                                    ev.queue_id = tx_queue[ev.mbuf->port];
>
>> >> >                                    rte_event_eth_tx_adapter_txq_set(ev.mbuf,
>0);
>
>> >> >                                    pipeline_fwd_event(&ev,
>
>> >> >RTE_SCHED_TYPE_ATOMIC);
>
>> >> >-                                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                                 /* release barrier here ensures stored
>operation
>
>> >> >+                                 * of the event completes before the number
>of
>
>> >> >+                                 * processed pkts is visible to the main core
>
>> >> >+                                 */
>
>> >> >+                                 __atomic_fetch_add(&(w->processed_pkts),
>1,
>
>> >> >+                                                                 __ATOMIC_RELEASE);
>
>> >> >                    } else {
>
>> >> >                                    ev.queue_id++;
>
>> >> >                                    pipeline_fwd_event(&ev,
>
>> >> >sched_type_list[cq_id]);
>
>> >> >@@ -214,7 +250,13 @@
>
>> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>
>> >> >                                    if (ev[i].queue_id == tx_queue[ev[i].mbuf-
>
>> >> >>port]) {
>
>> >> >                                                    pipeline_event_tx(dev, port, &ev[i]);
>
>> >> >                                                    ev[i].op = RTE_EVENT_OP_RELEASE;
>
>> >> >-                                                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                                                 /* release barrier here ensures stored
>
>> >> >operation
>
>> >> >+                                                 * of the event completes before the
>
>> >> >number of
>
>> >> >+                                                 * processed pkts is visible to the main
>
>> >> >core
>
>> >> >+                                                 */
>
>> >> >+                                                 __atomic_fetch_add(&(w-
>
>> >> >>processed_pkts), 1,
>
>> >> >+
>__ATOMIC_RELEASE);
>
>> >> >                                                    continue;
>
>> >> >                                    }
>
>> >> >
>
>> >> >@@ -254,7 +296,13 @@
>
>> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>
>> >> >
>
>> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>
>> >> >                                                    pipeline_fwd_event(&ev[i],
>
>> >> >
>
>> >> >    RTE_SCHED_TYPE_ATOMIC);
>
>> >> >-                                                  w->processed_pkts++;
>
>> >> >+
>
>> >> >+                                                 /* release barrier here ensures stored
>
>> >> >operation
>
>> >> >+                                                 * of the event completes before the
>
>> >> >number of
>
>> >> >+                                                 * processed pkts is visible to the main
>
>> >> >core
>
>> >> >+                                                 */
>
>> >> >+                                                 __atomic_fetch_add(&(w-
>
>> >> >>processed_pkts), 1,
>
>> >> >+
>__ATOMIC_RELEASE);
>
>> >> >                                    } else {
>
>> >> >                                                    ev[i].queue_id++;
>
>> >> >                                                    pipeline_fwd_event(&ev[i],
>
>> >> >--
>
>> >> >2.17.1
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] 回复: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-08  9:12       ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
@ 2021-01-08 10:44         ` Feifei Wang
  2021-01-08 10:58           ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
  0 siblings, 1 reply; 12+ messages in thread
From: Feifei Wang @ 2021-01-08 10:44 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, jerinj, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd, nd, nd

Hi, Pavan

> -----邮件原件-----
> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> 发送时间: 2021年1月8日 17:13
> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry
> van Haaren <harry.van.haaren@intel.com>
> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline
> test
> 
> Hi Feifei,
> 
> >Hi, Pavan
> >
> >
> >
> >> -----邮件原件-----
> >
> >> 发件人: Pavan Nikhilesh Bhagavatula <mailto:pbhagavatula@marvell.com>
> >
> >> 发送时间: 2021年1月5日 17:29
> >
> >> 收件人: Feifei Wang <mailto:Feifei.Wang2@arm.com>; mailto:jerinj@marvell.com;
> >Harry
> >
> >> van Haaren <mailto:harry.van.haaren@intel.com>
> >
> >> 抄送: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>; Honnappa Nagarahalli
> >
> >> <mailto:Honnappa.Nagarahalli@arm.com>; mailto:stable@dpdk.org; Ruifeng Wang
> >
> >> <mailto:Ruifeng.Wang@arm.com>; nd <mailto:nd@arm.com>
> >
> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >pipeline
> >
> >> test
> >
> >>
> >
> >> Hi Feifei,
> >
> >>
> >
> >> >Hi, Pavan
> >
> >> >
> >
> >> >Sorry for my late reply and thanks very much for your review.
> >
> >> >
> >
> >> >> -----Original Message-----
> >
> >> >> From: Pavan Nikhilesh Bhagavatula
> ><mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>>
> >
> >> >> Sent: 2020年12月22日 18:33
> >
> >> >> To: Feifei Wang
> ><mailto:Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>;
> >mailto:jerinj@marvell.com<mailto:jerinj@marvell.com>;
> >
> >> >Harry van
> >
> >> >> Haaren
> ><mailto:harry.van.haaren@intel.com<mailto:harry.van.haaren@intel.com>>;
> >Pavan Nikhilesh
> >
> >> >>
> ><pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@caviumn
> >etworks.com>>
> >
> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd
> ><mailto:nd@arm.com<mailto:nd@arm.com>>; Honnappa Nagarahalli
> >
> >> >>
> ><Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm
> >.com>>; mailto:stable@dpdk.org<mailto:stable@dpdk.org>; Phil Yang
> >
> >> >> <mailto:Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
> >
> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
> >barriers
> >
> >> >for
> >
> >> >> pipeline test
> >
> >> >>
> >
> >> >>
> >
> >> >> >Add release barriers before updating the processed packets for
> >
> >> >worker
> >
> >> >> >lcores to ensure the worker lcore has really finished data
> >
> >> >> >processing and then it can update the processed packets number.
> >
> >> >> >
> >
> >> >>
> >
> >> >> I believe we can live with minor inaccuracies in stats being
> >
> >> >> presented
> >
> >> >as
> >
> >> >> atomics are pretty heavy when scheduler is limited to burst size
> >> >> as
> >1.
> >
> >> >>
> >
> >> >> One option is to move it before a pipeline operation
> >
> >> >(pipeline_event_tx,
> >
> >> >> pipeline_fwd_event etc.) as they imply implicit release barrier
> >> >> (as
> >
> >> >> all
> >
> >> >the
> >
> >> >> changes done to the event should be visible to the next core).
> >
> >> >
> >
> >> >If I understand correctly, your meaning is that move release
> >> >barriers
> >
> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure the
> >
> >> >event has been processed before the next core begins to tx/fwd. For
> >
> >> >example:
> >
> >>
> >
> >> What I meant was event APIs such as `rte_event_enqueue_burst`,
> >
> >> `rte_event_eth_tx_adapter_enqueue`
> >
> >> act as an implicit release barrier and the API
> >`rte_event_dequeue_burst` act
> >
> >> as an implicit acquire barrier.
> >
> >
> >
> >>
> >
> >> Since, pipeline_* test starts with a dequeue() and ends with an
> >enqueue() I
> >
> >> don’t believe we need barriers in Between.
> >
> >
> >
> >Sorry for my misunderstanding this. And I agree with you that no
> >barriers are
> >
> >needed between dequeue and enqueue.
> >
> >
> >
> >Now, let's go back to the beginning. Actually with this patch, our
> >barrier is mainly
> >
> >for the synchronous variable " w->processed_pkts ". As we all know, the
> >event is firstly
> >
> >dequeued and then enqueued, after this, the event can be treated as the
> >processed event
> >
> >and included in the statistics("w->processed_pkts++").
> >
> >
> >
> >Thus, we add a release barrier before " w->processed_pkts++" is to
> >prevent this operation
> >
> >being executed ahead of time. For example:
> >
> >dequeue  ->  w->processed_pkts++  ->  enqueue
> >
> >This cause that the worker doesn't actually finish this event
> >processing, but the event is treated
> >
> >as the processed one and included in the statistics.
> >
> 
> But the current sequence is dequeue-> enqueue-> w->processed_pkts++
> and enqueue already acts as an explicit release barrier right?
> 

Sorry maybe I cannot understand how “enqueue” as an explicit release barrier. I think of two possibilities:
1. As you say before, all the changes done to the event should be visible to the next core and enqueue is a operation for event, so the next core should wait for the event to be enqueued.
I think this is due to data dependence for the same variable. However, ‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot prevent ‘w->processed_pkts++’ before enqueue. 
And the main core may load updated ‘w->processed_pkts’ but actually the event is still being processed. For example: 

 Time Slot	   Worker 1                                      Main core
   1                           dequeue                   
   2                  w->processed_pkts++
   3                       		                              load w->processed_pkts
   4                          enqueue

2. Some release barriers have been included in enqueue. There is a release barrier in rte_ring_enqueue :
move head -> copy elements to the ring -> release barrier -> update tail -> w->processed_pkts++
However, this barrier cannot prevent ‘w->processed_pkts++’ before update tail, and when update_tail has been finished, the enqueue process  can be seen completed. 

> >
> >
> >_________________________________________________________
> _
> >____________________
> >
> >
> >
> >By the way, I have two other questions about pipeline process test in
> >"test_pipeline_queue".
> >
> >1. when do we start counting processed events (w->processed_pkts)?
> >
> >For the fwd mode (internal_port = false), when we choose single stage,
> >application increments
> >
> >the number events processed after "pipeline_event enqueue".
> >However, when we choose multiple
> >
> >stage, application increments the number events processed before
> >"pipleline_event_enqueue".
> 
> We count an event as process when all the stages have completed and its
> Trasnmitted.
> 
> >So,
> >
> >maybe we can unify this. For example of multiple stage:
> >
> >
> >
> >                                if (cq_id == last_queue) {
> >
> >                                                ev.queue_id =
> > tx_queue[ev.mbuf->port];
> >
> >
> >rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >0);
> >
> >                                                pipeline_fwd_event(&ev,
> >RTE_SCHED_TYPE_ATOMIC);
> >
> >                                +             pipeline_event_enqueue(dev, port, &ev);
> >
> >                                                w->processed_pkts++;
> >
> >                                } else {
> >
> >                                                ev.queue_id++;
> >
> >                                                pipeline_fwd_event(&ev,
> >sched_type_list[cq_id]);
> >
> >                                +             pipeline_event_enqueue(dev, port, &ev);
> >
> >                                }
> >
> >
> >
> >                -              pipeline_event_enqueue(dev, port, &ev);
> >
> >
> 
> The above change makes sense.
> 
Thanks for your review, and I’ll update this change into the next version.
> >
> >2. Whether  "pipeline_event_enqueue" is needed after
> >"pipeline_event_tx" for tx mode?
> >
> >For single_stage_burst_tx mode, after "pipeline_event_tx", the worker
> >has to enqueue again
> >
> >due to  "pipeline_event_enqueue_burst", so maybe we should jump out of
> >the loop after
> >
> >“pipeline_event_tx”,
> 
> We call enqueue burst to release the events i.e. enqueue events with
> RTE_EVENT_OP_RELEASE.
> 
However, 
In case of single event, for ' pipeline_queue_worker_single_stage_tx' and ' pipeline_queue_worker_multi_stage_tx',
after tx, there is no release operation.

> >
>  for example:
> >
> >
> >
> >                                                if (ev[i].sched_type ==
> >RTE_SCHED_TYPE_ATOMIC) {
> >
> >
> > pipeline_event_tx(dev, port, &ev[i]);
> >
> >
> > ev[i].op = RTE_EVENT_OP_RELEASE;
> >
> >
> > w->processed_pkts++;
> >
> >                                                +             continue;
> >
> >                                                } else {
> >
> >
> > ev[i].queue_id++;
> >
> >
> > pipeline_fwd_event(&ev[i],
> >
> >
> >RTE_SCHED_TYPE_ATOMIC);
> >
> >                                                }
> >
> >                                }
> >
> >
> >
> >                                pipeline_event_enqueue_burst(dev, port,
> > ev, nb_rx);
> >
> >
> >
> >
> >
> >>
> >
> >> >
> >
> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >
> >> >                          +
> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >
> >> >                                          pipeline_event_tx(dev,
> >> > port, &ev);
> >
> >> >                                          w->processed_pkts++;
> >
> >> >                          } else {
> >
> >> >                                          ev.queue_id++;
> >
> >> >                          +
> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >
> >> >                                          pipeline_fwd_event(&ev,
> >
> >> >RTE_SCHED_TYPE_ATOMIC);
> >
> >> >
> >> > pipeline_event_enqueue(dev, port, &ev);
> >
> >> >
> >
> >> >However, there are two reasons to prevent this:
> >
> >> >
> >
> >> >First, compare with other tests in app/eventdev, for example, the
> >
> >> >eventdev perf test, the wmb is after event operation to ensure
> >
> >> >operation has been finished and then w->processed_pkts++.
> >
> >>
> >
> >> In case of perf_* tests start with a dequeue() and finally ends with
> >> a
> >
> >> mempool_put() should also act as implicit acquire release pairs
> >making stats
> >
> >> consistent?
> >
> >
> >
> >For perf tests, this consistency refers to that there is a wmb after
> >mempool_put().
> >
> >Please refer to this link:
> >
> >https://urldefense.proofpoint.com/v2/url?u=http-
> >3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0mO
> >yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=zg
> >QHeSDiXWfI1PIIUxXBqMS6E-2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
> >
> >
> >
> >>
> >
> >> >So, if we move release barriers before tx/fwd, it may cause that the
> >
> >> >tests of app/eventdev become  inconsistent.This may reduce the
> >
> >> >maintainability of the code and make it difficult to understand.
> >
> >> >
> >
> >> >Second, it is a test case, though heavy thread may cause
> >performance
> >
> >> >degradation, it can ensure that the operation process and the test
> >
> >> >result are correct. And maybe for a test case, correctness is more
> >
> >> >important than performance.
> >
> >> >
> >
> >>
> >
> >> Most of our internal perf test run on 24/48 core combinations and
> >since
> >
> >> Octeontx2 event device driver supports a burst size of 1, it will
> >> show
> >up as
> >
> >> Huge performance degradation.
> >
> >
> >
> >For the impact on performance, I do the test using software driver,
> >following are some test results:
> >
> >-----------------------------------------------------------------------
> >----------------
> >---------------------------------------------
> >
> >Architecture: aarch64
> >
> >Nics: ixgbe-82599
> >
> >CPU: Cortex-A72
> >
> >BURST_SIZE: 1
> >
> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
> >
> >Flow: one flow, 64bits package, TX rate: 1.4Mpps
> >
> >
> >
> >Without this patch:
> >
> >0.954 mpps avg 0.953 mpps
> >
> >
> >
> >With this patch:
> >
> >0.932 mpps avg 0.930 mpps
> >
> >-----------------------------------------------------------------------
> >----------------
> >---------------------------------------------
> >
> >
> >
> >Based on the result above, there is no significant performance
> >degradation with this patch.
> >
> >This is because the release barrier is only for  “w->processed_pkts++”.
> >It just ensures that the worker core
> >
> >increments the number events processed after enqueue, and it doesn’t
> >affect dequeue/enqueue:
> >
> >
> >
> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
> >
> 
> Here enqueue already acts as an explicit release barrier.
> 
Please refer above reasons.

> >
> >
> >On the other hand, I infer the reason for the slight decrease in
> >measurement performance is that the release barrier
> >
> >prevent “w->processed_pkts++” before that the event has been processed
> >(enqueue). But I think this test result is closer
> >
> >to the real performance.
> >
> >And sorry for that we have no octentx2 device, so there is no test
> >result on Octeontx2 event device driver. Would you please
> >
> >help us test this patch on octentx2 when you are convenient. Thanks
> >very much.
> >
> 
> I will report the performance numbers on Monday.
> 

That’s great, Thanks very much for your help.

Best Regards
Feifei

> >
> >
> >Best Regards
> >
> >Feifei
> 
> Regards,
> Pavan.
> 
> >
> >
> >
> >>
> >
> >> >So, due to two reasons above, I'm ambivalent about how we should
> >do in
> >
> >> >the next step.
> >
> >> >
> >
> >> >Best Regards
> >
> >> >Feifei
> >
> >>
> >
> >> Regards,
> >
> >> Pavan.
> >
> >>
> >
> >> >
> >
> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
> >
> >> >> >functions")
> >
> >> >> >Cc:
> >mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>
> >
> >> >> >Cc: mailto:stable@dpdk.org<mailto:stable@dpdk.org>
> >
> >> >> >
> >
> >> >> >Signed-off-by: Phil Yang
> ><mailto:phil.yang@arm.com<mailto:phil.yang@arm.com>>
> >
> >> >> >Signed-off-by: Feifei Wang
> ><mailto:feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
> >
> >> >> >Reviewed-by: Ruifeng Wang
> ><mailto:ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
> >
> >> >> >---
> >
> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
> >
> >> >> >+++++++++++++++++++++----
> >
> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >
> >> >> >
> >
> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
> >
> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
> >
> >> >100644
> >
> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
> >
> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >
> >> >> >@@ -30,7 +30,13 @@
> >pipeline_queue_worker_single_stage_tx(void
> >
> >> >> >*arg)
> >
> >> >> >
> >
> >> >> >                    if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >
> >> >> >                                    pipeline_event_tx(dev, port,
> >> >> > &ev);
> >
> >> >> >-                                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                                 /* release barrier here ensures
> >> >> >+ stored
> >operation
> >
> >> >> >+                                 * of the event completes before
> >> >> >+ the number
> >of
> >
> >> >> >+                                 * processed pkts is visible to
> >> >> >+ the main core
> >
> >> >> >+                                 */
> >
> >> >> >+
> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >1,
> >
> >> >> >+
> >> >> >+ __ATOMIC_RELEASE);
> >
> >> >> >                    } else {
> >
> >> >> >                                    ev.queue_id++;
> >
> >> >> >                                    pipeline_fwd_event(&ev,
> >
> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >
> >> >> >@@ -59,7 +65,13 @@
> >
> >> >pipeline_queue_worker_single_stage_fwd(void
> >
> >> >> >*arg)
> >
> >> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> >
> >> >> >                    pipeline_fwd_event(&ev,
> >> >> > RTE_SCHED_TYPE_ATOMIC);
> >
> >> >> >                    pipeline_event_enqueue(dev, port, &ev);
> >
> >> >> >-                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                 /* release barrier here ensures stored
> >> >> >+ operation
> >
> >> >> >+                 * of the event completes before the number of
> >
> >> >> >+                 * processed pkts is visible to the main core
> >
> >> >> >+                 */
> >
> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
> >
> >> >> >+
> >> >> >+ __ATOMIC_RELEASE);
> >
> >> >> >    }
> >
> >> >> >
> >
> >> >> >    return 0;
> >
> >> >> >@@ -84,7 +96,13 @@
> >
> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
> >
> >> >> >                                    if (ev[i].sched_type ==
> >
> >> >> >RTE_SCHED_TYPE_ATOMIC) {
> >
> >> >> >
> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >
> >> >> >                                                    ev[i].op =
> >> >> > RTE_EVENT_OP_RELEASE;
> >
> >> >> >-                                                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                                                 /* release
> >> >> >+ barrier here ensures stored
> >
> >> >> >operation
> >
> >> >> >+                                                 * of the event
> >> >> >+ completes before the
> >
> >> >> >number of
> >
> >> >> >+                                                 * processed
> >> >> >+ pkts is visible to the main
> >
> >> >> >core
> >
> >> >> >+                                                 */
> >
> >> >> >+
> >> >> >+ __atomic_fetch_add(&(w-
> >
> >> >> >>processed_pkts), 1,
> >
> >> >> >+
> >__ATOMIC_RELEASE);
> >
> >> >> >                                    } else {
> >
> >> >> >
> >> >> > ev[i].queue_id++;
> >
> >> >> >
> >> >> > pipeline_fwd_event(&ev[i],
> >
> >> >> >@@ -121,7 +139,13 @@
> >
> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> >
> >> >> >                    }
> >
> >> >> >
> >
> >> >> >                    pipeline_event_enqueue_burst(dev, port, ev,
> >> >> > nb_rx);
> >
> >> >> >-                  w->processed_pkts += nb_rx;
> >
> >> >> >+
> >
> >> >> >+                 /* release barrier here ensures stored
> >> >> >+ operation
> >
> >> >> >+                 * of the event completes before the number of
> >
> >> >> >+                 * processed pkts is visible to the main core
> >
> >> >> >+                 */
> >
> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), nb_rx,
> >
> >> >> >+
> >> >> >+ __ATOMIC_RELEASE);
> >
> >> >> >    }
> >
> >> >> >
> >
> >> >> >    return 0;
> >
> >> >> >@@ -146,7 +170,13 @@
> >
> >> >pipeline_queue_worker_multi_stage_tx(void
> >
> >> >> >*arg)
> >
> >> >> >
> >
> >> >> >                    if (ev.queue_id == tx_queue[ev.mbuf->port]) {
> >
> >> >> >                                    pipeline_event_tx(dev, port,
> >> >> > &ev);
> >
> >> >> >-                                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                                 /* release barrier here ensures
> >> >> >+ stored
> >operation
> >
> >> >> >+                                 * of the event completes before
> >> >> >+ the number
> >of
> >
> >> >> >+                                 * processed pkts is visible to
> >> >> >+ the main core
> >
> >> >> >+                                 */
> >
> >> >> >+
> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >1,
> >
> >> >> >+
> >> >> >+ __ATOMIC_RELEASE);
> >
> >> >> >                                    continue;
> >
> >> >> >                    }
> >
> >> >> >
> >
> >> >> >@@ -180,7 +210,13 @@
> >
> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
> >
> >> >> >                                    ev.queue_id =
> >> >> > tx_queue[ev.mbuf->port];
> >
> >> >> >
> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >0);
> >
> >> >> >                                    pipeline_fwd_event(&ev,
> >
> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >
> >> >> >-                                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                                 /* release barrier here ensures
> >> >> >+ stored
> >operation
> >
> >> >> >+                                 * of the event completes before
> >> >> >+ the number
> >of
> >
> >> >> >+                                 * processed pkts is visible to
> >> >> >+ the main core
> >
> >> >> >+                                 */
> >
> >> >> >+
> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >1,
> >
> >> >> >+
> >> >> >+ __ATOMIC_RELEASE);
> >
> >> >> >                    } else {
> >
> >> >> >                                    ev.queue_id++;
> >
> >> >> >                                    pipeline_fwd_event(&ev,
> >
> >> >> >sched_type_list[cq_id]);
> >
> >> >> >@@ -214,7 +250,13 @@
> >
> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> >
> >> >> >                                    if (ev[i].queue_id ==
> >> >> > tx_queue[ev[i].mbuf-
> >
> >> >> >>port]) {
> >
> >> >> >
> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >
> >> >> >                                                    ev[i].op =
> >> >> > RTE_EVENT_OP_RELEASE;
> >
> >> >> >-                                                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                                                 /* release
> >> >> >+ barrier here ensures stored
> >
> >> >> >operation
> >
> >> >> >+                                                 * of the event
> >> >> >+ completes before the
> >
> >> >> >number of
> >
> >> >> >+                                                 * processed
> >> >> >+ pkts is visible to the main
> >
> >> >> >core
> >
> >> >> >+                                                 */
> >
> >> >> >+
> >> >> >+ __atomic_fetch_add(&(w-
> >
> >> >> >>processed_pkts), 1,
> >
> >> >> >+
> >__ATOMIC_RELEASE);
> >
> >> >> >                                                    continue;
> >
> >> >> >                                    }
> >
> >> >> >
> >
> >> >> >@@ -254,7 +296,13 @@
> >
> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >
> >> >> >
> >
> >> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> >
> >> >> >
> >> >> > pipeline_fwd_event(&ev[i],
> >
> >> >> >
> >
> >> >> >    RTE_SCHED_TYPE_ATOMIC);
> >
> >> >> >-                                                  w->processed_pkts++;
> >
> >> >> >+
> >
> >> >> >+                                                 /* release
> >> >> >+ barrier here ensures stored
> >
> >> >> >operation
> >
> >> >> >+                                                 * of the event
> >> >> >+ completes before the
> >
> >> >> >number of
> >
> >> >> >+                                                 * processed
> >> >> >+ pkts is visible to the main
> >
> >> >> >core
> >
> >> >> >+                                                 */
> >
> >> >> >+
> >> >> >+ __atomic_fetch_add(&(w-
> >
> >> >> >>processed_pkts), 1,
> >
> >> >> >+
> >__ATOMIC_RELEASE);
> >
> >> >> >                                    } else {
> >
> >> >> >
> >> >> > ev[i].queue_id++;
> >
> >> >> >
> >> >> > pipeline_fwd_event(&ev[i],
> >
> >> >> >--
> >
> >> >> >2.17.1
> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-08 10:44         ` [dpdk-dev] 回复: " Feifei Wang
@ 2021-01-08 10:58           ` Pavan Nikhilesh Bhagavatula
  2021-01-11  1:57             ` [dpdk-dev] 回复: " Feifei Wang
  2021-01-12  8:29             ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
  0 siblings, 2 replies; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2021-01-08 10:58 UTC (permalink / raw)
  To: Feifei Wang, Jerin Jacob Kollanukkaran, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd, nd, nd

Hi Feifei,

>Hi, Pavan
>
>> -----邮件原件-----
>> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>> 发送时间: 2021年1月8日 17:13
>> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com;
>Harry
>> van Haaren <harry.van.haaren@intel.com>
>> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
>> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
>> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>pipeline
>> test
>>
>> Hi Feifei,
>>
>> >Hi, Pavan
>> >
>> >
>> >
>> >> -----邮件原件-----
>> >
>> >> 发件人: Pavan Nikhilesh Bhagavatula
><mailto:pbhagavatula@marvell.com>
>> >
>> >> 发送时间: 2021年1月5日 17:29
>> >
>> >> 收件人: Feifei Wang <mailto:Feifei.Wang2@arm.com>;
>mailto:jerinj@marvell.com;
>> >Harry
>> >
>> >> van Haaren <mailto:harry.van.haaren@intel.com>
>> >
>> >> 抄送: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>; Honnappa
>Nagarahalli
>> >
>> >> <mailto:Honnappa.Nagarahalli@arm.com>;
>mailto:stable@dpdk.org; Ruifeng Wang
>> >
>> >> <mailto:Ruifeng.Wang@arm.com>; nd <mailto:nd@arm.com>
>> >
>> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>for
>> >pipeline
>> >
>> >> test
>> >
>> >>
>> >
>> >> Hi Feifei,
>> >
>> >>
>> >
>> >> >Hi, Pavan
>> >
>> >> >
>> >
>> >> >Sorry for my late reply and thanks very much for your review.
>> >
>> >> >
>> >
>> >> >> -----Original Message-----
>> >
>> >> >> From: Pavan Nikhilesh Bhagavatula
>>
>><mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.c
>om>>
>> >
>> >> >> Sent: 2020年12月22日 18:33
>> >
>> >> >> To: Feifei Wang
>> ><mailto:Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>;
>> >mailto:jerinj@marvell.com<mailto:jerinj@marvell.com>;
>> >
>> >> >Harry van
>> >
>> >> >> Haaren
>>
>><mailto:harry.van.haaren@intel.com<mailto:harry.van.haaren@intel.
>com>>;
>> >Pavan Nikhilesh
>> >
>> >> >>
>>
>><pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@cavium
>n
>> >etworks.com>>
>> >
>> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd
>> ><mailto:nd@arm.com<mailto:nd@arm.com>>; Honnappa
>Nagarahalli
>> >
>> >> >>
>>
>><Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@ar
>m
>> >.com>>; mailto:stable@dpdk.org<mailto:stable@dpdk.org>; Phil
>Yang
>> >
>> >> >> <mailto:Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
>> >
>> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
>> >barriers
>> >
>> >> >for
>> >
>> >> >> pipeline test
>> >
>> >> >>
>> >
>> >> >>
>> >
>> >> >> >Add release barriers before updating the processed packets for
>> >
>> >> >worker
>> >
>> >> >> >lcores to ensure the worker lcore has really finished data
>> >
>> >> >> >processing and then it can update the processed packets
>number.
>> >
>> >> >> >
>> >
>> >> >>
>> >
>> >> >> I believe we can live with minor inaccuracies in stats being
>> >
>> >> >> presented
>> >
>> >> >as
>> >
>> >> >> atomics are pretty heavy when scheduler is limited to burst size
>> >> >> as
>> >1.
>> >
>> >> >>
>> >
>> >> >> One option is to move it before a pipeline operation
>> >
>> >> >(pipeline_event_tx,
>> >
>> >> >> pipeline_fwd_event etc.) as they imply implicit release barrier
>> >> >> (as
>> >
>> >> >> all
>> >
>> >> >the
>> >
>> >> >> changes done to the event should be visible to the next core).
>> >
>> >> >
>> >
>> >> >If I understand correctly, your meaning is that move release
>> >> >barriers
>> >
>> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure
>the
>> >
>> >> >event has been processed before the next core begins to tx/fwd.
>For
>> >
>> >> >example:
>> >
>> >>
>> >
>> >> What I meant was event APIs such as `rte_event_enqueue_burst`,
>> >
>> >> `rte_event_eth_tx_adapter_enqueue`
>> >
>> >> act as an implicit release barrier and the API
>> >`rte_event_dequeue_burst` act
>> >
>> >> as an implicit acquire barrier.
>> >
>> >
>> >
>> >>
>> >
>> >> Since, pipeline_* test starts with a dequeue() and ends with an
>> >enqueue() I
>> >
>> >> don’t believe we need barriers in Between.
>> >
>> >
>> >
>> >Sorry for my misunderstanding this. And I agree with you that no
>> >barriers are
>> >
>> >needed between dequeue and enqueue.
>> >
>> >
>> >
>> >Now, let's go back to the beginning. Actually with this patch, our
>> >barrier is mainly
>> >
>> >for the synchronous variable " w->processed_pkts ". As we all know,
>the
>> >event is firstly
>> >
>> >dequeued and then enqueued, after this, the event can be treated as
>the
>> >processed event
>> >
>> >and included in the statistics("w->processed_pkts++").
>> >
>> >
>> >
>> >Thus, we add a release barrier before " w->processed_pkts++" is to
>> >prevent this operation
>> >
>> >being executed ahead of time. For example:
>> >
>> >dequeue  ->  w->processed_pkts++  ->  enqueue
>> >
>> >This cause that the worker doesn't actually finish this event
>> >processing, but the event is treated
>> >
>> >as the processed one and included in the statistics.
>> >
>>
>> But the current sequence is dequeue-> enqueue-> w-
>>processed_pkts++
>> and enqueue already acts as an explicit release barrier right?
>>
>
>Sorry maybe I cannot understand how “enqueue” as an explicit release
>barrier. I think of two possibilities:
>1. As you say before, all the changes done to the event should be visible
>to the next core and enqueue is a operation for event, so the next core
>should wait for the event to be enqueued.
>I think this is due to data dependence for the same variable. However,
>‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot
>prevent ‘w->processed_pkts++’ before enqueue.
>And the main core may load updated ‘w->processed_pkts’ but actually
>the event is still being processed. For example:
>
> Time Slot	   Worker 1                                      Main core
>   1                           dequeue
>   2                  w->processed_pkts++
>   3                       		                              load w->processed_pkts
>   4                          enqueue
>
>2. Some release barriers have been included in enqueue. There is a
>release barrier in rte_ring_enqueue :
>move head -> copy elements to the ring -> release barrier -> update tail
>-> w->processed_pkts++
>However, this barrier cannot prevent ‘w->processed_pkts++’ before
>update tail, and when update_tail has been finished, the enqueue
>process  can be seen completed.

I was talking about case 2 in particular almost all enqueue calls have some kind of 
release barrier in place. I do agree w->processed_pkts++ might get reordered with 
tail update but since enqueue itself is a ldr + blr I was hoping that it wouldn't occur.

We can continue the discussion once I have some performance data.

Thanks for your patience :) 
Pavan.

>
>> >
>> >
>>
>>_________________________________________________________
>> _
>> >____________________
>> >
>> >
>> >
>> >By the way, I have two other questions about pipeline process test in
>> >"test_pipeline_queue".
>> >
>> >1. when do we start counting processed events (w-
>>processed_pkts)?
>> >
>> >For the fwd mode (internal_port = false), when we choose single
>stage,
>> >application increments
>> >
>> >the number events processed after "pipeline_event enqueue".
>> >However, when we choose multiple
>> >
>> >stage, application increments the number events processed before
>> >"pipleline_event_enqueue".
>>
>> We count an event as process when all the stages have completed
>and its
>> Trasnmitted.
>>
>> >So,
>> >
>> >maybe we can unify this. For example of multiple stage:
>> >
>> >
>> >
>> >                                if (cq_id == last_queue) {
>> >
>> >                                                ev.queue_id =
>> > tx_queue[ev.mbuf->port];
>> >
>> >
>> >rte_event_eth_tx_adapter_txq_set(ev.mbuf,
>> >0);
>> >
>> >                                                pipeline_fwd_event(&ev,
>> >RTE_SCHED_TYPE_ATOMIC);
>> >
>> >                                +             pipeline_event_enqueue(dev, port, &ev);
>> >
>> >                                                w->processed_pkts++;
>> >
>> >                                } else {
>> >
>> >                                                ev.queue_id++;
>> >
>> >                                                pipeline_fwd_event(&ev,
>> >sched_type_list[cq_id]);
>> >
>> >                                +             pipeline_event_enqueue(dev, port, &ev);
>> >
>> >                                }
>> >
>> >
>> >
>> >                -              pipeline_event_enqueue(dev, port, &ev);
>> >
>> >
>>
>> The above change makes sense.
>>
>Thanks for your review, and I’ll update this change into the next
>version.
>> >
>> >2. Whether  "pipeline_event_enqueue" is needed after
>> >"pipeline_event_tx" for tx mode?
>> >
>> >For single_stage_burst_tx mode, after "pipeline_event_tx", the
>worker
>> >has to enqueue again
>> >
>> >due to  "pipeline_event_enqueue_burst", so maybe we should jump
>out of
>> >the loop after
>> >
>> >“pipeline_event_tx”,
>>
>> We call enqueue burst to release the events i.e. enqueue events with
>> RTE_EVENT_OP_RELEASE.
>>
>However,
>In case of single event, for ' pipeline_queue_worker_single_stage_tx'
>and ' pipeline_queue_worker_multi_stage_tx',
>after tx, there is no release operation.
>
>> >
>>  for example:
>> >
>> >
>> >
>> >                                                if (ev[i].sched_type ==
>> >RTE_SCHED_TYPE_ATOMIC) {
>> >
>> >
>> > pipeline_event_tx(dev, port, &ev[i]);
>> >
>> >
>> > ev[i].op = RTE_EVENT_OP_RELEASE;
>> >
>> >
>> > w->processed_pkts++;
>> >
>> >                                                +             continue;
>> >
>> >                                                } else {
>> >
>> >
>> > ev[i].queue_id++;
>> >
>> >
>> > pipeline_fwd_event(&ev[i],
>> >
>> >
>> >RTE_SCHED_TYPE_ATOMIC);
>> >
>> >                                                }
>> >
>> >                                }
>> >
>> >
>> >
>> >                                pipeline_event_enqueue_burst(dev, port,
>> > ev, nb_rx);
>> >
>> >
>> >
>> >
>> >
>> >>
>> >
>> >> >
>> >
>> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> >
>> >> >                          +
>> >__atomic_thread_fence(__ATOMIC_RELEASE);
>> >
>> >> >                                          pipeline_event_tx(dev,
>> >> > port, &ev);
>> >
>> >> >                                          w->processed_pkts++;
>> >
>> >> >                          } else {
>> >
>> >> >                                          ev.queue_id++;
>> >
>> >> >                          +
>> >__atomic_thread_fence(__ATOMIC_RELEASE);
>> >
>> >> >                                          pipeline_fwd_event(&ev,
>> >
>> >> >RTE_SCHED_TYPE_ATOMIC);
>> >
>> >> >
>> >> > pipeline_event_enqueue(dev, port, &ev);
>> >
>> >> >
>> >
>> >> >However, there are two reasons to prevent this:
>> >
>> >> >
>> >
>> >> >First, compare with other tests in app/eventdev, for example, the
>> >
>> >> >eventdev perf test, the wmb is after event operation to ensure
>> >
>> >> >operation has been finished and then w->processed_pkts++.
>> >
>> >>
>> >
>> >> In case of perf_* tests start with a dequeue() and finally ends with
>> >> a
>> >
>> >> mempool_put() should also act as implicit acquire release pairs
>> >making stats
>> >
>> >> consistent?
>> >
>> >
>> >
>> >For perf tests, this consistency refers to that there is a wmb after
>> >mempool_put().
>> >
>> >Please refer to this link:
>> >
>> >https://urldefense.proofpoint.com/v2/url?u=http-
>>
>>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0m
>O
>>
>>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=z
>g
>> >QHeSDiXWfI1PIIUxXBqMS6E-
>2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
>> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
>> >
>> >
>> >
>> >>
>> >
>> >> >So, if we move release barriers before tx/fwd, it may cause that
>the
>> >
>> >> >tests of app/eventdev become  inconsistent.This may reduce the
>> >
>> >> >maintainability of the code and make it difficult to understand.
>> >
>> >> >
>> >
>> >> >Second, it is a test case, though heavy thread may cause
>> >performance
>> >
>> >> >degradation, it can ensure that the operation process and the test
>> >
>> >> >result are correct. And maybe for a test case, correctness is more
>> >
>> >> >important than performance.
>> >
>> >> >
>> >
>> >>
>> >
>> >> Most of our internal perf test run on 24/48 core combinations and
>> >since
>> >
>> >> Octeontx2 event device driver supports a burst size of 1, it will
>> >> show
>> >up as
>> >
>> >> Huge performance degradation.
>> >
>> >
>> >
>> >For the impact on performance, I do the test using software driver,
>> >following are some test results:
>> >
>> >-----------------------------------------------------------------------
>> >----------------
>> >---------------------------------------------
>> >
>> >Architecture: aarch64
>> >
>> >Nics: ixgbe-82599
>> >
>> >CPU: Cortex-A72
>> >
>> >BURST_SIZE: 1
>> >
>> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
>> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
>> >
>> >Flow: one flow, 64bits package, TX rate: 1.4Mpps
>> >
>> >
>> >
>> >Without this patch:
>> >
>> >0.954 mpps avg 0.953 mpps
>> >
>> >
>> >
>> >With this patch:
>> >
>> >0.932 mpps avg 0.930 mpps
>> >
>> >-----------------------------------------------------------------------
>> >----------------
>> >---------------------------------------------
>> >
>> >
>> >
>> >Based on the result above, there is no significant performance
>> >degradation with this patch.
>> >
>> >This is because the release barrier is only for  “w-
>>processed_pkts++”.
>> >It just ensures that the worker core
>> >
>> >increments the number events processed after enqueue, and it
>doesn’t
>> >affect dequeue/enqueue:
>> >
>> >
>> >
>> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
>> >
>>
>> Here enqueue already acts as an explicit release barrier.
>>
>Please refer above reasons.
>
>> >
>> >
>> >On the other hand, I infer the reason for the slight decrease in
>> >measurement performance is that the release barrier
>> >
>> >prevent “w->processed_pkts++” before that the event has been
>processed
>> >(enqueue). But I think this test result is closer
>> >
>> >to the real performance.
>> >
>> >And sorry for that we have no octentx2 device, so there is no test
>> >result on Octeontx2 event device driver. Would you please
>> >
>> >help us test this patch on octentx2 when you are convenient. Thanks
>> >very much.
>> >
>>
>> I will report the performance numbers on Monday.
>>
>
>That’s great, Thanks very much for your help.
>
>Best Regards
>Feifei
>
>> >
>> >
>> >Best Regards
>> >
>> >Feifei
>>
>> Regards,
>> Pavan.
>>
>> >
>> >
>> >
>> >>
>> >
>> >> >So, due to two reasons above, I'm ambivalent about how we
>should
>> >do in
>> >
>> >> >the next step.
>> >
>> >> >
>> >
>> >> >Best Regards
>> >
>> >> >Feifei
>> >
>> >>
>> >
>> >> Regards,
>> >
>> >> Pavan.
>> >
>> >>
>> >
>> >> >
>> >
>> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue
>worker
>> >
>> >> >> >functions")
>> >
>> >> >> >Cc:
>>
>>mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.co
>m>
>> >
>> >> >> >Cc: mailto:stable@dpdk.org<mailto:stable@dpdk.org>
>> >
>> >> >> >
>> >
>> >> >> >Signed-off-by: Phil Yang
>> ><mailto:phil.yang@arm.com<mailto:phil.yang@arm.com>>
>> >
>> >> >> >Signed-off-by: Feifei Wang
>> ><mailto:feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
>> >
>> >> >> >Reviewed-by: Ruifeng Wang
>> ><mailto:ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
>> >
>> >> >> >---
>> >
>> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
>> >
>> >> >> >+++++++++++++++++++++----
>> >
>> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
>> >
>> >> >> >
>> >
>> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c
>b/app/test-
>> >
>> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
>> >
>> >> >100644
>> >
>> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
>> >
>> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
>> >
>> >> >> >@@ -30,7 +30,13 @@
>> >pipeline_queue_worker_single_stage_tx(void
>> >
>> >> >> >*arg)
>> >
>> >> >> >
>> >
>> >> >> >                    if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>> >
>> >> >> >                                    pipeline_event_tx(dev, port,
>> >> >> > &ev);
>> >
>> >> >> >-                                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                                 /* release barrier here ensures
>> >> >> >+ stored
>> >operation
>> >
>> >> >> >+                                 * of the event completes before
>> >> >> >+ the number
>> >of
>> >
>> >> >> >+                                 * processed pkts is visible to
>> >> >> >+ the main core
>> >
>> >> >> >+                                 */
>> >
>> >> >> >+
>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
>> >1,
>> >
>> >> >> >+
>> >> >> >+ __ATOMIC_RELEASE);
>> >
>> >> >> >                    } else {
>> >
>> >> >> >                                    ev.queue_id++;
>> >
>> >> >> >                                    pipeline_fwd_event(&ev,
>> >
>> >> >> >RTE_SCHED_TYPE_ATOMIC);
>> >
>> >> >> >@@ -59,7 +65,13 @@
>> >
>> >> >pipeline_queue_worker_single_stage_fwd(void
>> >
>> >> >> >*arg)
>> >
>> >> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>> >
>> >> >> >                    pipeline_fwd_event(&ev,
>> >> >> > RTE_SCHED_TYPE_ATOMIC);
>> >
>> >> >> >                    pipeline_event_enqueue(dev, port, &ev);
>> >
>> >> >> >-                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                 /* release barrier here ensures stored
>> >> >> >+ operation
>> >
>> >> >> >+                 * of the event completes before the number of
>> >
>> >> >> >+                 * processed pkts is visible to the main core
>> >
>> >> >> >+                 */
>> >
>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
>> >
>> >> >> >+
>> >> >> >+ __ATOMIC_RELEASE);
>> >
>> >> >> >    }
>> >
>> >> >> >
>> >
>> >> >> >    return 0;
>> >
>> >> >> >@@ -84,7 +96,13 @@
>> >
>> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
>> >
>> >> >> >                                    if (ev[i].sched_type ==
>> >
>> >> >> >RTE_SCHED_TYPE_ATOMIC) {
>> >
>> >> >> >
>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
>> >
>> >> >> >                                                    ev[i].op =
>> >> >> > RTE_EVENT_OP_RELEASE;
>> >
>> >> >> >-                                                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                                                 /* release
>> >> >> >+ barrier here ensures stored
>> >
>> >> >> >operation
>> >
>> >> >> >+                                                 * of the event
>> >> >> >+ completes before the
>> >
>> >> >> >number of
>> >
>> >> >> >+                                                 * processed
>> >> >> >+ pkts is visible to the main
>> >
>> >> >> >core
>> >
>> >> >> >+                                                 */
>> >
>> >> >> >+
>> >> >> >+ __atomic_fetch_add(&(w-
>> >
>> >> >> >>processed_pkts), 1,
>> >
>> >> >> >+
>> >__ATOMIC_RELEASE);
>> >
>> >> >> >                                    } else {
>> >
>> >> >> >
>> >> >> > ev[i].queue_id++;
>> >
>> >> >> >
>> >> >> > pipeline_fwd_event(&ev[i],
>> >
>> >> >> >@@ -121,7 +139,13 @@
>> >
>> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>> >
>> >> >> >                    }
>> >
>> >> >> >
>> >
>> >> >> >                    pipeline_event_enqueue_burst(dev, port, ev,
>> >> >> > nb_rx);
>> >
>> >> >> >-                  w->processed_pkts += nb_rx;
>> >
>> >> >> >+
>> >
>> >> >> >+                 /* release barrier here ensures stored
>> >> >> >+ operation
>> >
>> >> >> >+                 * of the event completes before the number of
>> >
>> >> >> >+                 * processed pkts is visible to the main core
>> >
>> >> >> >+                 */
>> >
>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), nb_rx,
>> >
>> >> >> >+
>> >> >> >+ __ATOMIC_RELEASE);
>> >
>> >> >> >    }
>> >
>> >> >> >
>> >
>> >> >> >    return 0;
>> >
>> >> >> >@@ -146,7 +170,13 @@
>> >
>> >> >pipeline_queue_worker_multi_stage_tx(void
>> >
>> >> >> >*arg)
>> >
>> >> >> >
>> >
>> >> >> >                    if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>> >
>> >> >> >                                    pipeline_event_tx(dev, port,
>> >> >> > &ev);
>> >
>> >> >> >-                                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                                 /* release barrier here ensures
>> >> >> >+ stored
>> >operation
>> >
>> >> >> >+                                 * of the event completes before
>> >> >> >+ the number
>> >of
>> >
>> >> >> >+                                 * processed pkts is visible to
>> >> >> >+ the main core
>> >
>> >> >> >+                                 */
>> >
>> >> >> >+
>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
>> >1,
>> >
>> >> >> >+
>> >> >> >+ __ATOMIC_RELEASE);
>> >
>> >> >> >                                    continue;
>> >
>> >> >> >                    }
>> >
>> >> >> >
>> >
>> >> >> >@@ -180,7 +210,13 @@
>> >
>> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
>> >
>> >> >> >                                    ev.queue_id =
>> >> >> > tx_queue[ev.mbuf->port];
>> >
>> >> >> >
>> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf,
>> >0);
>> >
>> >> >> >                                    pipeline_fwd_event(&ev,
>> >
>> >> >> >RTE_SCHED_TYPE_ATOMIC);
>> >
>> >> >> >-                                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                                 /* release barrier here ensures
>> >> >> >+ stored
>> >operation
>> >
>> >> >> >+                                 * of the event completes before
>> >> >> >+ the number
>> >of
>> >
>> >> >> >+                                 * processed pkts is visible to
>> >> >> >+ the main core
>> >
>> >> >> >+                                 */
>> >
>> >> >> >+
>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
>> >1,
>> >
>> >> >> >+
>> >> >> >+ __ATOMIC_RELEASE);
>> >
>> >> >> >                    } else {
>> >
>> >> >> >                                    ev.queue_id++;
>> >
>> >> >> >                                    pipeline_fwd_event(&ev,
>> >
>> >> >> >sched_type_list[cq_id]);
>> >
>> >> >> >@@ -214,7 +250,13 @@
>> >
>> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>> >
>> >> >> >                                    if (ev[i].queue_id ==
>> >> >> > tx_queue[ev[i].mbuf-
>> >
>> >> >> >>port]) {
>> >
>> >> >> >
>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
>> >
>> >> >> >                                                    ev[i].op =
>> >> >> > RTE_EVENT_OP_RELEASE;
>> >
>> >> >> >-                                                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                                                 /* release
>> >> >> >+ barrier here ensures stored
>> >
>> >> >> >operation
>> >
>> >> >> >+                                                 * of the event
>> >> >> >+ completes before the
>> >
>> >> >> >number of
>> >
>> >> >> >+                                                 * processed
>> >> >> >+ pkts is visible to the main
>> >
>> >> >> >core
>> >
>> >> >> >+                                                 */
>> >
>> >> >> >+
>> >> >> >+ __atomic_fetch_add(&(w-
>> >
>> >> >> >>processed_pkts), 1,
>> >
>> >> >> >+
>> >__ATOMIC_RELEASE);
>> >
>> >> >> >                                                    continue;
>> >
>> >> >> >                                    }
>> >
>> >> >> >
>> >
>> >> >> >@@ -254,7 +296,13 @@
>> >
>> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>> >
>> >> >> >
>> >
>> >> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>> >
>> >> >> >
>> >> >> > pipeline_fwd_event(&ev[i],
>> >
>> >> >> >
>> >
>> >> >> >    RTE_SCHED_TYPE_ATOMIC);
>> >
>> >> >> >-                                                  w->processed_pkts++;
>> >
>> >> >> >+
>> >
>> >> >> >+                                                 /* release
>> >> >> >+ barrier here ensures stored
>> >
>> >> >> >operation
>> >
>> >> >> >+                                                 * of the event
>> >> >> >+ completes before the
>> >
>> >> >> >number of
>> >
>> >> >> >+                                                 * processed
>> >> >> >+ pkts is visible to the main
>> >
>> >> >> >core
>> >
>> >> >> >+                                                 */
>> >
>> >> >> >+
>> >> >> >+ __atomic_fetch_add(&(w-
>> >
>> >> >> >>processed_pkts), 1,
>> >
>> >> >> >+
>> >__ATOMIC_RELEASE);
>> >
>> >> >> >                                    } else {
>> >
>> >> >> >
>> >> >> > ev[i].queue_id++;
>> >
>> >> >> >
>> >> >> > pipeline_fwd_event(&ev[i],
>> >
>> >> >> >--
>> >
>> >> >> >2.17.1
>> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] 回复: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-08 10:58           ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
@ 2021-01-11  1:57             ` Feifei Wang
  2021-01-12  8:29             ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
  1 sibling, 0 replies; 12+ messages in thread
From: Feifei Wang @ 2021-01-11  1:57 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, jerinj, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd, nd, nd, nd



> -----邮件原件-----
> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> 发送时间: 2021年1月8日 18:58
> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry
> van Haaren <harry.van.haaren@intel.com>
> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>; nd
> <nd@arm.com>
> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline
> test
> 
> Hi Feifei,
> 
> >Hi, Pavan
> >
> >> -----邮件原件-----
> >> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> >> 发送时间: 2021年1月8日 17:13
> >> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com;
> >Harry
> >> van Haaren <harry.van.haaren@intel.com>
> >> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> >> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
> >> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >pipeline
> >> test
> >>
> >> Hi Feifei,
> >>
> >> >Hi, Pavan
> >> >
> >> >
> >> >
> >> >> -----邮件原件-----
> >> >
> >> >> 发件人: Pavan Nikhilesh Bhagavatula
> ><mailto:pbhagavatula@marvell.com>
> >> >
> >> >> 发送时间: 2021年1月5日 17:29
> >> >
> >> >> 收件人: Feifei Wang <mailto:Feifei.Wang2@arm.com>;
> >mailto:jerinj@marvell.com;
> >> >Harry
> >> >
> >> >> van Haaren <mailto:harry.van.haaren@intel.com>
> >> >
> >> >> 抄送: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>; Honnappa
> >Nagarahalli
> >> >
> >> >> <mailto:Honnappa.Nagarahalli@arm.com>;
> >mailto:stable@dpdk.org; Ruifeng Wang
> >> >
> >> >> <mailto:Ruifeng.Wang@arm.com>; nd <mailto:nd@arm.com>
> >> >
> >> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
> >for
> >> >pipeline
> >> >
> >> >> test
> >> >
> >> >>
> >> >
> >> >> Hi Feifei,
> >> >
> >> >>
> >> >
> >> >> >Hi, Pavan
> >> >
> >> >> >
> >> >
> >> >> >Sorry for my late reply and thanks very much for your review.
> >> >
> >> >> >
> >> >
> >> >> >> -----Original Message-----
> >> >
> >> >> >> From: Pavan Nikhilesh Bhagavatula
> >>
> >><mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.c
> >om>>
> >> >
> >> >> >> Sent: 2020年12月22日 18:33
> >> >
> >> >> >> To: Feifei Wang
> >> ><mailto:Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>;
> >> >mailto:jerinj@marvell.com<mailto:jerinj@marvell.com>;
> >> >
> >> >> >Harry van
> >> >
> >> >> >> Haaren
> >>
> >><mailto:harry.van.haaren@intel.com<mailto:harry.van.haaren@intel.
> >com>>;
> >> >Pavan Nikhilesh
> >> >
> >> >> >>
> >>
> >><pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@cavium
> >n
> >> >etworks.com>>
> >> >
> >> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd
> >> ><mailto:nd@arm.com<mailto:nd@arm.com>>; Honnappa
> >Nagarahalli
> >> >
> >> >> >>
> >>
> >><Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@ar
> >m
> >> >.com>>; mailto:stable@dpdk.org<mailto:stable@dpdk.org>; Phil
> >Yang
> >> >
> >> >> >> <mailto:Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
> >> >
> >> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
> >> >barriers
> >> >
> >> >> >for
> >> >
> >> >> >> pipeline test
> >> >
> >> >> >>
> >> >
> >> >> >>
> >> >
> >> >> >> >Add release barriers before updating the processed packets for
> >> >
> >> >> >worker
> >> >
> >> >> >> >lcores to ensure the worker lcore has really finished data
> >> >
> >> >> >> >processing and then it can update the processed packets
> >number.
> >> >
> >> >> >> >
> >> >
> >> >> >>
> >> >
> >> >> >> I believe we can live with minor inaccuracies in stats being
> >> >
> >> >> >> presented
> >> >
> >> >> >as
> >> >
> >> >> >> atomics are pretty heavy when scheduler is limited to burst
> >> >> >> size as
> >> >1.
> >> >
> >> >> >>
> >> >
> >> >> >> One option is to move it before a pipeline operation
> >> >
> >> >> >(pipeline_event_tx,
> >> >
> >> >> >> pipeline_fwd_event etc.) as they imply implicit release barrier
> >> >> >> (as
> >> >
> >> >> >> all
> >> >
> >> >> >the
> >> >
> >> >> >> changes done to the event should be visible to the next core).
> >> >
> >> >> >
> >> >
> >> >> >If I understand correctly, your meaning is that move release
> >> >> >barriers
> >> >
> >> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure
> >the
> >> >
> >> >> >event has been processed before the next core begins to tx/fwd.
> >For
> >> >
> >> >> >example:
> >> >
> >> >>
> >> >
> >> >> What I meant was event APIs such as `rte_event_enqueue_burst`,
> >> >
> >> >> `rte_event_eth_tx_adapter_enqueue`
> >> >
> >> >> act as an implicit release barrier and the API
> >> >`rte_event_dequeue_burst` act
> >> >
> >> >> as an implicit acquire barrier.
> >> >
> >> >
> >> >
> >> >>
> >> >
> >> >> Since, pipeline_* test starts with a dequeue() and ends with an
> >> >enqueue() I
> >> >
> >> >> don’t believe we need barriers in Between.
> >> >
> >> >
> >> >
> >> >Sorry for my misunderstanding this. And I agree with you that no
> >> >barriers are
> >> >
> >> >needed between dequeue and enqueue.
> >> >
> >> >
> >> >
> >> >Now, let's go back to the beginning. Actually with this patch, our
> >> >barrier is mainly
> >> >
> >> >for the synchronous variable " w->processed_pkts ". As we all know,
> >the
> >> >event is firstly
> >> >
> >> >dequeued and then enqueued, after this, the event can be treated as
> >the
> >> >processed event
> >> >
> >> >and included in the statistics("w->processed_pkts++").
> >> >
> >> >
> >> >
> >> >Thus, we add a release barrier before " w->processed_pkts++" is to
> >> >prevent this operation
> >> >
> >> >being executed ahead of time. For example:
> >> >
> >> >dequeue  ->  w->processed_pkts++  ->  enqueue
> >> >
> >> >This cause that the worker doesn't actually finish this event
> >> >processing, but the event is treated
> >> >
> >> >as the processed one and included in the statistics.
> >> >
> >>
> >> But the current sequence is dequeue-> enqueue-> w- processed_pkts++
> >>and enqueue already acts as an explicit release barrier right?
> >>
> >
> >Sorry maybe I cannot understand how “enqueue” as an explicit release
> >barrier. I think of two possibilities:
> >1. As you say before, all the changes done to the event should be
> >visible to the next core and enqueue is a operation for event, so the
> >next core should wait for the event to be enqueued.
> >I think this is due to data dependence for the same variable. However,
> >‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot
> >prevent ‘w->processed_pkts++’ before enqueue.
> >And the main core may load updated ‘w->processed_pkts’ but actually the
> >event is still being processed. For example:
> >
> > Time Slot	   Worker 1                                      Main core
> >   1                           dequeue
> >   2                  w->processed_pkts++
> >   3                       		                              load w->processed_pkts
> >   4                          enqueue
> >
> >2. Some release barriers have been included in enqueue. There is a
> >release barrier in rte_ring_enqueue :
> >move head -> copy elements to the ring -> release barrier -> update
> >tail
> >-> w->processed_pkts++
> >However, this barrier cannot prevent ‘w->processed_pkts++’ before
> >update tail, and when update_tail has been finished, the enqueue
> >process  can be seen completed.
> 
> I was talking about case 2 in particular almost all enqueue calls have some
> kind of release barrier in place. I do agree w->processed_pkts++ might get
> reordered with tail update but since enqueue itself is a ldr + blr I was hoping
> that it wouldn't occur.
> 
> We can continue the discussion once I have some performance data.
> 
Ok, that's great. I think this is a meaningful discussion. Thanks for your effort~.

Best Regards
Feifei

> Thanks for your patience :)
> Pavan.
> 
> >
> >> >
> >> >
> >>
> >>________________________________________________________
> _
> >> _
> >> >____________________
> >> >
> >> >
> >> >
> >> >By the way, I have two other questions about pipeline process test
> >> >in "test_pipeline_queue".
> >> >
> >> >1. when do we start counting processed events (w-
> >>processed_pkts)?
> >> >
> >> >For the fwd mode (internal_port = false), when we choose single
> >stage,
> >> >application increments
> >> >
> >> >the number events processed after "pipeline_event enqueue".
> >> >However, when we choose multiple
> >> >
> >> >stage, application increments the number events processed before
> >> >"pipleline_event_enqueue".
> >>
> >> We count an event as process when all the stages have completed
> >and its
> >> Trasnmitted.
> >>
> >> >So,
> >> >
> >> >maybe we can unify this. For example of multiple stage:
> >> >
> >> >
> >> >
> >> >                                if (cq_id == last_queue) {
> >> >
> >> >                                                ev.queue_id =
> >> > tx_queue[ev.mbuf->port];
> >> >
> >> >
> >> >rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >> >0);
> >> >
> >> >
> >> >pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >                                +             pipeline_event_enqueue(dev, port, &ev);
> >> >
> >> >                                                w->processed_pkts++;
> >> >
> >> >                                } else {
> >> >
> >> >                                                ev.queue_id++;
> >> >
> >> >
> >> >pipeline_fwd_event(&ev, sched_type_list[cq_id]);
> >> >
> >> >                                +             pipeline_event_enqueue(dev, port, &ev);
> >> >
> >> >                                }
> >> >
> >> >
> >> >
> >> >                -              pipeline_event_enqueue(dev, port, &ev);
> >> >
> >> >
> >>
> >> The above change makes sense.
> >>
> >Thanks for your review, and I’ll update this change into the next
> >version.
> >> >
> >> >2. Whether  "pipeline_event_enqueue" is needed after
> >> >"pipeline_event_tx" for tx mode?
> >> >
> >> >For single_stage_burst_tx mode, after "pipeline_event_tx", the
> >worker
> >> >has to enqueue again
> >> >
> >> >due to  "pipeline_event_enqueue_burst", so maybe we should jump
> >out of
> >> >the loop after
> >> >
> >> >“pipeline_event_tx”,
> >>
> >> We call enqueue burst to release the events i.e. enqueue events with
> >> RTE_EVENT_OP_RELEASE.
> >>
> >However,
> >In case of single event, for ' pipeline_queue_worker_single_stage_tx'
> >and ' pipeline_queue_worker_multi_stage_tx',
> >after tx, there is no release operation.
> >
> >> >
> >>  for example:
> >> >
> >> >
> >> >
> >> >                                                if (ev[i].sched_type
> >> >==
> >> >RTE_SCHED_TYPE_ATOMIC) {
> >> >
> >> >
> >> > pipeline_event_tx(dev, port, &ev[i]);
> >> >
> >> >
> >> > ev[i].op = RTE_EVENT_OP_RELEASE;
> >> >
> >> >
> >> > w->processed_pkts++;
> >> >
> >> >                                                +             continue;
> >> >
> >> >                                                } else {
> >> >
> >> >
> >> > ev[i].queue_id++;
> >> >
> >> >
> >> > pipeline_fwd_event(&ev[i],
> >> >
> >> >
> >> >RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >                                                }
> >> >
> >> >                                }
> >> >
> >> >
> >> >
> >> >                                pipeline_event_enqueue_burst(dev,
> >> > port, ev, nb_rx);
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >>
> >> >
> >> >> >
> >> >
> >> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >
> >> >> >                          +
> >> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >> >
> >> >> >                                          pipeline_event_tx(dev,
> >> >> > port, &ev);
> >> >
> >> >> >                                          w->processed_pkts++;
> >> >
> >> >> >                          } else {
> >> >
> >> >> >                                          ev.queue_id++;
> >> >
> >> >> >                          +
> >> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >> >
> >> >> >                                          pipeline_fwd_event(&ev,
> >> >
> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >> >
> >> >> > pipeline_event_enqueue(dev, port, &ev);
> >> >
> >> >> >
> >> >
> >> >> >However, there are two reasons to prevent this:
> >> >
> >> >> >
> >> >
> >> >> >First, compare with other tests in app/eventdev, for example, the
> >> >
> >> >> >eventdev perf test, the wmb is after event operation to ensure
> >> >
> >> >> >operation has been finished and then w->processed_pkts++.
> >> >
> >> >>
> >> >
> >> >> In case of perf_* tests start with a dequeue() and finally ends
> >> >> with a
> >> >
> >> >> mempool_put() should also act as implicit acquire release pairs
> >> >making stats
> >> >
> >> >> consistent?
> >> >
> >> >
> >> >
> >> >For perf tests, this consistency refers to that there is a wmb after
> >> >mempool_put().
> >> >
> >> >Please refer to this link:
> >> >
> >> >https://urldefense.proofpoint.com/v2/url?u=http-
> >>
> >>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0m
> >O
> >>
> >>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=z
> >g
> >> >QHeSDiXWfI1PIIUxXBqMS6E-
> >2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
> >> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
> >> >
> >> >
> >> >
> >> >>
> >> >
> >> >> >So, if we move release barriers before tx/fwd, it may cause that
> >the
> >> >
> >> >> >tests of app/eventdev become  inconsistent.This may reduce the
> >> >
> >> >> >maintainability of the code and make it difficult to understand.
> >> >
> >> >> >
> >> >
> >> >> >Second, it is a test case, though heavy thread may cause
> >> >performance
> >> >
> >> >> >degradation, it can ensure that the operation process and the
> >> >> >test
> >> >
> >> >> >result are correct. And maybe for a test case, correctness is
> >> >> >more
> >> >
> >> >> >important than performance.
> >> >
> >> >> >
> >> >
> >> >>
> >> >
> >> >> Most of our internal perf test run on 24/48 core combinations and
> >> >since
> >> >
> >> >> Octeontx2 event device driver supports a burst size of 1, it will
> >> >> show
> >> >up as
> >> >
> >> >> Huge performance degradation.
> >> >
> >> >
> >> >
> >> >For the impact on performance, I do the test using software driver,
> >> >following are some test results:
> >> >
> >> >--------------------------------------------------------------------
> >> >---
> >> >----------------
> >> >---------------------------------------------
> >> >
> >> >Architecture: aarch64
> >> >
> >> >Nics: ixgbe-82599
> >> >
> >> >CPU: Cortex-A72
> >> >
> >> >BURST_SIZE: 1
> >> >
> >> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
> >> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
> >> >
> >> >Flow: one flow, 64bits package, TX rate: 1.4Mpps
> >> >
> >> >
> >> >
> >> >Without this patch:
> >> >
> >> >0.954 mpps avg 0.953 mpps
> >> >
> >> >
> >> >
> >> >With this patch:
> >> >
> >> >0.932 mpps avg 0.930 mpps
> >> >
> >> >--------------------------------------------------------------------
> >> >---
> >> >----------------
> >> >---------------------------------------------
> >> >
> >> >
> >> >
> >> >Based on the result above, there is no significant performance
> >> >degradation with this patch.
> >> >
> >> >This is because the release barrier is only for  “w-
> >>processed_pkts++”.
> >> >It just ensures that the worker core
> >> >
> >> >increments the number events processed after enqueue, and it
> >doesn’t
> >> >affect dequeue/enqueue:
> >> >
> >> >
> >> >
> >> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
> >> >
> >>
> >> Here enqueue already acts as an explicit release barrier.
> >>
> >Please refer above reasons.
> >
> >> >
> >> >
> >> >On the other hand, I infer the reason for the slight decrease in
> >> >measurement performance is that the release barrier
> >> >
> >> >prevent “w->processed_pkts++” before that the event has been
> >processed
> >> >(enqueue). But I think this test result is closer
> >> >
> >> >to the real performance.
> >> >
> >> >And sorry for that we have no octentx2 device, so there is no test
> >> >result on Octeontx2 event device driver. Would you please
> >> >
> >> >help us test this patch on octentx2 when you are convenient. Thanks
> >> >very much.
> >> >
> >>
> >> I will report the performance numbers on Monday.
> >>
> >
> >That’s great, Thanks very much for your help.
> >
> >Best Regards
> >Feifei
> >
> >> >
> >> >
> >> >Best Regards
> >> >
> >> >Feifei
> >>
> >> Regards,
> >> Pavan.
> >>
> >> >
> >> >
> >> >
> >> >>
> >> >
> >> >> >So, due to two reasons above, I'm ambivalent about how we
> >should
> >> >do in
> >> >
> >> >> >the next step.
> >> >
> >> >> >
> >> >
> >> >> >Best Regards
> >> >
> >> >> >Feifei
> >> >
> >> >>
> >> >
> >> >> Regards,
> >> >
> >> >> Pavan.
> >> >
> >> >>
> >> >
> >> >> >
> >> >
> >> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue
> >worker
> >> >
> >> >> >> >functions")
> >> >
> >> >> >> >Cc:
> >>
> >>mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.co
> >m>
> >> >
> >> >> >> >Cc: mailto:stable@dpdk.org<mailto:stable@dpdk.org>
> >> >
> >> >> >> >
> >> >
> >> >> >> >Signed-off-by: Phil Yang
> >> ><mailto:phil.yang@arm.com<mailto:phil.yang@arm.com>>
> >> >
> >> >> >> >Signed-off-by: Feifei Wang
> >> ><mailto:feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
> >> >
> >> >> >> >Reviewed-by: Ruifeng Wang
> >> ><mailto:ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
> >> >
> >> >> >> >---
> >> >
> >> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
> >> >
> >> >> >> >+++++++++++++++++++++----
> >> >
> >> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >> >
> >> >> >> >
> >> >
> >> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c
> >b/app/test-
> >> >
> >> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
> >> >
> >> >> >100644
> >> >
> >> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
> >> >
> >> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >> >
> >> >> >> >@@ -30,7 +30,13 @@
> >> >pipeline_queue_worker_single_stage_tx(void
> >> >
> >> >> >> >*arg)
> >> >
> >> >> >> >
> >> >
> >> >> >> >                    if (ev.sched_type ==
> >> >> >> > RTE_SCHED_TYPE_ATOMIC) {
> >> >
> >> >> >> >                                    pipeline_event_tx(dev,
> >> >> >> > port, &ev);
> >> >
> >> >> >> >-                                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                                 /* release barrier here
> >> >> >> >+ ensures stored
> >> >operation
> >> >
> >> >> >> >+                                 * of the event completes
> >> >> >> >+ before the number
> >> >of
> >> >
> >> >> >> >+                                 * processed pkts is visible
> >> >> >> >+ to the main core
> >> >
> >> >> >> >+                                 */
> >> >
> >> >> >> >+
> >> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >> >1,
> >> >
> >> >> >> >+
> >> >> >> >+ __ATOMIC_RELEASE);
> >> >
> >> >> >> >                    } else {
> >> >
> >> >> >> >                                    ev.queue_id++;
> >> >
> >> >> >> >                                    pipeline_fwd_event(&ev,
> >> >
> >> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >> >> >@@ -59,7 +65,13 @@
> >> >
> >> >> >pipeline_queue_worker_single_stage_fwd(void
> >> >
> >> >> >> >*arg)
> >> >
> >> >> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >> >> >> > 0);
> >> >
> >> >> >> >                    pipeline_fwd_event(&ev,
> >> >> >> > RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >> >> >                    pipeline_event_enqueue(dev, port, &ev);
> >> >
> >> >> >> >-                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                 /* release barrier here ensures stored
> >> >> >> >+ operation
> >> >
> >> >> >> >+                 * of the event completes before the number
> >> >> >> >+ of
> >> >
> >> >> >> >+                 * processed pkts is visible to the main core
> >> >
> >> >> >> >+                 */
> >> >
> >> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
> >> >
> >> >> >> >+
> >> >> >> >+ __ATOMIC_RELEASE);
> >> >
> >> >> >> >    }
> >> >
> >> >> >> >
> >> >
> >> >> >> >    return 0;
> >> >
> >> >> >> >@@ -84,7 +96,13 @@
> >> >
> >> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
> >> >
> >> >> >> >                                    if (ev[i].sched_type ==
> >> >
> >> >> >> >RTE_SCHED_TYPE_ATOMIC) {
> >> >
> >> >> >> >
> >> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >> >
> >> >> >> >                                                    ev[i].op =
> >> >> >> > RTE_EVENT_OP_RELEASE;
> >> >
> >> >> >> >-                                                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                                                 /* release
> >> >> >> >+ barrier here ensures stored
> >> >
> >> >> >> >operation
> >> >
> >> >> >> >+                                                 * of the
> >> >> >> >+ event completes before the
> >> >
> >> >> >> >number of
> >> >
> >> >> >> >+                                                 * processed
> >> >> >> >+ pkts is visible to the main
> >> >
> >> >> >> >core
> >> >
> >> >> >> >+                                                 */
> >> >
> >> >> >> >+
> >> >> >> >+ __atomic_fetch_add(&(w-
> >> >
> >> >> >> >>processed_pkts), 1,
> >> >
> >> >> >> >+
> >> >__ATOMIC_RELEASE);
> >> >
> >> >> >> >                                    } else {
> >> >
> >> >> >> >
> >> >> >> > ev[i].queue_id++;
> >> >
> >> >> >> >
> >> >> >> > pipeline_fwd_event(&ev[i],
> >> >
> >> >> >> >@@ -121,7 +139,13 @@
> >> >
> >> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> >> >
> >> >> >> >                    }
> >> >
> >> >> >> >
> >> >
> >> >> >> >                    pipeline_event_enqueue_burst(dev, port,
> >> >> >> > ev, nb_rx);
> >> >
> >> >> >> >-                  w->processed_pkts += nb_rx;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                 /* release barrier here ensures stored
> >> >> >> >+ operation
> >> >
> >> >> >> >+                 * of the event completes before the number
> >> >> >> >+ of
> >> >
> >> >> >> >+                 * processed pkts is visible to the main core
> >> >
> >> >> >> >+                 */
> >> >
> >> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts),
> >> >> >> >+ nb_rx,
> >> >
> >> >> >> >+
> >> >> >> >+ __ATOMIC_RELEASE);
> >> >
> >> >> >> >    }
> >> >
> >> >> >> >
> >> >
> >> >> >> >    return 0;
> >> >
> >> >> >> >@@ -146,7 +170,13 @@
> >> >
> >> >> >pipeline_queue_worker_multi_stage_tx(void
> >> >
> >> >> >> >*arg)
> >> >
> >> >> >> >
> >> >
> >> >> >> >                    if (ev.queue_id ==
> >> >> >> > tx_queue[ev.mbuf->port]) {
> >> >
> >> >> >> >                                    pipeline_event_tx(dev,
> >> >> >> > port, &ev);
> >> >
> >> >> >> >-                                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                                 /* release barrier here
> >> >> >> >+ ensures stored
> >> >operation
> >> >
> >> >> >> >+                                 * of the event completes
> >> >> >> >+ before the number
> >> >of
> >> >
> >> >> >> >+                                 * processed pkts is visible
> >> >> >> >+ to the main core
> >> >
> >> >> >> >+                                 */
> >> >
> >> >> >> >+
> >> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >> >1,
> >> >
> >> >> >> >+
> >> >> >> >+ __ATOMIC_RELEASE);
> >> >
> >> >> >> >                                    continue;
> >> >
> >> >> >> >                    }
> >> >
> >> >> >> >
> >> >
> >> >> >> >@@ -180,7 +210,13 @@
> >> >
> >> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
> >> >
> >> >> >> >                                    ev.queue_id =
> >> >> >> > tx_queue[ev.mbuf->port];
> >> >
> >> >> >> >
> >> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >> >0);
> >> >
> >> >> >> >                                    pipeline_fwd_event(&ev,
> >> >
> >> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >> >> >-                                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                                 /* release barrier here
> >> >> >> >+ ensures stored
> >> >operation
> >> >
> >> >> >> >+                                 * of the event completes
> >> >> >> >+ before the number
> >> >of
> >> >
> >> >> >> >+                                 * processed pkts is visible
> >> >> >> >+ to the main core
> >> >
> >> >> >> >+                                 */
> >> >
> >> >> >> >+
> >> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >> >1,
> >> >
> >> >> >> >+
> >> >> >> >+ __ATOMIC_RELEASE);
> >> >
> >> >> >> >                    } else {
> >> >
> >> >> >> >                                    ev.queue_id++;
> >> >
> >> >> >> >                                    pipeline_fwd_event(&ev,
> >> >
> >> >> >> >sched_type_list[cq_id]);
> >> >
> >> >> >> >@@ -214,7 +250,13 @@
> >> >
> >> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> >> >
> >> >> >> >                                    if (ev[i].queue_id ==
> >> >> >> > tx_queue[ev[i].mbuf-
> >> >
> >> >> >> >>port]) {
> >> >
> >> >> >> >
> >> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >> >
> >> >> >> >                                                    ev[i].op =
> >> >> >> > RTE_EVENT_OP_RELEASE;
> >> >
> >> >> >> >-                                                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                                                 /* release
> >> >> >> >+ barrier here ensures stored
> >> >
> >> >> >> >operation
> >> >
> >> >> >> >+                                                 * of the
> >> >> >> >+ event completes before the
> >> >
> >> >> >> >number of
> >> >
> >> >> >> >+                                                 * processed
> >> >> >> >+ pkts is visible to the main
> >> >
> >> >> >> >core
> >> >
> >> >> >> >+                                                 */
> >> >
> >> >> >> >+
> >> >> >> >+ __atomic_fetch_add(&(w-
> >> >
> >> >> >> >>processed_pkts), 1,
> >> >
> >> >> >> >+
> >> >__ATOMIC_RELEASE);
> >> >
> >> >> >> >                                                    continue;
> >> >
> >> >> >> >                                    }
> >> >
> >> >> >> >
> >> >
> >> >> >> >@@ -254,7 +296,13 @@
> >> >
> >> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >> >
> >> >> >> >
> >> >
> >> >> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> >> >
> >> >> >> >
> >> >> >> > pipeline_fwd_event(&ev[i],
> >> >
> >> >> >> >
> >> >
> >> >> >> >    RTE_SCHED_TYPE_ATOMIC);
> >> >
> >> >> >> >-                                                  w->processed_pkts++;
> >> >
> >> >> >> >+
> >> >
> >> >> >> >+                                                 /* release
> >> >> >> >+ barrier here ensures stored
> >> >
> >> >> >> >operation
> >> >
> >> >> >> >+                                                 * of the
> >> >> >> >+ event completes before the
> >> >
> >> >> >> >number of
> >> >
> >> >> >> >+                                                 * processed
> >> >> >> >+ pkts is visible to the main
> >> >
> >> >> >> >core
> >> >
> >> >> >> >+                                                 */
> >> >
> >> >> >> >+
> >> >> >> >+ __atomic_fetch_add(&(w-
> >> >
> >> >> >> >>processed_pkts), 1,
> >> >
> >> >> >> >+
> >> >__ATOMIC_RELEASE);
> >> >
> >> >> >> >                                    } else {
> >> >
> >> >> >> >
> >> >> >> > ev[i].queue_id++;
> >> >
> >> >> >> >
> >> >> >> > pipeline_fwd_event(&ev[i],
> >> >
> >> >> >> >--
> >> >
> >> >> >> >2.17.1
> >> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-08 10:58           ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
  2021-01-11  1:57             ` [dpdk-dev] 回复: " Feifei Wang
@ 2021-01-12  8:29             ` Pavan Nikhilesh Bhagavatula
  2021-01-14  6:07               ` [dpdk-dev] 回复: " Feifei Wang
  1 sibling, 1 reply; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2021-01-12  8:29 UTC (permalink / raw)
  To: Feifei Wang, Jerin Jacob Kollanukkaran, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd, nd, nd

Hi Feifei,

>Hi Feifei,
>
>>Hi, Pavan
>>
>>> -----邮件原件-----
>>> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>>> 发送时间: 2021年1月8日 17:13
>>> 收件人: Feifei Wang <Feifei.Wang2@arm.com>;
>jerinj@marvell.com;
>>Harry
>>> van Haaren <harry.van.haaren@intel.com>
>>> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>>> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng
>Wang
>>> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
>>> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>>pipeline
>>> test
>>>
>>> Hi Feifei,
>>>
>>> >Hi, Pavan
>>> >
>>> >
>>> >
>>> >> -----邮件原件-----
>>> >
>>> >> 发件人: Pavan Nikhilesh Bhagavatula
>><mailto:pbhagavatula@marvell.com>
>>> >
>>> >> 发送时间: 2021年1月5日 17:29
>>> >
>>> >> 收件人: Feifei Wang <mailto:Feifei.Wang2@arm.com>;
>>mailto:jerinj@marvell.com;
>>> >Harry
>>> >
>>> >> van Haaren <mailto:harry.van.haaren@intel.com>
>>> >
>>> >> 抄送: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>;
>Honnappa
>>Nagarahalli
>>> >
>>> >> <mailto:Honnappa.Nagarahalli@arm.com>;
>>mailto:stable@dpdk.org; Ruifeng Wang
>>> >
>>> >> <mailto:Ruifeng.Wang@arm.com>; nd <mailto:nd@arm.com>
>>> >
>>> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>for
>>> >pipeline
>>> >
>>> >> test
>>> >
>>> >>
>>> >
>>> >> Hi Feifei,
>>> >
>>> >>
>>> >
>>> >> >Hi, Pavan
>>> >
>>> >> >
>>> >
>>> >> >Sorry for my late reply and thanks very much for your review.
>>> >
>>> >> >
>>> >
>>> >> >> -----Original Message-----
>>> >
>>> >> >> From: Pavan Nikhilesh Bhagavatula
>>>
>>><mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.
>c
>>om>>
>>> >
>>> >> >> Sent: 2020年12月22日 18:33
>>> >
>>> >> >> To: Feifei Wang
>>>
>><mailto:Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>;
>>> >mailto:jerinj@marvell.com<mailto:jerinj@marvell.com>;
>>> >
>>> >> >Harry van
>>> >
>>> >> >> Haaren
>>>
>>><mailto:harry.van.haaren@intel.com<mailto:harry.van.haaren@intel
>.
>>com>>;
>>> >Pavan Nikhilesh
>>> >
>>> >> >>
>>>
>>><pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@caviu
>m
>>n
>>> >etworks.com>>
>>> >
>>> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd
>>> ><mailto:nd@arm.com<mailto:nd@arm.com>>; Honnappa
>>Nagarahalli
>>> >
>>> >> >>
>>>
>>><Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@a
>r
>>m
>>> >.com>>; mailto:stable@dpdk.org<mailto:stable@dpdk.org>; Phil
>>Yang
>>> >
>>> >> >> <mailto:Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
>>> >
>>> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
>>> >barriers
>>> >
>>> >> >for
>>> >
>>> >> >> pipeline test
>>> >
>>> >> >>
>>> >
>>> >> >>
>>> >
>>> >> >> >Add release barriers before updating the processed packets
>for
>>> >
>>> >> >worker
>>> >
>>> >> >> >lcores to ensure the worker lcore has really finished data
>>> >
>>> >> >> >processing and then it can update the processed packets
>>number.
>>> >
>>> >> >> >
>>> >
>>> >> >>
>>> >
>>> >> >> I believe we can live with minor inaccuracies in stats being
>>> >
>>> >> >> presented
>>> >
>>> >> >as
>>> >
>>> >> >> atomics are pretty heavy when scheduler is limited to burst
>size
>>> >> >> as
>>> >1.
>>> >
>>> >> >>
>>> >
>>> >> >> One option is to move it before a pipeline operation
>>> >
>>> >> >(pipeline_event_tx,
>>> >
>>> >> >> pipeline_fwd_event etc.) as they imply implicit release barrier
>>> >> >> (as
>>> >
>>> >> >> all
>>> >
>>> >> >the
>>> >
>>> >> >> changes done to the event should be visible to the next core).
>>> >
>>> >> >
>>> >
>>> >> >If I understand correctly, your meaning is that move release
>>> >> >barriers
>>> >
>>> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure
>>the
>>> >
>>> >> >event has been processed before the next core begins to tx/fwd.
>>For
>>> >
>>> >> >example:
>>> >
>>> >>
>>> >
>>> >> What I meant was event APIs such as `rte_event_enqueue_burst`,
>>> >
>>> >> `rte_event_eth_tx_adapter_enqueue`
>>> >
>>> >> act as an implicit release barrier and the API
>>> >`rte_event_dequeue_burst` act
>>> >
>>> >> as an implicit acquire barrier.
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> Since, pipeline_* test starts with a dequeue() and ends with an
>>> >enqueue() I
>>> >
>>> >> don’t believe we need barriers in Between.
>>> >
>>> >
>>> >
>>> >Sorry for my misunderstanding this. And I agree with you that no
>>> >barriers are
>>> >
>>> >needed between dequeue and enqueue.
>>> >
>>> >
>>> >
>>> >Now, let's go back to the beginning. Actually with this patch, our
>>> >barrier is mainly
>>> >
>>> >for the synchronous variable " w->processed_pkts ". As we all
>know,
>>the
>>> >event is firstly
>>> >
>>> >dequeued and then enqueued, after this, the event can be treated
>as
>>the
>>> >processed event
>>> >
>>> >and included in the statistics("w->processed_pkts++").
>>> >
>>> >
>>> >
>>> >Thus, we add a release barrier before " w->processed_pkts++" is to
>>> >prevent this operation
>>> >
>>> >being executed ahead of time. For example:
>>> >
>>> >dequeue  ->  w->processed_pkts++  ->  enqueue
>>> >
>>> >This cause that the worker doesn't actually finish this event
>>> >processing, but the event is treated
>>> >
>>> >as the processed one and included in the statistics.
>>> >
>>>
>>> But the current sequence is dequeue-> enqueue-> w-
>>>processed_pkts++
>>> and enqueue already acts as an explicit release barrier right?
>>>
>>
>>Sorry maybe I cannot understand how “enqueue” as an explicit release
>>barrier. I think of two possibilities:
>>1. As you say before, all the changes done to the event should be
>visible
>>to the next core and enqueue is a operation for event, so the next
>core
>>should wait for the event to be enqueued.
>>I think this is due to data dependence for the same variable. However,
>>‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot
>>prevent ‘w->processed_pkts++’ before enqueue.
>>And the main core may load updated ‘w->processed_pkts’ but actually
>>the event is still being processed. For example:
>>
>> Time Slot	   Worker 1                                      Main core
>>   1                           dequeue
>>   2                  w->processed_pkts++
>>   3                       		                              load w-
>>processed_pkts
>>   4                          enqueue
>>
>>2. Some release barriers have been included in enqueue. There is a
>>release barrier in rte_ring_enqueue :
>>move head -> copy elements to the ring -> release barrier -> update
>tail
>>-> w->processed_pkts++
>>However, this barrier cannot prevent ‘w->processed_pkts++’ before
>>update tail, and when update_tail has been finished, the enqueue
>>process  can be seen completed.
>
>I was talking about case 2 in particular almost all enqueue calls have
>some kind of
>release barrier in place. I do agree w->processed_pkts++ might get
>reordered with
>tail update but since enqueue itself is a ldr + blr I was hoping that it
>wouldn't occur.
>
>We can continue the discussion once I have some performance data.
>
>Thanks for your patience :)
>Pavan.
>
>>
>>> >
>>> >
>>>
>>>________________________________________________________
>_
>>> _
>>> >____________________
>>> >
>>> >
>>> >
>>> >By the way, I have two other questions about pipeline process test
>in
>>> >"test_pipeline_queue".
>>> >
>>> >1. when do we start counting processed events (w-
>>>processed_pkts)?
>>> >
>>> >For the fwd mode (internal_port = false), when we choose single
>>stage,
>>> >application increments
>>> >
>>> >the number events processed after "pipeline_event enqueue".
>>> >However, when we choose multiple
>>> >
>>> >stage, application increments the number events processed before
>>> >"pipleline_event_enqueue".
>>>
>>> We count an event as process when all the stages have completed
>>and its
>>> Trasnmitted.
>>>
>>> >So,
>>> >
>>> >maybe we can unify this. For example of multiple stage:
>>> >
>>> >
>>> >
>>> >                                if (cq_id == last_queue) {
>>> >
>>> >                                                ev.queue_id =
>>> > tx_queue[ev.mbuf->port];
>>> >
>>> >
>>> >rte_event_eth_tx_adapter_txq_set(ev.mbuf,
>>> >0);
>>> >
>>> >                                                pipeline_fwd_event(&ev,
>>> >RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >                                +             pipeline_event_enqueue(dev, port, &ev);
>>> >
>>> >                                                w->processed_pkts++;
>>> >
>>> >                                } else {
>>> >
>>> >                                                ev.queue_id++;
>>> >
>>> >                                                pipeline_fwd_event(&ev,
>>> >sched_type_list[cq_id]);
>>> >
>>> >                                +             pipeline_event_enqueue(dev, port, &ev);
>>> >
>>> >                                }
>>> >
>>> >
>>> >
>>> >                -              pipeline_event_enqueue(dev, port, &ev);
>>> >
>>> >
>>>
>>> The above change makes sense.
>>>
>>Thanks for your review, and I’ll update this change into the next
>>version.
>>> >
>>> >2. Whether  "pipeline_event_enqueue" is needed after
>>> >"pipeline_event_tx" for tx mode?
>>> >
>>> >For single_stage_burst_tx mode, after "pipeline_event_tx", the
>>worker
>>> >has to enqueue again
>>> >
>>> >due to  "pipeline_event_enqueue_burst", so maybe we should jump
>>out of
>>> >the loop after
>>> >
>>> >“pipeline_event_tx”,
>>>
>>> We call enqueue burst to release the events i.e. enqueue events with
>>> RTE_EVENT_OP_RELEASE.
>>>
>>However,
>>In case of single event, for ' pipeline_queue_worker_single_stage_tx'
>>and ' pipeline_queue_worker_multi_stage_tx',
>>after tx, there is no release operation.
>>

I think this was done before the feature disable_implicit_release 
`RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE ` was implemented.
We can remove the event_op_release section of the code and have a continue.

>>> >
>>>  for example:
>>> >
>>> >
>>> >
>>> >                                                if (ev[i].sched_type ==
>>> >RTE_SCHED_TYPE_ATOMIC) {
>>> >
>>> >
>>> > pipeline_event_tx(dev, port, &ev[i]);
>>> >
>>> >
>>> > ev[i].op = RTE_EVENT_OP_RELEASE;
>>> >
>>> >
>>> > w->processed_pkts++;
>>> >
>>> >                                                +             continue;
>>> >
>>> >                                                } else {
>>> >
>>> >
>>> > ev[i].queue_id++;
>>> >
>>> >
>>> > pipeline_fwd_event(&ev[i],
>>> >
>>> >
>>> >RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >                                                }
>>> >
>>> >                                }
>>> >
>>> >
>>> >
>>> >                                pipeline_event_enqueue_burst(dev, port,
>>> > ev, nb_rx);
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> >
>>> >
>>> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>> >
>>> >> >                          +
>>> >__atomic_thread_fence(__ATOMIC_RELEASE);
>>> >
>>> >> >                                          pipeline_event_tx(dev,
>>> >> > port, &ev);
>>> >
>>> >> >                                          w->processed_pkts++;
>>> >
>>> >> >                          } else {
>>> >
>>> >> >                                          ev.queue_id++;
>>> >
>>> >> >                          +
>>> >__atomic_thread_fence(__ATOMIC_RELEASE);
>>> >
>>> >> >                                          pipeline_fwd_event(&ev,
>>> >
>>> >> >RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >> >
>>> >> > pipeline_event_enqueue(dev, port, &ev);
>>> >
>>> >> >
>>> >
>>> >> >However, there are two reasons to prevent this:
>>> >
>>> >> >
>>> >
>>> >> >First, compare with other tests in app/eventdev, for example,
>the
>>> >
>>> >> >eventdev perf test, the wmb is after event operation to ensure
>>> >
>>> >> >operation has been finished and then w->processed_pkts++.
>>> >
>>> >>
>>> >
>>> >> In case of perf_* tests start with a dequeue() and finally ends with
>>> >> a
>>> >
>>> >> mempool_put() should also act as implicit acquire release pairs
>>> >making stats
>>> >
>>> >> consistent?
>>> >
>>> >
>>> >
>>> >For perf tests, this consistency refers to that there is a wmb after
>>> >mempool_put().
>>> >
>>> >Please refer to this link:
>>> >
>>> >https://urldefense.proofpoint.com/v2/url?u=http-
>>>
>>>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0
>m
>>O
>>>
>>>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=
>z
>>g
>>> >QHeSDiXWfI1PIIUxXBqMS6E-
>>2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
>>> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> >So, if we move release barriers before tx/fwd, it may cause that
>>the
>>> >
>>> >> >tests of app/eventdev become  inconsistent.This may reduce the
>>> >
>>> >> >maintainability of the code and make it difficult to understand.
>>> >
>>> >> >
>>> >
>>> >> >Second, it is a test case, though heavy thread may cause
>>> >performance
>>> >
>>> >> >degradation, it can ensure that the operation process and the
>test
>>> >
>>> >> >result are correct. And maybe for a test case, correctness is
>more
>>> >
>>> >> >important than performance.
>>> >
>>> >> >
>>> >
>>> >>
>>> >
>>> >> Most of our internal perf test run on 24/48 core combinations
>and
>>> >since
>>> >
>>> >> Octeontx2 event device driver supports a burst size of 1, it will
>>> >> show
>>> >up as
>>> >
>>> >> Huge performance degradation.
>>> >
>>> >
>>> >
>>> >For the impact on performance, I do the test using software driver,
>>> >following are some test results:
>>> >
>>> >-----------------------------------------------------------------------
>>> >----------------
>>> >---------------------------------------------
>>> >
>>> >Architecture: aarch64
>>> >
>>> >Nics: ixgbe-82599
>>> >
>>> >CPU: Cortex-A72
>>> >
>>> >BURST_SIZE: 1
>>> >
>>> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
>>> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
>>> >
>>> >Flow: one flow, 64bits package, TX rate: 1.4Mpps
>>> >
>>> >
>>> >
>>> >Without this patch:
>>> >
>>> >0.954 mpps avg 0.953 mpps
>>> >
>>> >
>>> >
>>> >With this patch:
>>> >
>>> >0.932 mpps avg 0.930 mpps
>>> >
>>> >-----------------------------------------------------------------------
>>> >----------------
>>> >---------------------------------------------
>>> >
>>> >
>>> >
>>> >Based on the result above, there is no significant performance
>>> >degradation with this patch.
>>> >
>>> >This is because the release barrier is only for  “w-
>>>processed_pkts++”.
>>> >It just ensures that the worker core
>>> >
>>> >increments the number events processed after enqueue, and it
>>doesn’t
>>> >affect dequeue/enqueue:
>>> >
>>> >
>>> >
>>> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
>>> >
>>>
>>> Here enqueue already acts as an explicit release barrier.
>>>
>>Please refer above reasons.
>>
>>> >
>>> >
>>> >On the other hand, I infer the reason for the slight decrease in
>>> >measurement performance is that the release barrier
>>> >
>>> >prevent “w->processed_pkts++” before that the event has been
>>processed
>>> >(enqueue). But I think this test result is closer
>>> >
>>> >to the real performance.
>>> >
>>> >And sorry for that we have no octentx2 device, so there is no test
>>> >result on Octeontx2 event device driver. Would you please
>>> >
>>> >help us test this patch on octentx2 when you are convenient.
>Thanks
>>> >very much.
>>> >
>>>
>>> I will report the performance numbers on Monday.
>>>
>>
>>That’s great, Thanks very much for your help.

We are seeing a ~2% performance loss with the series applied on pipeline tests.
I have taken the performance numbers on IXIA to eliminate any discrepancies in stats.

Do you see any improvement in stats being reported with and w/o the barrier with
Traffic generator vs eventdev application? because I don’t see any difference when 
verifying with IXIA.

Can we skip this patch?
Also, note that the barrier wouldn’t help in cases where:
1. SW eventdev Tx routine runs on service core which would have it's on latency in 
processing the packets.
2. HW eventdev Tx routine that have shaping and scheduling.

Regards, 
Pavan.

>>
>>Best Regards
>>Feifei
>>
>>> >
>>> >
>>> >Best Regards
>>> >
>>> >Feifei
>>>
>>> Regards,
>>> Pavan.
>>>
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> >So, due to two reasons above, I'm ambivalent about how we
>>should
>>> >do in
>>> >
>>> >> >the next step.
>>> >
>>> >> >
>>> >
>>> >> >Best Regards
>>> >
>>> >> >Feifei
>>> >
>>> >>
>>> >
>>> >> Regards,
>>> >
>>> >> Pavan.
>>> >
>>> >>
>>> >
>>> >> >
>>> >
>>> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue
>>worker
>>> >
>>> >> >> >functions")
>>> >
>>> >> >> >Cc:
>>>
>>>mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.c
>o
>>m>
>>> >
>>> >> >> >Cc: mailto:stable@dpdk.org<mailto:stable@dpdk.org>
>>> >
>>> >> >> >
>>> >
>>> >> >> >Signed-off-by: Phil Yang
>>> ><mailto:phil.yang@arm.com<mailto:phil.yang@arm.com>>
>>> >
>>> >> >> >Signed-off-by: Feifei Wang
>>> ><mailto:feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
>>> >
>>> >> >> >Reviewed-by: Ruifeng Wang
>>>
>><mailto:ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
>>> >
>>> >> >> >---
>>> >
>>> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
>>> >
>>> >> >> >+++++++++++++++++++++----
>>> >
>>> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
>>> >
>>> >> >> >
>>> >
>>> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c
>>b/app/test-
>>> >
>>> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
>>> >
>>> >> >100644
>>> >
>>> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
>>> >
>>> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
>>> >
>>> >> >> >@@ -30,7 +30,13 @@
>>> >pipeline_queue_worker_single_stage_tx(void
>>> >
>>> >> >> >*arg)
>>> >
>>> >> >> >
>>> >
>>> >> >> >                    if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>> >
>>> >> >> >                                    pipeline_event_tx(dev, port,
>>> >> >> > &ev);
>>> >
>>> >> >> >-                                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                                 /* release barrier here ensures
>>> >> >> >+ stored
>>> >operation
>>> >
>>> >> >> >+                                 * of the event completes before
>>> >> >> >+ the number
>>> >of
>>> >
>>> >> >> >+                                 * processed pkts is visible to
>>> >> >> >+ the main core
>>> >
>>> >> >> >+                                 */
>>> >
>>> >> >> >+
>>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
>>> >1,
>>> >
>>> >> >> >+
>>> >> >> >+ __ATOMIC_RELEASE);
>>> >
>>> >> >> >                    } else {
>>> >
>>> >> >> >                                    ev.queue_id++;
>>> >
>>> >> >> >                                    pipeline_fwd_event(&ev,
>>> >
>>> >> >> >RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >> >> >@@ -59,7 +65,13 @@
>>> >
>>> >> >pipeline_queue_worker_single_stage_fwd(void
>>> >
>>> >> >> >*arg)
>>> >
>>> >> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>> >
>>> >> >> >                    pipeline_fwd_event(&ev,
>>> >> >> > RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >> >> >                    pipeline_event_enqueue(dev, port, &ev);
>>> >
>>> >> >> >-                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                 /* release barrier here ensures stored
>>> >> >> >+ operation
>>> >
>>> >> >> >+                 * of the event completes before the number of
>>> >
>>> >> >> >+                 * processed pkts is visible to the main core
>>> >
>>> >> >> >+                 */
>>> >
>>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
>>> >
>>> >> >> >+
>>> >> >> >+ __ATOMIC_RELEASE);
>>> >
>>> >> >> >    }
>>> >
>>> >> >> >
>>> >
>>> >> >> >    return 0;
>>> >
>>> >> >> >@@ -84,7 +96,13 @@
>>> >
>>> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
>>> >
>>> >> >> >                                    if (ev[i].sched_type ==
>>> >
>>> >> >> >RTE_SCHED_TYPE_ATOMIC) {
>>> >
>>> >> >> >
>>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
>>> >
>>> >> >> >                                                    ev[i].op =
>>> >> >> > RTE_EVENT_OP_RELEASE;
>>> >
>>> >> >> >-                                                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                                                 /* release
>>> >> >> >+ barrier here ensures stored
>>> >
>>> >> >> >operation
>>> >
>>> >> >> >+                                                 * of the event
>>> >> >> >+ completes before the
>>> >
>>> >> >> >number of
>>> >
>>> >> >> >+                                                 * processed
>>> >> >> >+ pkts is visible to the main
>>> >
>>> >> >> >core
>>> >
>>> >> >> >+                                                 */
>>> >
>>> >> >> >+
>>> >> >> >+ __atomic_fetch_add(&(w-
>>> >
>>> >> >> >>processed_pkts), 1,
>>> >
>>> >> >> >+
>>> >__ATOMIC_RELEASE);
>>> >
>>> >> >> >                                    } else {
>>> >
>>> >> >> >
>>> >> >> > ev[i].queue_id++;
>>> >
>>> >> >> >
>>> >> >> > pipeline_fwd_event(&ev[i],
>>> >
>>> >> >> >@@ -121,7 +139,13 @@
>>> >
>>> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>>> >
>>> >> >> >                    }
>>> >
>>> >> >> >
>>> >
>>> >> >> >                    pipeline_event_enqueue_burst(dev, port, ev,
>>> >> >> > nb_rx);
>>> >
>>> >> >> >-                  w->processed_pkts += nb_rx;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                 /* release barrier here ensures stored
>>> >> >> >+ operation
>>> >
>>> >> >> >+                 * of the event completes before the number of
>>> >
>>> >> >> >+                 * processed pkts is visible to the main core
>>> >
>>> >> >> >+                 */
>>> >
>>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts),
>nb_rx,
>>> >
>>> >> >> >+
>>> >> >> >+ __ATOMIC_RELEASE);
>>> >
>>> >> >> >    }
>>> >
>>> >> >> >
>>> >
>>> >> >> >    return 0;
>>> >
>>> >> >> >@@ -146,7 +170,13 @@
>>> >
>>> >> >pipeline_queue_worker_multi_stage_tx(void
>>> >
>>> >> >> >*arg)
>>> >
>>> >> >> >
>>> >
>>> >> >> >                    if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>>> >
>>> >> >> >                                    pipeline_event_tx(dev, port,
>>> >> >> > &ev);
>>> >
>>> >> >> >-                                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                                 /* release barrier here ensures
>>> >> >> >+ stored
>>> >operation
>>> >
>>> >> >> >+                                 * of the event completes before
>>> >> >> >+ the number
>>> >of
>>> >
>>> >> >> >+                                 * processed pkts is visible to
>>> >> >> >+ the main core
>>> >
>>> >> >> >+                                 */
>>> >
>>> >> >> >+
>>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
>>> >1,
>>> >
>>> >> >> >+
>>> >> >> >+ __ATOMIC_RELEASE);
>>> >
>>> >> >> >                                    continue;
>>> >
>>> >> >> >                    }
>>> >
>>> >> >> >
>>> >
>>> >> >> >@@ -180,7 +210,13 @@
>>> >
>>> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
>>> >
>>> >> >> >                                    ev.queue_id =
>>> >> >> > tx_queue[ev.mbuf->port];
>>> >
>>> >> >> >
>>> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf,
>>> >0);
>>> >
>>> >> >> >                                    pipeline_fwd_event(&ev,
>>> >
>>> >> >> >RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >> >> >-                                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                                 /* release barrier here ensures
>>> >> >> >+ stored
>>> >operation
>>> >
>>> >> >> >+                                 * of the event completes before
>>> >> >> >+ the number
>>> >of
>>> >
>>> >> >> >+                                 * processed pkts is visible to
>>> >> >> >+ the main core
>>> >
>>> >> >> >+                                 */
>>> >
>>> >> >> >+
>>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
>>> >1,
>>> >
>>> >> >> >+
>>> >> >> >+ __ATOMIC_RELEASE);
>>> >
>>> >> >> >                    } else {
>>> >
>>> >> >> >                                    ev.queue_id++;
>>> >
>>> >> >> >                                    pipeline_fwd_event(&ev,
>>> >
>>> >> >> >sched_type_list[cq_id]);
>>> >
>>> >> >> >@@ -214,7 +250,13 @@
>>> >
>>> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>>> >
>>> >> >> >                                    if (ev[i].queue_id ==
>>> >> >> > tx_queue[ev[i].mbuf-
>>> >
>>> >> >> >>port]) {
>>> >
>>> >> >> >
>>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
>>> >
>>> >> >> >                                                    ev[i].op =
>>> >> >> > RTE_EVENT_OP_RELEASE;
>>> >
>>> >> >> >-                                                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                                                 /* release
>>> >> >> >+ barrier here ensures stored
>>> >
>>> >> >> >operation
>>> >
>>> >> >> >+                                                 * of the event
>>> >> >> >+ completes before the
>>> >
>>> >> >> >number of
>>> >
>>> >> >> >+                                                 * processed
>>> >> >> >+ pkts is visible to the main
>>> >
>>> >> >> >core
>>> >
>>> >> >> >+                                                 */
>>> >
>>> >> >> >+
>>> >> >> >+ __atomic_fetch_add(&(w-
>>> >
>>> >> >> >>processed_pkts), 1,
>>> >
>>> >> >> >+
>>> >__ATOMIC_RELEASE);
>>> >
>>> >> >> >                                                    continue;
>>> >
>>> >> >> >                                    }
>>> >
>>> >> >> >
>>> >
>>> >> >> >@@ -254,7 +296,13 @@
>>> >
>>> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>>> >
>>> >> >> >
>>> >
>>> >> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>>> >
>>> >> >> >
>>> >> >> > pipeline_fwd_event(&ev[i],
>>> >
>>> >> >> >
>>> >
>>> >> >> >    RTE_SCHED_TYPE_ATOMIC);
>>> >
>>> >> >> >-                                                  w->processed_pkts++;
>>> >
>>> >> >> >+
>>> >
>>> >> >> >+                                                 /* release
>>> >> >> >+ barrier here ensures stored
>>> >
>>> >> >> >operation
>>> >
>>> >> >> >+                                                 * of the event
>>> >> >> >+ completes before the
>>> >
>>> >> >> >number of
>>> >
>>> >> >> >+                                                 * processed
>>> >> >> >+ pkts is visible to the main
>>> >
>>> >> >> >core
>>> >
>>> >> >> >+                                                 */
>>> >
>>> >> >> >+
>>> >> >> >+ __atomic_fetch_add(&(w-
>>> >
>>> >> >> >>processed_pkts), 1,
>>> >
>>> >> >> >+
>>> >__ATOMIC_RELEASE);
>>> >
>>> >> >> >                                    } else {
>>> >
>>> >> >> >
>>> >> >> > ev[i].queue_id++;
>>> >
>>> >> >> >
>>> >> >> > pipeline_fwd_event(&ev[i],
>>> >
>>> >> >> >--
>>> >
>>> >> >> >2.17.1
>>> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] 回复: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2021-01-12  8:29             ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
@ 2021-01-14  6:07               ` Feifei Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Feifei Wang @ 2021-01-14  6:07 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, jerinj, Harry van Haaren
  Cc: dev, nd, Honnappa Nagarahalli, stable, Ruifeng Wang, nd

Hi, Pavan

> -----邮件原件-----
> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> 发送时间: 2021年1月12日 16:29
> 收件人: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry
> van Haaren <harry.van.haaren@intel.com>
> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>; nd
> <nd@arm.com>
> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline
> test
> 
> Hi Feifei,
> 
> >Hi Feifei,
> >
> >>Hi, Pavan
> >>
> >>> -----邮件原件-----
> >>> 发件人: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> >>> 发送时间: 2021年1月8日 17:13
> >>> 收件人: Feifei Wang <Feifei.Wang2@arm.com>;
> >jerinj@marvell.com;
> >>Harry
> >>> van Haaren <harry.van.haaren@intel.com>
> >>> 抄送: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> >>> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng
> >Wang
> >>> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> >>> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >>pipeline
> >>> test
> >>>
> >>> Hi Feifei,
> >>>
> >>> >Hi, Pavan
> >>> >
> >>> >
> >>> >
> >>> >> -----邮件原件-----
> >>> >
> >>> >> 发件人: Pavan Nikhilesh Bhagavatula
> >><mailto:pbhagavatula@marvell.com>
> >>> >
> >>> >> 发送时间: 2021年1月5日 17:29
> >>> >
> >>> >> 收件人: Feifei Wang <mailto:Feifei.Wang2@arm.com>;
> >>mailto:jerinj@marvell.com;
> >>> >Harry
> >>> >
> >>> >> van Haaren <mailto:harry.van.haaren@intel.com>
> >>> >
> >>> >> 抄送: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>;
> >Honnappa
> >>Nagarahalli
> >>> >
> >>> >> <mailto:Honnappa.Nagarahalli@arm.com>;
> >>mailto:stable@dpdk.org; Ruifeng Wang
> >>> >
> >>> >> <mailto:Ruifeng.Wang@arm.com>; nd <mailto:nd@arm.com>
> >>> >
> >>> >> 主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
> >>for
> >>> >pipeline
> >>> >
> >>> >> test
> >>> >
> >>> >>
> >>> >
> >>> >> Hi Feifei,
> >>> >
> >>> >>
> >>> >
> >>> >> >Hi, Pavan
> >>> >
> >>> >> >
> >>> >
> >>> >> >Sorry for my late reply and thanks very much for your review.
> >>> >
> >>> >> >
> >>> >
> >>> >> >> -----Original Message-----
> >>> >
> >>> >> >> From: Pavan Nikhilesh Bhagavatula
> >>>
> >>><mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.
> >c
> >>om>>
> >>> >
> >>> >> >> Sent: 2020年12月22日 18:33
> >>> >
> >>> >> >> To: Feifei Wang
> >>>
> >><mailto:Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>;
> >>> >mailto:jerinj@marvell.com<mailto:jerinj@marvell.com>;
> >>> >
> >>> >> >Harry van
> >>> >
> >>> >> >> Haaren
> >>>
> >>><mailto:harry.van.haaren@intel.com<mailto:harry.van.haaren@intel
> >.
> >>com>>;
> >>> >Pavan Nikhilesh
> >>> >
> >>> >> >>
> >>>
> >>><pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@caviu
> >m
> >>n
> >>> >etworks.com>>
> >>> >
> >>> >> >> Cc: mailto:dev@dpdk.org<mailto:dev@dpdk.org>; nd
> >>> ><mailto:nd@arm.com<mailto:nd@arm.com>>; Honnappa
> >>Nagarahalli
> >>> >
> >>> >> >>
> >>>
> >>><Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@a
> >r
> >>m
> >>> >.com>>; mailto:stable@dpdk.org<mailto:stable@dpdk.org>; Phil
> >>Yang
> >>> >
> >>> >> >> <mailto:Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
> >>> >
> >>> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release
> >>> >barriers
> >>> >
> >>> >> >for
> >>> >
> >>> >> >> pipeline test
> >>> >
> >>> >> >>
> >>> >
> >>> >> >>
> >>> >
> >>> >> >> >Add release barriers before updating the processed packets
> >for
> >>> >
> >>> >> >worker
> >>> >
> >>> >> >> >lcores to ensure the worker lcore has really finished data
> >>> >
> >>> >> >> >processing and then it can update the processed packets
> >>number.
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >>
> >>> >
> >>> >> >> I believe we can live with minor inaccuracies in stats being
> >>> >
> >>> >> >> presented
> >>> >
> >>> >> >as
> >>> >
> >>> >> >> atomics are pretty heavy when scheduler is limited to burst
> >size
> >>> >> >> as
> >>> >1.
> >>> >
> >>> >> >>
> >>> >
> >>> >> >> One option is to move it before a pipeline operation
> >>> >
> >>> >> >(pipeline_event_tx,
> >>> >
> >>> >> >> pipeline_fwd_event etc.) as they imply implicit release
> >>> >> >> barrier (as
> >>> >
> >>> >> >> all
> >>> >
> >>> >> >the
> >>> >
> >>> >> >> changes done to the event should be visible to the next core).
> >>> >
> >>> >> >
> >>> >
> >>> >> >If I understand correctly, your meaning is that move release
> >>> >> >barriers
> >>> >
> >>> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure
> >>the
> >>> >
> >>> >> >event has been processed before the next core begins to tx/fwd.
> >>For
> >>> >
> >>> >> >example:
> >>> >
> >>> >>
> >>> >
> >>> >> What I meant was event APIs such as `rte_event_enqueue_burst`,
> >>> >
> >>> >> `rte_event_eth_tx_adapter_enqueue`
> >>> >
> >>> >> act as an implicit release barrier and the API
> >>> >`rte_event_dequeue_burst` act
> >>> >
> >>> >> as an implicit acquire barrier.
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> Since, pipeline_* test starts with a dequeue() and ends with an
> >>> >enqueue() I
> >>> >
> >>> >> don’t believe we need barriers in Between.
> >>> >
> >>> >
> >>> >
> >>> >Sorry for my misunderstanding this. And I agree with you that no
> >>> >barriers are
> >>> >
> >>> >needed between dequeue and enqueue.
> >>> >
> >>> >
> >>> >
> >>> >Now, let's go back to the beginning. Actually with this patch, our
> >>> >barrier is mainly
> >>> >
> >>> >for the synchronous variable " w->processed_pkts ". As we all
> >know,
> >>the
> >>> >event is firstly
> >>> >
> >>> >dequeued and then enqueued, after this, the event can be treated
> >as
> >>the
> >>> >processed event
> >>> >
> >>> >and included in the statistics("w->processed_pkts++").
> >>> >
> >>> >
> >>> >
> >>> >Thus, we add a release barrier before " w->processed_pkts++" is to
> >>> >prevent this operation
> >>> >
> >>> >being executed ahead of time. For example:
> >>> >
> >>> >dequeue  ->  w->processed_pkts++  ->  enqueue
> >>> >
> >>> >This cause that the worker doesn't actually finish this event
> >>> >processing, but the event is treated
> >>> >
> >>> >as the processed one and included in the statistics.
> >>> >
> >>>
> >>> But the current sequence is dequeue-> enqueue-> w-
> processed_pkts++
> >>>and enqueue already acts as an explicit release barrier right?
> >>>
> >>
> >>Sorry maybe I cannot understand how “enqueue” as an explicit release
> >>barrier. I think of two possibilities:
> >>1. As you say before, all the changes done to the event should be
> >visible
> >>to the next core and enqueue is a operation for event, so the next
> >core
> >>should wait for the event to be enqueued.
> >>I think this is due to data dependence for the same variable. However,
> >>‘w->processed_pkts’ and ‘ev’ are different variables, so this cannot
> >>prevent ‘w->processed_pkts++’ before enqueue.
> >>And the main core may load updated ‘w->processed_pkts’ but actually
> >>the event is still being processed. For example:
> >>
> >> Time Slot	   Worker 1                                      Main core
> >>   1                           dequeue
> >>   2                  w->processed_pkts++
> >>   3                       		                              load w-
> >>processed_pkts
> >>   4                          enqueue
> >>
> >>2. Some release barriers have been included in enqueue. There is a
> >>release barrier in rte_ring_enqueue :
> >>move head -> copy elements to the ring -> release barrier -> update
> >tail
> >>-> w->processed_pkts++
> >>However, this barrier cannot prevent ‘w->processed_pkts++’ before
> >>update tail, and when update_tail has been finished, the enqueue
> >>process  can be seen completed.
> >
> >I was talking about case 2 in particular almost all enqueue calls have
> >some kind of release barrier in place. I do agree w->processed_pkts++
> >might get reordered with tail update but since enqueue itself is a ldr
> >+ blr I was hoping that it wouldn't occur.
> >
> >We can continue the discussion once I have some performance data.
> >
> >Thanks for your patience :)
> >Pavan.
> >
> >>
> >>> >
> >>> >
> >>>
> >>>_______________________________________________________
> _
> >_
> >>> _
> >>> >____________________
> >>> >
> >>> >
> >>> >
> >>> >By the way, I have two other questions about pipeline process test
> >in
> >>> >"test_pipeline_queue".
> >>> >
> >>> >1. when do we start counting processed events (w-
> >>>processed_pkts)?
> >>> >
> >>> >For the fwd mode (internal_port = false), when we choose single
> >>stage,
> >>> >application increments
> >>> >
> >>> >the number events processed after "pipeline_event enqueue".
> >>> >However, when we choose multiple
> >>> >
> >>> >stage, application increments the number events processed before
> >>> >"pipleline_event_enqueue".
> >>>
> >>> We count an event as process when all the stages have completed
> >>and its
> >>> Trasnmitted.
> >>>
> >>> >So,
> >>> >
> >>> >maybe we can unify this. For example of multiple stage:
> >>> >
> >>> >
> >>> >
> >>> >                                if (cq_id == last_queue) {
> >>> >
> >>> >                                                ev.queue_id =
> >>> > tx_queue[ev.mbuf->port];
> >>> >
> >>> >
> >>> >rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >>> >0);
> >>> >
> >>> >
> >>> >pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >                                +             pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >
> >>> > w->processed_pkts++;
> >>> >
> >>> >                                } else {
> >>> >
> >>> >                                                ev.queue_id++;
> >>> >
> >>> >
> >>> >pipeline_fwd_event(&ev, sched_type_list[cq_id]);
> >>> >
> >>> >                                +             pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >                                }
> >>> >
> >>> >
> >>> >
> >>> >                -              pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >
> >>>
> >>> The above change makes sense.
> >>>
> >>Thanks for your review, and I’ll update this change into the next
> >>version.
> >>> >
> >>> >2. Whether  "pipeline_event_enqueue" is needed after
> >>> >"pipeline_event_tx" for tx mode?
> >>> >
> >>> >For single_stage_burst_tx mode, after "pipeline_event_tx", the
> >>worker
> >>> >has to enqueue again
> >>> >
> >>> >due to  "pipeline_event_enqueue_burst", so maybe we should jump
> >>out of
> >>> >the loop after
> >>> >
> >>> >“pipeline_event_tx”,
> >>>
> >>> We call enqueue burst to release the events i.e. enqueue events with
> >>> RTE_EVENT_OP_RELEASE.
> >>>
> >>However,
> >>In case of single event, for ' pipeline_queue_worker_single_stage_tx'
> >>and ' pipeline_queue_worker_multi_stage_tx',
> >>after tx, there is no release operation.
> >>
> 
> I think this was done before the feature disable_implicit_release
> `RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE ` was implemented.
> We can remove the event_op_release section of the code and have a
> continue.
> 
Ok, thanks for your explain. I will upload the new series to fix above two bugs.
 
> >>> >
> >>>  for example:
> >>> >
> >>> >
> >>> >
> >>> >                                                if
> >>> >(ev[i].sched_type ==
> >>> >RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >
> >>> > pipeline_event_tx(dev, port, &ev[i]);
> >>> >
> >>> >
> >>> > ev[i].op = RTE_EVENT_OP_RELEASE;
> >>> >
> >>> >
> >>> > w->processed_pkts++;
> >>> >
> >>> >                                                +             continue;
> >>> >
> >>> >                                                } else {
> >>> >
> >>> >
> >>> > ev[i].queue_id++;
> >>> >
> >>> >
> >>> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >
> >>> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >                                                }
> >>> >
> >>> >                                }
> >>> >
> >>> >
> >>> >
> >>> >                                pipeline_event_enqueue_burst(dev,
> >>> > port, ev, nb_rx);
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> >
> >>> >
> >>> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >> >                          +
> >>> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >>> >
> >>> >> >                                          pipeline_event_tx(dev,
> >>> >> > port, &ev);
> >>> >
> >>> >> >                                          w->processed_pkts++;
> >>> >
> >>> >> >                          } else {
> >>> >
> >>> >> >                                          ev.queue_id++;
> >>> >
> >>> >> >                          +
> >>> >__atomic_thread_fence(__ATOMIC_RELEASE);
> >>> >
> >>> >> >
> >>> >> > pipeline_fwd_event(&ev,
> >>> >
> >>> >> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >
> >>> >> > pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >> >
> >>> >
> >>> >> >However, there are two reasons to prevent this:
> >>> >
> >>> >> >
> >>> >
> >>> >> >First, compare with other tests in app/eventdev, for example,
> >the
> >>> >
> >>> >> >eventdev perf test, the wmb is after event operation to ensure
> >>> >
> >>> >> >operation has been finished and then w->processed_pkts++.
> >>> >
> >>> >>
> >>> >
> >>> >> In case of perf_* tests start with a dequeue() and finally ends
> >>> >> with a
> >>> >
> >>> >> mempool_put() should also act as implicit acquire release pairs
> >>> >making stats
> >>> >
> >>> >> consistent?
> >>> >
> >>> >
> >>> >
> >>> >For perf tests, this consistency refers to that there is a wmb
> >>> >after mempool_put().
> >>> >
> >>> >Please refer to this link:
> >>> >
> >>> >https://urldefense.proofpoint.com/v2/url?u=http-
> >>>
> >>>3A__patches.dpdk.org_patch_85634_&d=DwIGaQ&c=nKjWec2b6R0
> >m
> >>O
> >>>
> >>>yPaz7xtfQ&r=1cjuAHrGh745jHNmj2fD85sUMIJ2IPIDsIJzo6FN6Z0&m=
> >z
> >>g
> >>> >QHeSDiXWfI1PIIUxXBqMS6E-
> >>2_3G46nhrzGXoBpHI&s=0FwTxPXjWflh-
> >>> >sdmnkY133IPlJB780x0yxe7Am3JCBw&e=
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> >So, if we move release barriers before tx/fwd, it may cause that
> >>the
> >>> >
> >>> >> >tests of app/eventdev become  inconsistent.This may reduce the
> >>> >
> >>> >> >maintainability of the code and make it difficult to understand.
> >>> >
> >>> >> >
> >>> >
> >>> >> >Second, it is a test case, though heavy thread may cause
> >>> >performance
> >>> >
> >>> >> >degradation, it can ensure that the operation process and the
> >test
> >>> >
> >>> >> >result are correct. And maybe for a test case, correctness is
> >more
> >>> >
> >>> >> >important than performance.
> >>> >
> >>> >> >
> >>> >
> >>> >>
> >>> >
> >>> >> Most of our internal perf test run on 24/48 core combinations
> >and
> >>> >since
> >>> >
> >>> >> Octeontx2 event device driver supports a burst size of 1, it will
> >>> >> show
> >>> >up as
> >>> >
> >>> >> Huge performance degradation.
> >>> >
> >>> >
> >>> >
> >>> >For the impact on performance, I do the test using software driver,
> >>> >following are some test results:
> >>> >
> >>> >-------------------------------------------------------------------
> >>> >----
> >>> >----------------
> >>> >---------------------------------------------
> >>> >
> >>> >Architecture: aarch64
> >>> >
> >>> >Nics: ixgbe-82599
> >>> >
> >>> >CPU: Cortex-A72
> >>> >
> >>> >BURST_SIZE: 1
> >>> >
> >>> >Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --
> >>> >test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
> >>> >
> >>> >Flow: one flow, 64bits package, TX rate: 1.4Mpps
> >>> >
> >>> >
> >>> >
> >>> >Without this patch:
> >>> >
> >>> >0.954 mpps avg 0.953 mpps
> >>> >
> >>> >
> >>> >
> >>> >With this patch:
> >>> >
> >>> >0.932 mpps avg 0.930 mpps
> >>> >
> >>> >-------------------------------------------------------------------
> >>> >----
> >>> >----------------
> >>> >---------------------------------------------
> >>> >
> >>> >
> >>> >
> >>> >Based on the result above, there is no significant performance
> >>> >degradation with this patch.
> >>> >
> >>> >This is because the release barrier is only for  “w-
> >>>processed_pkts++”.
> >>> >It just ensures that the worker core
> >>> >
> >>> >increments the number events processed after enqueue, and it
> >>doesn’t
> >>> >affect dequeue/enqueue:
> >>> >
> >>> >
> >>> >
> >>> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
> >>> >
> >>>
> >>> Here enqueue already acts as an explicit release barrier.
> >>>
> >>Please refer above reasons.
> >>
> >>> >
> >>> >
> >>> >On the other hand, I infer the reason for the slight decrease in
> >>> >measurement performance is that the release barrier
> >>> >
> >>> >prevent “w->processed_pkts++” before that the event has been
> >>processed
> >>> >(enqueue). But I think this test result is closer
> >>> >
> >>> >to the real performance.
> >>> >
> >>> >And sorry for that we have no octentx2 device, so there is no test
> >>> >result on Octeontx2 event device driver. Would you please
> >>> >
> >>> >help us test this patch on octentx2 when you are convenient.
> >Thanks
> >>> >very much.
> >>> >
> >>>
> >>> I will report the performance numbers on Monday.
> >>>
> >>
> >>That’s great, Thanks very much for your help.
> 
> We are seeing a ~2% performance loss with the series applied on pipeline
> tests.
> I have taken the performance numbers on IXIA to eliminate any
> discrepancies in stats.
> 
> Do you see any improvement in stats being reported with and w/o the
> barrier with Traffic generator vs eventdev application? because I don’t see
> any difference when verifying with IXIA.
> 
Thanks very much for your test. We also use IXIA to test with SW driver and have ~2% performance loss according to the above data.

> Can we skip this patch?
> Also, note that the barrier wouldn’t help in cases where:
> 1. SW eventdev Tx routine runs on service core which would have it's on
> latency in processing the packets.
> 2. HW eventdev Tx routine that have shaping and scheduling.
> 

That's Ok, I'll remove this patch from this series. And thanks very much for this great discussion.

Best Regards
Feifei

> Regards,
> Pavan.
> 
> >>
> >>Best Regards
> >>Feifei
> >>
> >>> >
> >>> >
> >>> >Best Regards
> >>> >
> >>> >Feifei
> >>>
> >>> Regards,
> >>> Pavan.
> >>>
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >
> >>> >> >So, due to two reasons above, I'm ambivalent about how we
> >>should
> >>> >do in
> >>> >
> >>> >> >the next step.
> >>> >
> >>> >> >
> >>> >
> >>> >> >Best Regards
> >>> >
> >>> >> >Feifei
> >>> >
> >>> >>
> >>> >
> >>> >> Regards,
> >>> >
> >>> >> Pavan.
> >>> >
> >>> >>
> >>> >
> >>> >> >
> >>> >
> >>> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue
> >>worker
> >>> >
> >>> >> >> >functions")
> >>> >
> >>> >> >> >Cc:
> >>>
> >>>mailto:pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.c
> >o
> >>m>
> >>> >
> >>> >> >> >Cc: mailto:stable@dpdk.org<mailto:stable@dpdk.org>
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >Signed-off-by: Phil Yang
> >>> ><mailto:phil.yang@arm.com<mailto:phil.yang@arm.com>>
> >>> >
> >>> >> >> >Signed-off-by: Feifei Wang
> >>> ><mailto:feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
> >>> >
> >>> >> >> >Reviewed-by: Ruifeng Wang
> >>>
> >><mailto:ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
> >>> >
> >>> >> >> >---
> >>> >
> >>> >> >> > app/test-eventdev/test_pipeline_queue.c | 64
> >>> >
> >>> >> >> >+++++++++++++++++++++----
> >>> >
> >>> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c
> >>b/app/test-
> >>> >
> >>> >> >> >eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
> >>> >
> >>> >> >100644
> >>> >
> >>> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
> >>> >
> >>> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >>> >
> >>> >> >> >@@ -30,7 +30,13 @@
> >>> >pipeline_queue_worker_single_stage_tx(void
> >>> >
> >>> >> >> >*arg)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >                    if (ev.sched_type ==
> >>> >> >> > RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >> >> >                                    pipeline_event_tx(dev,
> >>> >> >> > port, &ev);
> >>> >
> >>> >> >> >-                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                 /* release barrier here
> >>> >> >> >+ ensures stored
> >>> >operation
> >>> >
> >>> >> >> >+                                 * of the event completes
> >>> >> >> >+ before the number
> >>> >of
> >>> >
> >>> >> >> >+                                 * processed pkts is visible
> >>> >> >> >+ to the main core
> >>> >
> >>> >> >> >+                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >>> >1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                    } else {
> >>> >
> >>> >> >> >                                    ev.queue_id++;
> >>> >
> >>> >> >> >                                    pipeline_fwd_event(&ev,
> >>> >
> >>> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >@@ -59,7 +65,13 @@
> >>> >
> >>> >> >pipeline_queue_worker_single_stage_fwd(void
> >>> >
> >>> >> >> >*arg)
> >>> >
> >>> >> >> >                    rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >>> >> >> > 0);
> >>> >
> >>> >> >> >                    pipeline_fwd_event(&ev,
> >>> >> >> > RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >                    pipeline_event_enqueue(dev, port, &ev);
> >>> >
> >>> >> >> >-                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                 /* release barrier here ensures stored
> >>> >> >> >+ operation
> >>> >
> >>> >> >> >+                 * of the event completes before the number
> >>> >> >> >+ of
> >>> >
> >>> >> >> >+                 * processed pkts is visible to the main
> >>> >> >> >+ core
> >>> >
> >>> >> >> >+                 */
> >>> >
> >>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    return 0;
> >>> >
> >>> >> >> >@@ -84,7 +96,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_single_stage_burst_tx(void *arg)
> >>> >
> >>> >> >> >                                    if (ev[i].sched_type ==
> >>> >
> >>> >> >> >RTE_SCHED_TYPE_ATOMIC) {
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >>> >
> >>> >> >> >                                                    ev[i].op
> >>> >> >> > = RTE_EVENT_OP_RELEASE;
> >>> >
> >>> >> >> >-                                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                                 /* release
> >>> >> >> >+ barrier here ensures stored
> >>> >
> >>> >> >> >operation
> >>> >
> >>> >> >> >+                                                 * of the
> >>> >> >> >+ event completes before the
> >>> >
> >>> >> >> >number of
> >>> >
> >>> >> >> >+                                                 * processed
> >>> >> >> >+ pkts is visible to the main
> >>> >
> >>> >> >> >core
> >>> >
> >>> >> >> >+                                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w-
> >>> >
> >>> >> >> >>processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >__ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                    } else {
> >>> >
> >>> >> >> >
> >>> >> >> > ev[i].queue_id++;
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >> >> >@@ -121,7 +139,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> >>> >
> >>> >> >> >                    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >                    pipeline_event_enqueue_burst(dev, port,
> >>> >> >> > ev, nb_rx);
> >>> >
> >>> >> >> >-                  w->processed_pkts += nb_rx;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                 /* release barrier here ensures stored
> >>> >> >> >+ operation
> >>> >
> >>> >> >> >+                 * of the event completes before the number
> >>> >> >> >+ of
> >>> >
> >>> >> >> >+                 * processed pkts is visible to the main
> >>> >> >> >+ core
> >>> >
> >>> >> >> >+                 */
> >>> >
> >>> >> >> >+                 __atomic_fetch_add(&(w->processed_pkts),
> >nb_rx,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    return 0;
> >>> >
> >>> >> >> >@@ -146,7 +170,13 @@
> >>> >
> >>> >> >pipeline_queue_worker_multi_stage_tx(void
> >>> >
> >>> >> >> >*arg)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >                    if (ev.queue_id ==
> >>> >> >> > tx_queue[ev.mbuf->port]) {
> >>> >
> >>> >> >> >                                    pipeline_event_tx(dev,
> >>> >> >> > port, &ev);
> >>> >
> >>> >> >> >-                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                 /* release barrier here
> >>> >> >> >+ ensures stored
> >>> >operation
> >>> >
> >>> >> >> >+                                 * of the event completes
> >>> >> >> >+ before the number
> >>> >of
> >>> >
> >>> >> >> >+                                 * processed pkts is visible
> >>> >> >> >+ to the main core
> >>> >
> >>> >> >> >+                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >>> >1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                    continue;
> >>> >
> >>> >> >> >                    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >@@ -180,7 +210,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_multi_stage_fwd(void *arg)
> >>> >
> >>> >> >> >                                    ev.queue_id =
> >>> >> >> > tx_queue[ev.mbuf->port];
> >>> >
> >>> >> >> >
> >>> >> >> > rte_event_eth_tx_adapter_txq_set(ev.mbuf,
> >>> >0);
> >>> >
> >>> >> >> >                                    pipeline_fwd_event(&ev,
> >>> >
> >>> >> >> >RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >-                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                 /* release barrier here
> >>> >> >> >+ ensures stored
> >>> >operation
> >>> >
> >>> >> >> >+                                 * of the event completes
> >>> >> >> >+ before the number
> >>> >of
> >>> >
> >>> >> >> >+                                 * processed pkts is visible
> >>> >> >> >+ to the main core
> >>> >
> >>> >> >> >+                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w->processed_pkts),
> >>> >1,
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                    } else {
> >>> >
> >>> >> >> >                                    ev.queue_id++;
> >>> >
> >>> >> >> >                                    pipeline_fwd_event(&ev,
> >>> >
> >>> >> >> >sched_type_list[cq_id]);
> >>> >
> >>> >> >> >@@ -214,7 +250,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> >>> >
> >>> >> >> >                                    if (ev[i].queue_id ==
> >>> >> >> > tx_queue[ev[i].mbuf-
> >>> >
> >>> >> >> >>port]) {
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_event_tx(dev, port, &ev[i]);
> >>> >
> >>> >> >> >                                                    ev[i].op
> >>> >> >> > = RTE_EVENT_OP_RELEASE;
> >>> >
> >>> >> >> >-                                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                                 /* release
> >>> >> >> >+ barrier here ensures stored
> >>> >
> >>> >> >> >operation
> >>> >
> >>> >> >> >+                                                 * of the
> >>> >> >> >+ event completes before the
> >>> >
> >>> >> >> >number of
> >>> >
> >>> >> >> >+                                                 * processed
> >>> >> >> >+ pkts is visible to the main
> >>> >
> >>> >> >> >core
> >>> >
> >>> >> >> >+                                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w-
> >>> >
> >>> >> >> >>processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >__ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                                    continue;
> >>> >
> >>> >> >> >                                    }
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >@@ -254,7 +296,13 @@
> >>> >
> >>> >> >> >pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >> >> >
> >>> >
> >>> >> >> >    RTE_SCHED_TYPE_ATOMIC);
> >>> >
> >>> >> >> >-                                                  w->processed_pkts++;
> >>> >
> >>> >> >> >+
> >>> >
> >>> >> >> >+                                                 /* release
> >>> >> >> >+ barrier here ensures stored
> >>> >
> >>> >> >> >operation
> >>> >
> >>> >> >> >+                                                 * of the
> >>> >> >> >+ event completes before the
> >>> >
> >>> >> >> >number of
> >>> >
> >>> >> >> >+                                                 * processed
> >>> >> >> >+ pkts is visible to the main
> >>> >
> >>> >> >> >core
> >>> >
> >>> >> >> >+                                                 */
> >>> >
> >>> >> >> >+
> >>> >> >> >+ __atomic_fetch_add(&(w-
> >>> >
> >>> >> >> >>processed_pkts), 1,
> >>> >
> >>> >> >> >+
> >>> >__ATOMIC_RELEASE);
> >>> >
> >>> >> >> >                                    } else {
> >>> >
> >>> >> >> >
> >>> >> >> > ev[i].queue_id++;
> >>> >
> >>> >> >> >
> >>> >> >> > pipeline_fwd_event(&ev[i],
> >>> >
> >>> >> >> >--
> >>> >
> >>> >> >> >2.17.1
> >>> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2020-12-22  5:07 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
@ 2020-12-22  5:07 ` Feifei Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Feifei Wang @ 2020-12-22  5:07 UTC (permalink / raw)
  To: Jerin Jacob, Harry van Haaren, Pavan Nikhilesh
  Cc: dev, nd, Honnappa.Nagarahalli, Feifei Wang, pbhagavatula, stable,
	Phil Yang

Add release barriers before updating the processed packets for worker
lcores to ensure the worker lcore has really finished data processing
and then it can update the processed packets number.

Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions")
Cc: pbhagavatula@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
index 7bebac34f..0c0ec0ceb 100644
--- a/app/test-eventdev/test_pipeline_queue.c
+++ b/app/test-eventdev/test_pipeline_queue.c
@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
 
 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
 			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 		} else {
 			ev.queue_id++;
 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
 		pipeline_event_enqueue(dev, port, &ev);
-		w->processed_pkts++;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), 1,
+				__ATOMIC_RELEASE);
 	}
 
 	return 0;
@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
 			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
 				pipeline_event_tx(dev, port, &ev[i]);
 				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored operation
+				 * of the event completes before the number of
+				 * processed pkts is visible to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 			} else {
 				ev[i].queue_id++;
 				pipeline_fwd_event(&ev[i],
@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
 		}
 
 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
-		w->processed_pkts += nb_rx;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
+				__ATOMIC_RELEASE);
 	}
 
 	return 0;
@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
 
 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
 			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 			continue;
 		}
 
@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
 			ev.queue_id = tx_queue[ev.mbuf->port];
 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 		} else {
 			ev.queue_id++;
 			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
 			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
 				pipeline_event_tx(dev, port, &ev[i]);
 				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored operation
+				 * of the event completes before the number of
+				 * processed pkts is visible to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 				continue;
 			}
 
@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
 				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
 				pipeline_fwd_event(&ev[i],
 						RTE_SCHED_TYPE_ATOMIC);
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored operation
+				 * of the event completes before the number of
+				 * processed pkts is visible to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 			} else {
 				ev[i].queue_id++;
 				pipeline_fwd_event(&ev[i],
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
  2020-12-22  3:22 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
@ 2020-12-22  3:22 ` Feifei Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Feifei Wang @ 2020-12-22  3:22 UTC (permalink / raw)
  To: Jerin Jacob, Pavan Nikhilesh, Harry van Haaren
  Cc: dev, nd, Feifei Wang, pbhagavatula, stable, Phil Yang

Add release barriers before updating the processed packets for worker
lcores to ensure the worker lcore has really finished data processing
and then it can update the processed packets number.

Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions")
Cc: pbhagavatula@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
index 7bebac34f..8e499d8d5 100644
--- a/app/test-eventdev/test_pipeline_queue.c
+++ b/app/test-eventdev/test_pipeline_queue.c
@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
 
 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
 			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visiable to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 		} else {
 			ev.queue_id++;
 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
 		pipeline_event_enqueue(dev, port, &ev);
-		w->processed_pkts++;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visiable to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), 1,
+				__ATOMIC_RELEASE);
 	}
 
 	return 0;
@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
 			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
 				pipeline_event_tx(dev, port, &ev[i]);
 				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored operation
+				 * of the event completes before the number of
+				 * processed pkts is visiable to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 			} else {
 				ev[i].queue_id++;
 				pipeline_fwd_event(&ev[i],
@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
 		}
 
 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
-		w->processed_pkts += nb_rx;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visiable to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
+				__ATOMIC_RELEASE);
 	}
 
 	return 0;
@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
 
 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
 			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visiable to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 			continue;
 		}
 
@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
 			ev.queue_id = tx_queue[ev.mbuf->port];
 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visiable to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
 		} else {
 			ev.queue_id++;
 			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
 			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
 				pipeline_event_tx(dev, port, &ev[i]);
 				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored operation
+				 * of the event completes before the number of
+				 * processed pkts is visiable to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 				continue;
 			}
 
@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
 				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
 				pipeline_fwd_event(&ev[i],
 						RTE_SCHED_TYPE_ATOMIC);
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored operation
+				 * of the event completes before the number of
+				 * processed pkts is visiable to the main core
+				 */
+				__atomic_fetch_add(&(w->processed_pkts), 1,
+						__ATOMIC_RELEASE);
 			} else {
 				ev[i].queue_id++;
 				pipeline_fwd_event(&ev[i],
-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-01-14  6:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-22 10:33 [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Pavan Nikhilesh Bhagavatula
2021-01-05  7:39 ` Feifei Wang
2021-01-05  9:29   ` Pavan Nikhilesh Bhagavatula
2021-01-08  7:12     ` [dpdk-dev] 回复: " Feifei Wang
2021-01-08  9:12       ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-08 10:44         ` [dpdk-dev] 回复: " Feifei Wang
2021-01-08 10:58           ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-11  1:57             ` [dpdk-dev] 回复: " Feifei Wang
2021-01-12  8:29             ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-14  6:07               ` [dpdk-dev] 回复: " Feifei Wang
  -- strict thread matches above, loose matches on Subject: below --
2020-12-22  5:07 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
2020-12-22  5:07 ` [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Feifei Wang
2020-12-22  3:22 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
2020-12-22  3:22 ` [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Feifei Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).