DPDK patches and discussions
From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
To: Feifei Wang <Feifei.Wang2@arm.com>,
	Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	Harry van Haaren <harry.van.haaren@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	"Ruifeng Wang" <Ruifeng.Wang@arm.com>, nd <nd@arm.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
Date: Tue, 12 Jan 2021 08:29:15 +0000
Message-ID: <CO6PR18MB38283B47AB7A49AEEC45F1E9DEAA9@CO6PR18MB3828.namprd18.prod.outlook.com> (raw)
In-Reply-To: <CO6PR18MB382875FB67CD8C5B6C8D29A7DEAE9@CO6PR18MB3828.namprd18.prod.outlook.com>

Hi Feifei,

>Hi Feifei,
>
>>Hi, Pavan
>>
>>> -----Original Message-----
>>> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>>> Sent: January 8, 2021 17:13
>>> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com;
>>> Harry van Haaren <harry.van.haaren@intel.com>
>>> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>>> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
>>> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
>>> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
>>> pipeline test
>>>
>>> Hi Feifei,
>>>
>>> >Hi, Pavan
>>> >
>>> >
>>> >
>>> >> -----Original Message-----
>>> >> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>>> >> Sent: January 5, 2021 17:29
>>> >> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry
>>> >> van Haaren <harry.van.haaren@intel.com>
>>> >> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>>> >> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang
>>> >> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
>>> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>> >> for pipeline test
>>> >
>>> >>
>>> >> Hi Feifei,
>>> >>
>>> >> >Hi, Pavan
>>> >> >
>>> >> >Sorry for my late reply, and thanks very much for your review.
>>> >> >
>>> >> >> -----Original Message-----
>>> >> >> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>>> >> >> Sent: December 22, 2020 18:33
>>> >> >> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com;
>>> >> >> Harry van Haaren <harry.van.haaren@intel.com>; Pavan Nikhilesh
>>> >> >> <pbhagavatula@caviumnetworks.com>
>>> >> >> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
>>> >> >> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Phil Yang
>>> >> >> <Phil.Yang@arm.com>
>>> >> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
>>> >> >> for pipeline test
>>> >> >>
>>> >> >> >Add release barriers before updating the processed packets for
>>> >> >> >worker lcores to ensure the worker lcore has really finished
>>> >> >> >data processing and only then updates the processed packets
>>> >> >> >number.
>>> >> >> >
>>> >> >>
>>> >> >> I believe we can live with minor inaccuracies in the stats being
>>> >> >> presented, as atomics are pretty heavy when the scheduler is
>>> >> >> limited to a burst size of 1.
>>> >> >>
>>> >> >> One option is to move it before a pipeline operation
>>> >> >> (pipeline_event_tx, pipeline_fwd_event etc.) as they imply an
>>> >> >> implicit release barrier (all the changes done to the event
>>> >> >> should be visible to the next core).
>>> >
>>> >> >
>>> >
>>> >> >If I understand correctly, you mean to move the release barriers
>>> >> >before pipeline_event_tx or pipeline_fwd_event. This can ensure
>>> >> >the event has been processed before the next core begins to
>>> >> >tx/fwd. For example:
>>> >
>>> >>
>>> >
>>> >> What I meant was that event APIs such as `rte_event_enqueue_burst`
>>> >> and `rte_event_eth_tx_adapter_enqueue` act as an implicit release
>>> >> barrier, and the API `rte_event_dequeue_burst` acts as an implicit
>>> >> acquire barrier.
>>> >>
>>> >> Since the pipeline_* tests start with a dequeue() and end with an
>>> >> enqueue(), I don't believe we need barriers in between.
>>> >
>>> >
>>> >
>>> >Sorry for misunderstanding this, and I agree with you that no
>>> >barriers are needed between dequeue and enqueue.
>>> >
>>> >Now, let's go back to the beginning. With this patch, our barrier is
>>> >mainly for the synchronization variable "w->processed_pkts". As we
>>> >all know, the event is first dequeued and then enqueued; after this,
>>> >the event can be treated as processed and included in the statistics
>>> >("w->processed_pkts++").
>>> >
>>> >Thus, we add a release barrier before "w->processed_pkts++" to
>>> >prevent this operation from being executed ahead of time, e.g.:
>>> >
>>> >dequeue  ->  w->processed_pkts++  ->  enqueue
>>> >
>>> >This would mean the worker hasn't actually finished processing the
>>> >event, but the event is already treated as processed and included in
>>> >the statistics.
>>> >
>>>
>>> But the current sequence is dequeue -> enqueue -> w->processed_pkts++,
>>> and enqueue already acts as an implicit release barrier, right?
>>>
>>
>>Sorry, maybe I don't understand how "enqueue" acts as a release
>>barrier. I can think of two possibilities:
>>1. As you say above, all the changes done to the event should be
>>visible to the next core, and enqueue is an operation on the event, so
>>the next core should wait for the event to be enqueued.
>>I think this is due to a data dependence on the same variable. However,
>>'w->processed_pkts' and 'ev' are different variables, so this cannot
>>prevent 'w->processed_pkts++' from happening before the enqueue.
>>And the main core may load an updated 'w->processed_pkts' while the
>>event is actually still being processed. For example:
>>
>> Time Slot      Worker 1                 Main core
>>   1            dequeue
>>   2            w->processed_pkts++
>>   3                                     load w->processed_pkts
>>   4            enqueue
>>
>>2. Some release barriers are included in enqueue. There is a release
>>barrier in rte_ring_enqueue:
>>move head -> copy elements to the ring -> release barrier -> update
>>tail -> w->processed_pkts++
>>However, this barrier cannot prevent 'w->processed_pkts++' from being
>>reordered before the tail update, and once the tail update has
>>finished, the enqueue can be seen as completed.
>
>I was talking about case 2 in particular; almost all enqueue calls have
>some kind of release barrier in place. I do agree w->processed_pkts++
>might get reordered with the tail update, but since enqueue itself is a
>ldr + blr I was hoping that it wouldn't occur.
>
>We can continue the discussion once I have some performance data.
>
>Thanks for your patience :)
>Pavan.
>
>>
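To make the case-2 reordering concrete, below is a minimal, hypothetical
sketch of the ordering in question; the ring layout and names are
illustrative only, not the actual rte_ring code:

#include <stdint.h>

struct worker { uint64_t processed_pkts; };

static uintptr_t slots[1024];	/* ring storage */
static uint32_t tail;		/* ring tail, read by consumers */

static inline void
enqueue_one(struct worker *w, uint32_t head, uintptr_t obj)
{
	slots[head & 1023] = obj;	/* copy element into the ring */

	/* Release store: orders the element copy above it, so a consumer
	 * that observes the new tail also observes the copied element.
	 */
	__atomic_store_n(&tail, head + 1, __ATOMIC_RELEASE);

	/* A release store does not order *later* accesses: this plain
	 * increment is allowed to become visible before the tail update,
	 * which is exactly the reordering discussed above.
	 */
	w->processed_pkts++;
}

In other words, the barrier inside enqueue protects the event data, not
the stats counter; whether the counter store actually moves across the
function-call boundary in practice is a separate question.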
>>> >
>>> >____________________________________________________________________
>>> >
>>> >By the way, I have two other questions about the pipeline process
>>> >test in "test_pipeline_queue".
>>> >
>>> >1. When do we start counting processed events (w->processed_pkts)?
>>> >
>>> >For the fwd mode (internal_port = false), when we choose a single
>>> >stage, the application increments the number of events processed
>>> >after "pipeline_event_enqueue". However, when we choose multiple
>>> >stages, the application increments the number of events processed
>>> >before "pipeline_event_enqueue".
>>>
>>> We count an event as processed when all the stages have completed
>>> and it is transmitted.
>>>
>>> >So, maybe we can unify this. For example, for multiple stages:
>>> >
>>> >
>>> >	if (cq_id == last_queue) {
>>> >		ev.queue_id = tx_queue[ev.mbuf->port];
>>> >		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>> >		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>> >+		pipeline_event_enqueue(dev, port, &ev);
>>> >		w->processed_pkts++;
>>> >	} else {
>>> >		ev.queue_id++;
>>> >		pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>>> >+		pipeline_event_enqueue(dev, port, &ev);
>>> >	}
>>> >
>>> >-	pipeline_event_enqueue(dev, port, &ev);
>>> >
>>>
>>> The above change makes sense.
>>>
>>Thanks for your review; I'll fold this change into the next version.
>>
>>> >2. Is "pipeline_event_enqueue" needed after "pipeline_event_tx" in
>>> >tx mode?
>>> >
>>> >For the single_stage_burst_tx mode, after "pipeline_event_tx" the
>>> >worker has to enqueue again due to "pipeline_event_enqueue_burst",
>>> >so maybe we should jump out of the loop after "pipeline_event_tx",
>>>
>>> We call enqueue burst to release the events, i.e. enqueue events with
>>> RTE_EVENT_OP_RELEASE.
>>>
>>However, in the case of a single event, for
>>'pipeline_queue_worker_single_stage_tx' and
>>'pipeline_queue_worker_multi_stage_tx', there is no release operation
>>after tx.
>>

I think this was done before the feature disable_implicit_release
(`RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE`) was implemented.
We can remove the event_op_release section of the code and have a continue.
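Roughly along these lines; an untested, illustrative sketch (it also
swaps the trailing burst enqueue for a per-event enqueue in the forward
branch, which is an assumption on my part):

	for (i = 0; i < nb_rx; i++) {
		if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
			/* The event is already transmitted; with implicit
			 * release in use there is nothing left to release,
			 * so skip the enqueue entirely.
			 */
			pipeline_event_tx(dev, port, &ev[i]);
			w->processed_pkts++;
			continue;
		}
		ev[i].queue_id++;
		pipeline_fwd_event(&ev[i], RTE_SCHED_TYPE_ATOMIC);
		pipeline_event_enqueue(dev, port, &ev[i]);
	}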

>>> >
>>> > for example:
>>> >
>>> >	if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>> >		pipeline_event_tx(dev, port, &ev[i]);
>>> >		ev[i].op = RTE_EVENT_OP_RELEASE;
>>> >		w->processed_pkts++;
>>> >+		continue;
>>> >	} else {
>>> >		ev[i].queue_id++;
>>> >		pipeline_fwd_event(&ev[i],
>>> >				RTE_SCHED_TYPE_ATOMIC);
>>> >	}
>>> >}
>>> >
>>> >pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> >
>>> >
>>> >> >if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>> >> >+	__atomic_thread_fence(__ATOMIC_RELEASE);
>>> >> >	pipeline_event_tx(dev, port, &ev);
>>> >> >	w->processed_pkts++;
>>> >> >} else {
>>> >> >	ev.queue_id++;
>>> >> >+	__atomic_thread_fence(__ATOMIC_RELEASE);
>>> >> >	pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>> >> >	pipeline_event_enqueue(dev, port, &ev);
>>> >> >
>>> >> >However, there are two reasons not to do this:
>>> >> >
>>> >> >First, compared with the other tests in app/eventdev, for example
>>> >> >the eventdev perf test, the wmb is placed after the event
>>> >> >operation to ensure the operation has finished before
>>> >> >w->processed_pkts++.
>>> >>
>>> >> In the case of perf_* tests, starting with a dequeue() and finally
>>> >> ending with a mempool_put() should also act as an implicit
>>> >> acquire/release pair, making the stats consistent?
>>> >
>>> >For the perf tests, this consistency refers to there being a wmb
>>> >after mempool_put(). Please refer to this link:
>>> >
>>> >http://patches.dpdk.org/patch/85634/
>>> >
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> >So, if we move the release barriers before tx/fwd, it may cause
>>> >> >the tests in app/eventdev to become inconsistent. This may reduce
>>> >> >the maintainability of the code and make it difficult to
>>> >> >understand.
>>> >> >
>>> >> >Second, it is a test case: though a heavy thread fence may cause
>>> >> >some performance degradation, it ensures that the operation order
>>> >> >and the test result are correct. And maybe for a test case,
>>> >> >correctness is more important than performance.
>>> >>
>>> >> Most of our internal perf tests run on 24/48 core combinations,
>>> >> and since the Octeontx2 event device driver supports a burst size
>>> >> of 1, it will show up as a huge performance degradation.
>>> >
>>> >
>>> >
>>> >For the impact on performance, I ran the test using the software
>>> >driver; following are some test results:
>>> >
>>> >--------------------------------------------------------------------
>>> >Architecture: aarch64
>>> >NICs: ixgbe-82599
>>> >CPU: Cortex-A72
>>> >BURST_SIZE: 1
>>> >Command: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
>>> >Flow: one flow, 64-byte packets, TX rate: 1.4 Mpps
>>> >
>>> >Without this patch: 0.954 mpps, avg 0.953 mpps
>>> >With this patch:    0.932 mpps, avg 0.930 mpps
>>> >--------------------------------------------------------------------
>>> >
>>> >Based on the result above, there is no significant performance
>>> >degradation with this patch.
>>> >
>>> >This is because the release barrier is only for "w->processed_pkts++".
>>> >It just ensures that the worker core increments the number of events
>>> >processed after the enqueue, and it doesn't affect dequeue/enqueue:
>>> >
>>> >dequeue -> enqueue -> release barrier -> w->processed_pkts++
>>> >
>>>
>>> Here enqueue already acts as an implicit release barrier.
>>>
>>Please refer to the reasons above.
>>
>>> >
>>> >On the other hand, I infer that the reason for the slight decrease
>>> >in measured performance is that the release barrier prevents
>>> >"w->processed_pkts++" from happening before the event has been
>>> >processed (enqueued). But I think this test result is closer to the
>>> >real performance.
>>> >
>>> >And sorry, we have no Octeontx2 device, so there is no test result
>>> >on the Octeontx2 event device driver. Would you please help us test
>>> >this patch on Octeontx2 at your convenience? Thanks very much.
>>>
>>> I will report the performance numbers on Monday.
>>>
>>That's great, thanks very much for your help.

We are seeing a ~2% performance loss with the series applied on pipeline tests.
I have taken the performance numbers on IXIA to eliminate any discrepancies in stats.

Do you see any improvement in the stats reported with and without the
barrier when comparing the traffic generator against the eventdev
application? I don't see any difference when verifying with IXIA.

Can we skip this patch?
Also, note that the barrier wouldn't help in cases where:
1. The SW eventdev Tx routine runs on a service core, which has its own
latency in processing the packets.
2. The HW eventdev Tx routine has shaping and scheduling.

Regards, 
Pavan.

>>
>>Best Regards
>>Feifei
>>
>>> >
>>> >Best Regards
>>> >Feifei
>>>
>>> Regards,
>>> Pavan.
>>>
>>> >
>>> >
>>> >
>>> >>
>>> >
>>> >> >So, due to the two reasons above, I'm ambivalent about what we
>>> >> >should do in the next step.
>>> >> >
>>> >> >Best Regards
>>> >> >Feifei
>>> >>
>>> >> Regards,
>>> >> Pavan.
>>> >
>>> >> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
>>> >> >> >functions")
>>> >> >> >Cc: pbhagavatula@marvell.com
>>> >> >> >Cc: stable@dpdk.org
>>> >> >> >
>>> >> >> >Signed-off-by: Phil Yang <phil.yang@arm.com>
>>> >> >> >Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>> >> >> >Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> >> >> >---
>>> >> >> > app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
>>> >> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
>>> >> >> >
>>> >> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
>>> >> >> >index 7bebac34f..0c0ec0ceb 100644
>>> >> >> >--- a/app/test-eventdev/test_pipeline_queue.c
>>> >> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
>>> >> >> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
>>> >> >> >
>>> >> >> > 		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>> >> >> > 			pipeline_event_tx(dev, port, &ev);
>>> >> >> >-			w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+			/* release barrier here ensures stored operation
>>> >> >> >+			 * of the event completes before the number of
>>> >> >> >+			 * processed pkts is visible to the main core
>>> >> >> >+			 */
>>> >> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+					__ATOMIC_RELEASE);
>>> >> >> > 		} else {
>>> >> >> > 			ev.queue_id++;
>>> >> >> > 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>> >> >> >@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
>>> >> >> > 		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>> >> >> > 		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>> >> >> > 		pipeline_event_enqueue(dev, port, &ev);
>>> >> >> >-		w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+		/* release barrier here ensures stored operation
>>> >> >> >+		 * of the event completes before the number of
>>> >> >> >+		 * processed pkts is visible to the main core
>>> >> >> >+		 */
>>> >> >> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+				__ATOMIC_RELEASE);
>>> >> >> > 	}
>>> >> >> >
>>> >> >> > 	return 0;
>>> >> >> >@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
>>> >> >> > 			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
>>> >> >> > 				pipeline_event_tx(dev, port, &ev[i]);
>>> >> >> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>>> >> >> >-				w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+				/* release barrier here ensures stored operation
>>> >> >> >+				 * of the event completes before the number of
>>> >> >> >+				 * processed pkts is visible to the main core
>>> >> >> >+				 */
>>> >> >> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+						__ATOMIC_RELEASE);
>>> >> >> > 			} else {
>>> >> >> > 				ev[i].queue_id++;
>>> >> >> > 				pipeline_fwd_event(&ev[i],
>>> >> >> >@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
>>> >> >> > 		}
>>> >> >> >
>>> >> >> > 		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
>>> >> >> >-		w->processed_pkts += nb_rx;
>>> >> >> >+
>>> >> >> >+		/* release barrier here ensures stored operation
>>> >> >> >+		 * of the event completes before the number of
>>> >> >> >+		 * processed pkts is visible to the main core
>>> >> >> >+		 */
>>> >> >> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
>>> >> >> >+				__ATOMIC_RELEASE);
>>> >> >> > 	}
>>> >> >> >
>>> >> >> > 	return 0;
>>> >> >> >@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
>>> >> >> >
>>> >> >> > 		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
>>> >> >> > 			pipeline_event_tx(dev, port, &ev);
>>> >> >> >-			w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+			/* release barrier here ensures stored operation
>>> >> >> >+			 * of the event completes before the number of
>>> >> >> >+			 * processed pkts is visible to the main core
>>> >> >> >+			 */
>>> >> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+					__ATOMIC_RELEASE);
>>> >> >> > 			continue;
>>> >> >> > 		}
>>> >> >> >
>>> >> >> >@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
>>> >> >> > 			ev.queue_id = tx_queue[ev.mbuf->port];
>>> >> >> > 			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
>>> >> >> > 			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
>>> >> >> >-			w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+			/* release barrier here ensures stored operation
>>> >> >> >+			 * of the event completes before the number of
>>> >> >> >+			 * processed pkts is visible to the main core
>>> >> >> >+			 */
>>> >> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+					__ATOMIC_RELEASE);
>>> >> >> > 		} else {
>>> >> >> > 			ev.queue_id++;
>>> >> >> > 			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
>>> >> >> >@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
>>> >> >> > 			if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
>>> >> >> > 				pipeline_event_tx(dev, port, &ev[i]);
>>> >> >> > 				ev[i].op = RTE_EVENT_OP_RELEASE;
>>> >> >> >-				w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+				/* release barrier here ensures stored operation
>>> >> >> >+				 * of the event completes before the number of
>>> >> >> >+				 * processed pkts is visible to the main core
>>> >> >> >+				 */
>>> >> >> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+						__ATOMIC_RELEASE);
>>> >> >> > 				continue;
>>> >> >> > 			}
>>> >> >> >
>>> >> >> >@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
>>> >> >> >
>>> >> >> > 				rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
>>> >> >> > 				pipeline_fwd_event(&ev[i],
>>> >> >> > 						RTE_SCHED_TYPE_ATOMIC);
>>> >> >> >-				w->processed_pkts++;
>>> >> >> >+
>>> >> >> >+				/* release barrier here ensures stored operation
>>> >> >> >+				 * of the event completes before the number of
>>> >> >> >+				 * processed pkts is visible to the main core
>>> >> >> >+				 */
>>> >> >> >+				__atomic_fetch_add(&(w->processed_pkts), 1,
>>> >> >> >+						__ATOMIC_RELEASE);
>>> >> >> > 			} else {
>>> >> >> > 				ev[i].queue_id++;
>>> >> >> > 				pipeline_fwd_event(&ev[i],
>>> >> >> >--
>>> >> >> >2.17.1


Thread overview: 12+ messages
2020-12-22 10:33 Pavan Nikhilesh Bhagavatula
2021-01-05  7:39 ` Feifei Wang
2021-01-05  9:29   ` Pavan Nikhilesh Bhagavatula
2021-01-08  7:12     ` [dpdk-dev] Re: " Feifei Wang
2021-01-08  9:12       ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-08 10:44         ` [dpdk-dev] Re: " Feifei Wang
2021-01-08 10:58           ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-11  1:57             ` [dpdk-dev] Re: " Feifei Wang
2021-01-12  8:29             ` Pavan Nikhilesh Bhagavatula [this message]
2021-01-14  6:07               ` Feifei Wang
  -- strict thread matches above, loose matches on Subject: below --
2020-12-22  5:07 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
2020-12-22  5:07 ` [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Feifei Wang
2020-12-22  3:22 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
2020-12-22  3:22 ` [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Feifei Wang
