DPDK patches and discussions
From: Feifei Wang <Feifei.Wang2@arm.com>
To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>,
	"jerinj@marvell.com" <jerinj@marvell.com>,
	Harry van Haaren <harry.van.haaren@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	Ruifeng Wang <Ruifeng.Wang@arm.com>, nd <nd@arm.com>,
	nd <nd@arm.com>
Subject: [dpdk-dev] Re: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
Date: Fri, 8 Jan 2021 07:12:54 +0000	[thread overview]
Message-ID: <DBBPR08MB441182EC4AE339F3DD69330DC8AE0@DBBPR08MB4411.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <CO6PR18MB3828B1F80478C715078B2070DED11@CO6PR18MB3828.namprd18.prod.outlook.com>

Hi, Pavan



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: January 5, 2021 17:29
> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry van Haaren <harry.van.haaren@intel.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test

>
> Hi Feifei,
>
> >Hi, Pavan
> >
> >Sorry for my late reply and thanks very much for your review.
> >
> >> -----Original Message-----
> >> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> >> Sent: December 22, 2020 18:33
> >> To: Feifei Wang <Feifei.Wang2@arm.com>; jerinj@marvell.com; Harry van
> >> Haaren <harry.van.haaren@intel.com>; Pavan Nikhilesh
> >> <pbhagavatula@caviumnetworks.com>
> >> Cc: dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> >> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; Phil Yang
> >> <Phil.Yang@arm.com>
> >> Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for
> >> pipeline test
> >>
> >>
> >> >Add release barriers before updating the processed packets for worker
> >> >lcores to ensure the worker lcore has really finished data
> >> >processing and then it can update the processed packets number.
> >> >
> >>
> >> I believe we can live with minor inaccuracies in stats being presented as
> >> atomics are pretty heavy when the scheduler is limited to a burst size of 1.
> >>
> >> One option is to move it before a pipeline operation (pipeline_event_tx,
> >> pipeline_fwd_event etc.) as they imply an implicit release barrier (as all the
> >> changes done to the event should be visible to the next core).
> >
> >If I understand correctly, your suggestion is to move the release barriers
> >before pipeline_event_tx or pipeline_fwd_event. This can ensure the
> >event has been processed before the next core begins to tx/fwd. For
> >example:
>
> What I meant was that event APIs such as `rte_event_enqueue_burst` and
> `rte_event_eth_tx_adapter_enqueue` act as an implicit release barrier, and
> the API `rte_event_dequeue_burst` acts as an implicit acquire barrier.
>
> Since the pipeline_* test starts with a dequeue() and ends with an enqueue(),
> I don't believe we need barriers in between.



Sorry for my misunderstanding. I agree with you that no barriers are needed between dequeue and enqueue.



Now, let's go back to the beginning. With this patch, the barrier is mainly about the synchronization variable "w->processed_pkts". The event is first dequeued and then enqueued; only after that can the event be treated as processed and counted in the statistics ("w->processed_pkts++").

Thus, we add a release barrier before "w->processed_pkts++" to prevent this operation from being executed ahead of time, for example:

dequeue  ->  w->processed_pkts++  ->  enqueue

In that ordering the worker has not actually finished processing the event, yet the event is already treated as processed and counted in the statistics.
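
For reference, here is a condensed sketch of pipeline_queue_worker_single_stage_tx() with this patch applied, showing where the release-ordered increment sits (the variables come from the existing PIPELINE_WORKER_SINGLE_STAGE_INIT setup and the pipeline_* helpers in app/test-eventdev; other details are trimmed):

	while (t->done == false) {
		uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0);

		if (!event) {
			rte_pause();
			continue;
		}

		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
			pipeline_event_tx(dev, port, &ev);
			/* RELEASE keeps this increment from being hoisted above
			 * the tx, so an event is only counted after it has
			 * really been handed off.
			 */
			__atomic_fetch_add(&w->processed_pkts, 1, __ATOMIC_RELEASE);
		} else {
			ev.queue_id++;
			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
			pipeline_event_enqueue(dev, port, &ev);
		}
	}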



______________________________________________________________________________



By the way, I have two other questions about the pipeline processing test in "test_pipeline_queue".

1. When do we start counting processed events (w->processed_pkts)?

For the fwd mode (internal_port = false), when we choose a single stage, the application increments the number of processed events after "pipeline_event_enqueue". However, when we choose multiple stages, the application increments it before "pipeline_event_enqueue". Maybe we can unify this, for example for the multiple-stage case:



			if (cq_id == last_queue) {
				ev.queue_id = tx_queue[ev.mbuf->port];
				rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
				pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
+				pipeline_event_enqueue(dev, port, &ev);
				w->processed_pkts++;
			} else {
				ev.queue_id++;
				pipeline_fwd_event(&ev, sched_type_list[cq_id]);
+				pipeline_event_enqueue(dev, port, &ev);
			}

-			pipeline_event_enqueue(dev, port, &ev);



2. Is "pipeline_event_enqueue" needed after "pipeline_event_tx" in tx mode?

For the single_stage_burst_tx mode, after "pipeline_event_tx" the worker still enqueues the event again through "pipeline_event_enqueue_burst", so maybe we should skip the rest of the loop iteration after "pipeline_event_tx", for example:



				if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
					pipeline_event_tx(dev, port, &ev[i]);
					ev[i].op = RTE_EVENT_OP_RELEASE;
					w->processed_pkts++;
+					continue;
				} else {
					ev[i].queue_id++;
					pipeline_fwd_event(&ev[i],
							RTE_SCHED_TYPE_ATOMIC);
				}
			}

			pipeline_event_enqueue_burst(dev, port, ev, nb_rx);





>
> >
> >	if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >+		__atomic_thread_fence(__ATOMIC_RELEASE);
> >		pipeline_event_tx(dev, port, &ev);
> >		w->processed_pkts++;
> >	} else {
> >		ev.queue_id++;
> >+		__atomic_thread_fence(__ATOMIC_RELEASE);
> >		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >		pipeline_event_enqueue(dev, port, &ev);
> >
> >However, there are two reasons to prevent this:
> >
> >First, compared with other tests in app/eventdev, for example the
> >eventdev perf test, the wmb is placed after the event operation to ensure
> >the operation has finished before w->processed_pkts++.
>
> In the case of perf_* tests, starting with a dequeue() and finally ending with
> a mempool_put() should also act as an implicit acquire/release pair, making
> the stats consistent?



For the perf tests, this consistency means that there is a wmb after mempool_put(). Please refer to this link:

http://patches.dpdk.org/patch/85634/
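
Roughly speaking, the pattern that sentence describes is the following (this is only an illustration, not the exact code from the linked patch):

	rte_mempool_put(pool, ev.event_ptr);      /* event fully consumed      */
	__atomic_thread_fence(__ATOMIC_RELEASE);  /* the "wmb": ordering only  */
	w->processed_pkts++;                      /* only now counted as done  */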



>
> >So, if we move the release barriers before tx/fwd, it may cause the
> >tests of app/eventdev to become inconsistent. This may reduce the
> >maintainability of the code and make it difficult to understand.
> >
> >Second, it is a test case; though a heavy barrier may cause performance
> >degradation, it ensures that the operation order and the test
> >result are correct. And maybe for a test case, correctness is more
> >important than performance.
> >
>
> Most of our internal perf tests run on 24/48 core combinations and, since the
> Octeontx2 event device driver supports a burst size of 1, it will show up as a
> huge performance degradation.



For the impact on performance, I ran the test with the software event device; here are some test results:
------------------------------------------------------------------------------------------------------------------------------------
Architecture: aarch64
NIC: ixgbe 82599
CPU: Cortex-A72
BURST_SIZE: 1
Command: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a
Flow: one flow, 64-byte packets, TX rate: 1.4 Mpps

Without this patch:
0.954 mpps, avg 0.953 mpps

With this patch:
0.932 mpps, avg 0.930 mpps
------------------------------------------------------------------------------------------------------------------------------------

Based on the results above, there is no significant performance degradation with this patch. This is because the release barrier only orders "w->processed_pkts++": it ensures that the worker core increments the number of processed events after the enqueue, and it does not affect the dequeue/enqueue themselves:

dequeue -> enqueue -> release barrier -> w->processed_pkts++
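
Concretely, in the single-stage fwd worker this ordering looks like the following after the patch (trimmed from the hunk quoted at the end of this mail):

	ev.queue_id = tx_queue[ev.mbuf->port];
	rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
	pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
	pipeline_event_enqueue(dev, port, &ev);      /* hand the event off first  */
	__atomic_fetch_add(&w->processed_pkts, 1,    /* ...then count it, with    */
			__ATOMIC_RELEASE);           /* release ordering          */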



On the other hand, I suspect the reason for the slight decrease in the measured number is that the release barrier prevents "w->processed_pkts++" from happening before the event has actually been processed (enqueued). But I think this result is closer to the real performance.

And sorry, we have no Octeontx2 device, so there is no test result for the Octeontx2 event device driver. Could you please help us test this patch on Octeontx2 when it is convenient for you? Thanks very much.



Best Regards

Feifei



>
> >So, due to the two reasons above, I'm ambivalent about how we should do it in
> >the next step.
> >
> >Best Regards
> >Feifei
>
> Regards,
> Pavan.
>
> >
> >> >Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions")
> >> >Cc: pbhagavatula@marvell.com
> >> >Cc: stable@dpdk.org
> >> >
> >> >Signed-off-by: Phil Yang <phil.yang@arm.com>
> >> >Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >> >Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> >---
> >> > app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++----
> >> > 1 file changed, 56 insertions(+), 8 deletions(-)
> >> >
> >> >diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
> >> >index 7bebac34f..0c0ec0ceb 100644
> >> >--- a/app/test-eventdev/test_pipeline_queue.c
> >> >+++ b/app/test-eventdev/test_pipeline_queue.c
> >> >@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void *arg)
> >> >
> >> >		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >			pipeline_event_tx(dev, port, &ev);
> >> >-			w->processed_pkts++;
> >> >+
> >> >+			/* release barrier here ensures stored operation
> >> >+			 * of the event completes before the number of
> >> >+			 * processed pkts is visible to the main core
> >> >+			 */
> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+					__ATOMIC_RELEASE);
> >> >		} else {
> >> >			ev.queue_id++;
> >> >			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >@@ -59,7 +65,13 @@ pipeline_queue_worker_single_stage_fwd(void *arg)
> >> >		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> >> >		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >		pipeline_event_enqueue(dev, port, &ev);
> >> >-		w->processed_pkts++;
> >> >+
> >> >+		/* release barrier here ensures stored operation
> >> >+		 * of the event completes before the number of
> >> >+		 * processed pkts is visible to the main core
> >> >+		 */
> >> >+		__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+				__ATOMIC_RELEASE);
> >> >	}
> >> >
> >> >	return 0;
> >> >@@ -84,7 +96,13 @@ pipeline_queue_worker_single_stage_burst_tx(void *arg)
> >> >				if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
> >> >					pipeline_event_tx(dev, port, &ev[i]);
> >> >					ev[i].op = RTE_EVENT_OP_RELEASE;
> >> >-					w->processed_pkts++;
> >> >+
> >> >+					/* release barrier here ensures stored operation
> >> >+					 * of the event completes before the number of
> >> >+					 * processed pkts is visible to the main core
> >> >+					 */
> >> >+					__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+							__ATOMIC_RELEASE);
> >> >				} else {
> >> >					ev[i].queue_id++;
> >> >					pipeline_fwd_event(&ev[i],
> >> >@@ -121,7 +139,13 @@ pipeline_queue_worker_single_stage_burst_fwd(void *arg)
> >> >		}
> >> >
> >> >		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
> >> >-		w->processed_pkts += nb_rx;
> >> >+
> >> >+		/* release barrier here ensures stored operation
> >> >+		 * of the event completes before the number of
> >> >+		 * processed pkts is visible to the main core
> >> >+		 */
> >> >+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
> >> >+				__ATOMIC_RELEASE);
> >> >	}
> >> >
> >> >	return 0;
> >> >@@ -146,7 +170,13 @@ pipeline_queue_worker_multi_stage_tx(void *arg)
> >> >
> >> >		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
> >> >			pipeline_event_tx(dev, port, &ev);
> >> >-			w->processed_pkts++;
> >> >+
> >> >+			/* release barrier here ensures stored operation
> >> >+			 * of the event completes before the number of
> >> >+			 * processed pkts is visible to the main core
> >> >+			 */
> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+					__ATOMIC_RELEASE);
> >> >			continue;
> >> >		}
> >> >
> >> >@@ -180,7 +210,13 @@ pipeline_queue_worker_multi_stage_fwd(void *arg)
> >> >			ev.queue_id = tx_queue[ev.mbuf->port];
> >> >			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
> >> >			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
> >> >-			w->processed_pkts++;
> >> >+
> >> >+			/* release barrier here ensures stored operation
> >> >+			 * of the event completes before the number of
> >> >+			 * processed pkts is visible to the main core
> >> >+			 */
> >> >+			__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+					__ATOMIC_RELEASE);
> >> >		} else {
> >> >			ev.queue_id++;
> >> >			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
> >> >@@ -214,7 +250,13 @@ pipeline_queue_worker_multi_stage_burst_tx(void *arg)
> >> >				if (ev[i].queue_id == tx_queue[ev[i].mbuf->port]) {
> >> >					pipeline_event_tx(dev, port, &ev[i]);
> >> >					ev[i].op = RTE_EVENT_OP_RELEASE;
> >> >-					w->processed_pkts++;
> >> >+
> >> >+					/* release barrier here ensures stored operation
> >> >+					 * of the event completes before the number of
> >> >+					 * processed pkts is visible to the main core
> >> >+					 */
> >> >+					__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+							__ATOMIC_RELEASE);
> >> >					continue;
> >> >				}
> >> >
> >> >@@ -254,7 +296,13 @@ pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
> >> >					rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
> >> >					pipeline_fwd_event(&ev[i],
> >> >							RTE_SCHED_TYPE_ATOMIC);
> >> >-					w->processed_pkts++;
> >> >+
> >> >+					/* release barrier here ensures stored operation
> >> >+					 * of the event completes before the number of
> >> >+					 * processed pkts is visible to the main core
> >> >+					 */
> >> >+					__atomic_fetch_add(&(w->processed_pkts), 1,
> >> >+							__ATOMIC_RELEASE);
> >> >				} else {
> >> >					ev[i].queue_id++;
> >> >					pipeline_fwd_event(&ev[i],
> >> >--
> >> >2.17.1



Thread overview: 12+ messages
2020-12-22 10:33 [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-05  7:39 ` Feifei Wang
2021-01-05  9:29   ` Pavan Nikhilesh Bhagavatula
2021-01-08  7:12     ` Feifei Wang [this message]
2021-01-08  9:12       ` Pavan Nikhilesh Bhagavatula
2021-01-08 10:44         ` [dpdk-dev] Re: " Feifei Wang
2021-01-08 10:58           ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-11  1:57             ` [dpdk-dev] Re: " Feifei Wang
2021-01-12  8:29             ` [dpdk-dev] " Pavan Nikhilesh Bhagavatula
2021-01-14  6:07               ` [dpdk-dev] Re: " Feifei Wang
  -- strict thread matches above, loose matches on Subject: below --
2020-12-22  5:07 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
2020-12-22  5:07 ` [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Feifei Wang
2020-12-22  3:22 [dpdk-dev] [RFC PATCH v1 0/6] refactor smp barriers in app/eventdev Feifei Wang
2020-12-22  3:22 ` [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test Feifei Wang
