DPDK patches and discussions
* [dpdk-dev] [PATCH v1 0/1] distributor test fix
       [not found] <CGME20210119035920eucas1p15da5a04b73d19e63287dc14905d765f2@eucas1p1.samsung.com>
@ 2021-01-19  3:59 ` Lukasz Wojciechowski
       [not found]   ` <CGME20210119035921eucas1p1aaea0d68975ba9481f200912eb10a40e@eucas1p1.samsung.com>
  2021-01-19  8:44   ` [dpdk-dev] [PATCH v1 0/1] distributor test fix David Marchand
  0 siblings, 2 replies; 9+ messages in thread
From: Lukasz Wojciechowski @ 2021-01-19  3:59 UTC (permalink / raw)
  Cc: dev, l.wojciechow

According to the discussion in this thread:
https://inbox.dpdk.org/dev/CAOE1vsOehF4ZMOWffpEv=QF6YOc5wXtg23PV83B9CLiTMn8wQA@mail.gmail.com/#r

I was able to reproduce the distributor test failure in exactly the same
way as described, but on an x86_64 machine with 32 cores.
So the problem does not seem to be related to the ARM architecture.
IMO the issue occurs when many worker threads return packets at the
same time.

I was not able to observe the issue on ARM devices, but I used only
machines with 4 cores. That allows at most 3 worker cores, so at most
32*3 = 96 packets are processed at the same time. That is less than 127,
so the issue cannot occur there.
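
The arithmetic above can be written as a small C check. This is only a
sketch of the simplified model used in this mail (each worker holds at
most one 32-packet burst); max_in_flight() and can_overflow() are
hypothetical helper names, not part of DPDK or of the test.

```c
#include <assert.h>
#include <stdbool.h>

/* Values quoted in this thread: 32-packet bursts, 127-packet return queue. */
#define BURST 32
#define RETURN_QUEUE_CAPACITY 127 /* RTE_DISTRIB_RETURNS_MASK */

/* Upper bound on packets in flight, assuming one burst per worker. */
static unsigned int max_in_flight(unsigned int num_workers)
{
	assert(num_workers > 0);
	return num_workers * BURST;
}

/* True when the workers could return more packets than the queue holds. */
static bool can_overflow(unsigned int num_workers)
{
	return max_in_flight(num_workers) > RETURN_QUEUE_CAPACITY;
}
```

Under this model 3 workers give 96 packets and cannot overflow the
queue, while 4 workers already give 128, which exceeds 127.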

Can anyone verify this patch on a machine similar to the one used in the
CI lab, on which the issue occurred?

Lukasz Wojciechowski (1):
  test/distributor: prevent return buffer overload

 app/test/test_distributor.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

-- 
2.17.1



* [dpdk-dev] [PATCH v1 1/1] test/distributor: prevent return buffer overload
       [not found]   ` <CGME20210119035921eucas1p1aaea0d68975ba9481f200912eb10a40e@eucas1p1.samsung.com>
@ 2021-01-19  3:59     ` Lukasz Wojciechowski
  2021-01-28 14:10       ` [dpdk-dev] [dpdk-stable] " David Marchand
  2021-01-28 16:46       ` [dpdk-dev] " David Hunt
  0 siblings, 2 replies; 9+ messages in thread
From: Lukasz Wojciechowski @ 2021-01-19  3:59 UTC (permalink / raw)
  To: David Hunt, Bruce Richardson; +Cc: dev, l.wojciechow, stable

The distributor library implementation uses a cyclic queue to store
packets returned from workers. These packets can be collected later
with a rte_distributor_returned_pkts() call.
However, the queue has limited capacity: it can hold only
127 packets (RTE_DISTRIB_RETURNS_MASK).

The big burst test sent 1024 packets in bursts of 32 without waiting
for them to be processed by the distributor. When the test was run
with a large number of worker threads, more than 127 packets could be
returned from the workers and put into the cyclic queue.
The queue then dropped packets, so they could never be collected
with rte_distributor_returned_pkts() calls.
The test, however, waited indefinitely for all packets to be returned.

This patch fixes the big burst test by never allowing more packets than
the queue capacity to be processed at the same time, which makes it
impossible for any packets to be dropped.
It also cleans up duplicated code in the same test.
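
The failure mode and the fix can be illustrated with a toy model in C.
This is a sketch under stated assumptions, not the distributor code: the
return queue is reduced to two counters, the workers are assumed to
behave in the worst possible way (they hold every packet and return them
all at once), and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BIG_BATCH 1024       /* packets sent by the big burst test */
#define BURST 32             /* packets per burst */
#define RETURN_CAPACITY 127  /* queue capacity quoted in the commit message */

/* Toy stand-in for the return queue: it only counts packets. */
struct toy_returns {
	unsigned int queued;   /* waiting to be collected */
	unsigned int dropped;  /* lost because the queue was full */
};

/* Workers hand back n packets; whatever does not fit is dropped. */
static void toy_return_pkts(struct toy_returns *r, unsigned int n)
{
	unsigned int room = RETURN_CAPACITY - r->queued;
	unsigned int kept = n < room ? n : room;

	r->queued += kept;
	r->dropped += n - kept;
}

/* The test collects everything that is currently queued. */
static unsigned int toy_collect(struct toy_returns *r)
{
	unsigned int n = r->queued;

	r->queued = 0;
	return n;
}

/*
 * Send BIG_BATCH packets and return how many can never be collected.
 * Worst case assumed: workers hold their packets and return them all at
 * once, either when the sender waits (throttle) or after the last burst.
 */
static unsigned int toy_big_burst(bool throttle)
{
	struct toy_returns r = { 0, 0 };
	unsigned int in_flight = 0; /* sent but not yet collected */
	unsigned int i;

	for (i = 0; i < BIG_BATCH / BURST; i++) {
		in_flight += BURST;
		/* Throttled sender: wait until the next burst still fits. */
		while (throttle && in_flight + BURST > RETURN_CAPACITY) {
			toy_return_pkts(&r, in_flight);
			in_flight -= toy_collect(&r);
		}
	}
	/* Everything still with the workers comes back in one go. */
	toy_return_pkts(&r, in_flight);
	in_flight -= toy_collect(&r);

	assert(in_flight == r.dropped); /* uncollected packets were dropped */
	return in_flight;
}
```

In this model toy_big_burst(false) loses 897 packets (1024 - 127), which
corresponds to the test waiting forever, and toy_big_burst(true) loses
none.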

Bugzilla ID: 612
Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: david.hunt@intel.com
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
---
 app/test/test_distributor.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index f4c6229f1..961f326cd 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -217,6 +217,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	clear_packet_count();
 	struct rte_mbuf *many_bufs[BIG_BATCH], *return_bufs[BIG_BATCH];
 	unsigned num_returned = 0;
+	unsigned int num_being_processed = 0;
+	unsigned int return_buffer_capacity = 127;/* RTE_DISTRIB_RETURNS_MASK */
 
 	/* flush out any remaining packets */
 	rte_distributor_flush(db);
@@ -233,16 +235,16 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
 		rte_distributor_process(db,
 				&many_bufs[i*BURST], BURST);
-		count = rte_distributor_returned_pkts(db,
-				&return_bufs[num_returned],
-				BIG_BATCH - num_returned);
-		num_returned += count;
+		num_being_processed += BURST;
+		do {
+			count = rte_distributor_returned_pkts(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_being_processed -= count;
+			num_returned += count;
+			rte_distributor_flush(db);
+		} while (num_being_processed + BURST > return_buffer_capacity);
 	}
-	rte_distributor_flush(db);
-	count = rte_distributor_returned_pkts(db,
-		&return_bufs[num_returned],
-			BIG_BATCH - num_returned);
-	num_returned += count;
 	retries = 0;
 	do {
 		rte_distributor_flush(db);
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v1 0/1] distributor test fix
  2021-01-19  3:59 ` [dpdk-dev] [PATCH v1 0/1] distributor test fix Lukasz Wojciechowski
       [not found]   ` <CGME20210119035921eucas1p1aaea0d68975ba9481f200912eb10a40e@eucas1p1.samsung.com>
@ 2021-01-19  8:44   ` David Marchand
  2021-01-19 13:06     ` Lukasz Wojciechowski
  1 sibling, 1 reply; 9+ messages in thread
From: David Marchand @ 2021-01-19  8:44 UTC (permalink / raw)
  To: Lukasz Wojciechowski; +Cc: dev, Aaron Conole, ci, Lincoln Lavoie

On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> According to the discussion in this thread:
> https://inbox.dpdk.org/dev/CAOE1vsOehF4ZMOWffpEv=QF6YOc5wXtg23PV83B9CLiTMn8wQA@mail.gmail.com/#r
>
> I was able to reproduce the distributor test failure in the exactly same
> way as described, but on x86_64 machine with 32 cores.
> So it does not seem to be the problem related to the ARM architecture.
> IMO issue occurs when there are many worker threads returning at the same
> time packets.
>
> I was not able to observe the issue on ARM devices, but I used only
> machines with 4 cores. So that is max 3 worker cores,
> so maximum of 32*3 = 96 packets processed at the same time
> which is less than 127 , so the issue cannot occur.
>
> Can anyone verify this patch on a machine similar to one used in CI lab,
> on which the issue occurred?

Thanks for looking at it, Lukasz.
Unfortunately, I can't reproduce it on my x86 system (26 workers in
the test) and I don't have an ARM machine.

-- 
David Marchand



* Re: [dpdk-dev] [PATCH v1 0/1] distributor test fix
  2021-01-19  8:44   ` [dpdk-dev] [PATCH v1 0/1] distributor test fix David Marchand
@ 2021-01-19 13:06     ` Lukasz Wojciechowski
  2021-01-28 13:34       ` David Marchand
  0 siblings, 1 reply; 9+ messages in thread
From: Lukasz Wojciechowski @ 2021-01-19 13:06 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Aaron Conole, ci, Lincoln Lavoie

Thank you David,
If you have the possibility, you can try on an emulated virtual
machine, where the cores are much slower, so the workers don't return
packets immediately.
It reproduces in 100% of cases in such an environment.

Best regards

Lukasz

On 19.01.2021 at 09:44, David Marchand wrote:
> On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
> <l.wojciechow@partner.samsung.com> wrote:
>> According to the discussion in this thread:
>> https://inbox.dpdk.org/dev/CAOE1vsOehF4ZMOWffpEv=QF6YOc5wXtg23PV83B9CLiTMn8wQA@mail.gmail.com/#r
>>
>> I was able to reproduce the distributor test failure in the exactly same
>> way as described, but on x86_64 machine with 32 cores.
>> So it does not seem to be the problem related to the ARM architecture.
>> IMO issue occurs when there are many worker threads returning at the same
>> time packets.
>>
>> I was not able to observe the issue on ARM devices, but I used only
>> machines with 4 cores. So that is max 3 worker cores,
>> so maximum of 32*3 = 96 packets processed at the same time
>> which is less than 127 , so the issue cannot occur.
>>
>> Can anyone verify this patch on a machine similar to one used in CI lab,
>> on which the issue occurred?
> Thanks for looking at it, Lukasz.
> Unfortunately, I can't reproduce it on my x86 system (26 workers in
> the test) and I don't have a ARM machine.
>
-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow@partner.samsung.com



* Re: [dpdk-dev] [PATCH v1 0/1] distributor test fix
  2021-01-19 13:06     ` Lukasz Wojciechowski
@ 2021-01-28 13:34       ` David Marchand
  0 siblings, 0 replies; 9+ messages in thread
From: David Marchand @ 2021-01-28 13:34 UTC (permalink / raw)
  To: Lukasz Wojciechowski; +Cc: dev, Aaron Conole, ci, Lincoln Lavoie

On Tue, Jan 19, 2021 at 2:07 PM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> Thank you David,
> If you have the possibility you can try on some emulated virtual
> machine, where cores are much slower, so the workers don't return
> packages immediately.
> It reproduces in 100% cases in such environment.

I reproduced the issue by starting a testpmd instance on the same cores
of this system.
I usually reproduce it after 1-2 minutes of continuously running the
distributor_autotest unit test.

I've applied your fix in my tree and I will let this loop run for a while.


-- 
David Marchand



* Re: [dpdk-dev] [dpdk-stable] [PATCH v1 1/1] test/distributor: prevent return buffer overload
  2021-01-19  3:59     ` [dpdk-dev] [PATCH v1 1/1] test/distributor: prevent return buffer overload Lukasz Wojciechowski
@ 2021-01-28 14:10       ` David Marchand
  2021-01-29  8:03         ` David Marchand
  2021-01-28 16:46       ` [dpdk-dev] " David Hunt
  1 sibling, 1 reply; 9+ messages in thread
From: David Marchand @ 2021-01-28 14:10 UTC (permalink / raw)
  To: Lukasz Wojciechowski, David Hunt, Bruce Richardson; +Cc: dev, dpdk stable

On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> The distributor library implementation uses a cyclic queue to store
> packets returned from workers. These packets can be later collected
> with rte_distributor_returned_pkts() call.
> However the queue has limited capacity. It is able to contain only
> 127 packets (RTE_DISTRIB_RETURNS_MASK).
>
> Big burst tests sent 1024 packets in 32 packets bursts without waiting
> until they are processed by the distributor. In case when tests were
> run with big number of worker threads, it happened that more than
> 127 packets were returned from workers and put into cyclic queue.
> This caused packets to be dropped by the queue, making them impossible
> to be collected later with rte_distributor_returned_pkts() calls.
> However the test waited for all packets to be returned infinitely.
>
> This patch fixes the big burst test by not allowing more than
> queue capacity packets to be processed at the same time, making
> impossible to drop any packets.
> It also cleans up duplicated code in the same test.
>
> Bugzilla ID: 612
> Fixes: c0de0eb82e40 ("distributor: switch over to new API")
> Cc: david.hunt@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>

Pasting from my reply to the cover letter:

> I reproduced the issue with starting a testpmd on the same cores in this system.
> I usually reproduce it after 1-2 minutes of continuously running the
> distributor_autotest unit test.
>
> I've applied your fix in my tree and I will let this loop run for a while.

This has been running fine for more than 30 minutes on my 28-core x86 system.
Tested-by: David Marchand <david.marchand@redhat.com>


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v1 1/1] test/distributor: prevent return buffer overload
  2021-01-19  3:59     ` [dpdk-dev] [PATCH v1 1/1] test/distributor: prevent return buffer overload Lukasz Wojciechowski
  2021-01-28 14:10       ` [dpdk-dev] [dpdk-stable] " David Marchand
@ 2021-01-28 16:46       ` David Hunt
  1 sibling, 0 replies; 9+ messages in thread
From: David Hunt @ 2021-01-28 16:46 UTC (permalink / raw)
  To: Lukasz Wojciechowski, Bruce Richardson, David Marchand; +Cc: dev, stable

Hi Lukasz,

On 19/1/2021 3:59 AM, Lukasz Wojciechowski wrote:
> The distributor library implementation uses a cyclic queue to store
> packets returned from workers. These packets can be later collected
> with rte_distributor_returned_pkts() call.
> However the queue has limited capacity. It is able to contain only
> 127 packets (RTE_DISTRIB_RETURNS_MASK).
>
> Big burst tests sent 1024 packets in 32 packets bursts without waiting
> until they are processed by the distributor. In case when tests were
> run with big number of worker threads, it happened that more than
> 127 packets were returned from workers and put into cyclic queue.
> This caused packets to be dropped by the queue, making them impossible
> to be collected later with rte_distributor_returned_pkts() calls.
> However the test waited for all packets to be returned infinitely.
>
> This patch fixes the big burst test by not allowing more than
> queue capacity packets to be processed at the same time, making
> impossible to drop any packets.
> It also cleans up duplicated code in the same test.
>
> Bugzilla ID: 612
> Fixes: c0de0eb82e40 ("distributor: switch over to new API")
> Cc: david.hunt@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> ---


This patch cleans up the code nicely, and it makes sense to return 
packets in the do..while. LGTM.
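
The do..while pattern mentioned above can be sketched on its own as
follows. This is a generic illustration, not the test code:
drain_until_room(), collect_fn and the toy collector are hypothetical
names, and the numbers 32 and 127 are the burst size and queue capacity
quoted in this thread. The loop assumes that the burst size does not
exceed the capacity and that the collector eventually makes progress.

```c
#include <assert.h>

/* Hypothetical callback: collect returned packets, report how many. */
typedef unsigned int (*collect_fn)(void *ctx);

/*
 * Collect at least once, then keep collecting until one more burst
 * fits under the capacity. Returns the remaining in-flight count.
 */
static unsigned int drain_until_room(unsigned int in_flight,
		unsigned int burst, unsigned int capacity,
		collect_fn collect, void *ctx)
{
	do {
		unsigned int got = collect(ctx);

		assert(got <= in_flight);
		in_flight -= got;
	} while (in_flight + burst > capacity);

	return in_flight;
}

/* Toy collector: hands back at most 10 packets per call. */
static unsigned int toy_collect_10(void *ctx)
{
	unsigned int *pending = ctx;
	unsigned int got = *pending < 10 ? *pending : 10;

	*pending -= got;
	return got;
}

/* Start with n packets in flight, bursts of 32, capacity 127. */
static unsigned int toy_drain(unsigned int n)
{
	unsigned int pending = n;

	return drain_until_room(n, 32, 127, toy_collect_10, &pending);
}
```

With 96 packets in flight a single collection of 10 is enough, because
86 + 32 fits under 127; with 120 in flight the loop runs three times and
stops at 90.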


Reviewed-by: David Hunt <david.hunt@intel.com>





* Re: [dpdk-dev] [dpdk-stable] [PATCH v1 1/1] test/distributor: prevent return buffer overload
  2021-01-28 14:10       ` [dpdk-dev] [dpdk-stable] " David Marchand
@ 2021-01-29  8:03         ` David Marchand
  2021-01-29 12:36           ` Lukasz Wojciechowski
  0 siblings, 1 reply; 9+ messages in thread
From: David Marchand @ 2021-01-29  8:03 UTC (permalink / raw)
  To: Lukasz Wojciechowski, David Hunt, Bruce Richardson
  Cc: dev, dpdk stable, Ruifeng Wang (Arm Technology China), ci

On Thu, Jan 28, 2021 at 3:10 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
> <l.wojciechow@partner.samsung.com> wrote:
> >
> > The distributor library implementation uses a cyclic queue to store
> > packets returned from workers. These packets can be later collected
> > with rte_distributor_returned_pkts() call.
> > However the queue has limited capacity. It is able to contain only
> > 127 packets (RTE_DISTRIB_RETURNS_MASK).
> >
> > Big burst tests sent 1024 packets in 32 packets bursts without waiting
> > until they are processed by the distributor. In case when tests were
> > run with big number of worker threads, it happened that more than
> > 127 packets were returned from workers and put into cyclic queue.
> > This caused packets to be dropped by the queue, making them impossible
> > to be collected later with rte_distributor_returned_pkts() calls.
> > However the test waited for all packets to be returned infinitely.
> >
> > This patch fixes the big burst test by not allowing more than
> > queue capacity packets to be processed at the same time, making
> > impossible to drop any packets.
> > It also cleans up duplicated code in the same test.
> >
> > Bugzilla ID: 612
> > Fixes: c0de0eb82e40 ("distributor: switch over to new API")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Tested-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: David Hunt <david.hunt@intel.com>

Applied, thanks Lukasz.

This should fix the issue seen at UNH on the ARM server.

-- 
David Marchand



* Re: [dpdk-dev] [dpdk-stable] [PATCH v1 1/1] test/distributor: prevent return buffer overload
  2021-01-29  8:03         ` David Marchand
@ 2021-01-29 12:36           ` Lukasz Wojciechowski
  0 siblings, 0 replies; 9+ messages in thread
From: Lukasz Wojciechowski @ 2021-01-29 12:36 UTC (permalink / raw)
  To: David Marchand, David Hunt, Bruce Richardson
  Cc: dev, dpdk stable, Ruifeng Wang (Arm Technology China), ci

Thank you guys!

On 29.01.2021 at 09:03, David Marchand wrote:
> On Thu, Jan 28, 2021 at 3:10 PM David Marchand
> <david.marchand@redhat.com> wrote:
>> On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
>> <l.wojciechow@partner.samsung.com> wrote:
>>> The distributor library implementation uses a cyclic queue to store
>>> packets returned from workers. These packets can be later collected
>>> with rte_distributor_returned_pkts() call.
>>> However the queue has limited capacity. It is able to contain only
>>> 127 packets (RTE_DISTRIB_RETURNS_MASK).
>>>
>>> Big burst tests sent 1024 packets in 32 packets bursts without waiting
>>> until they are processed by the distributor. In case when tests were
>>> run with big number of worker threads, it happened that more than
>>> 127 packets were returned from workers and put into cyclic queue.
>>> This caused packets to be dropped by the queue, making them impossible
>>> to be collected later with rte_distributor_returned_pkts() calls.
>>> However the test waited for all packets to be returned infinitely.
>>>
>>> This patch fixes the big burst test by not allowing more than
>>> queue capacity packets to be processed at the same time, making
>>> impossible to drop any packets.
>>> It also cleans up duplicated code in the same test.
>>>
>>> Bugzilla ID: 612
>>> Fixes: c0de0eb82e40 ("distributor: switch over to new API")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
>> Tested-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: David Hunt <david.hunt@intel.com>
>
> Applied, thanks Lukasz.
>
> This should fix the issue seen at UNH on the ARM server.
>
-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow@partner.samsung.com



