Re: [dpdk-dev] [Bug 826] red_autotest random failures

DPDK CI discussions

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
       [not found] <bug-826-3@http.bugs.dpdk.org/>
@ 2021-11-12 13:51 ` David Marchand
  2021-11-12 14:10   ` Lincoln Lavoie
  0 siblings, 1 reply; 21+ messages in thread
From: David Marchand @ 2021-11-12 13:51 UTC
  To: Cristian Dumitrescu; +Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

On Fri, Oct 8, 2021 at 9:24 AM <bugzilla@dpdk.org> wrote:
>
> https://bugs.dpdk.org/show_bug.cgi?id=826
>
>             Bug ID: 826
>            Summary: red_autotest random failures
>            Product: DPDK
>            Version: unspecified
>           Hardware: All
>                 OS: All
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: Normal
>          Component: other
>           Assignee: cristian.dumitrescu@intel.com
>           Reporter: david.marchand@redhat.com
>                 CC: dev@dpdk.org, jasvinder.singh@intel.com
>   Target Milestone: ---
>
> A recent failure can be found at:
> https://lab.dpdk.org/results/dashboard/patchsets/19223/
>
> 50/94 DPDK:fast-tests / red_autotest                   FAIL              0.86s
>  exit status 1


functional test 6 : use several queues (each with its own run-time data),
                    use several RED configurations (such that each configuration is shared by multiple queues),
                    increase average queue size to target level,
                    dequeue all packets until queue is empty,
                    confirm that average queue size is computed correctly while queue is empty
                    (this is a larger scale version of functional test 3)

queue          config         q avg before   q avg after    expected       difference %   tolerance %    result
0              0              1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass
1              0              1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass
2              1              1022.0000      1022.0000      1010.1483      1.1733         5.0000         pass
3              1              1022.0000      937.1660       1010.1483      7.2249         5.0000         fail
-------------------------------------<fail>-------------------------------------



This failure keeps popping up in the CI.
The bug report is one month old, with no reply.


I sent a proposal to remove red_autotest from the list of tests executed by the CI.
https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/

It might be the best solution while waiting for an analysis.


-- 
David Marchand



* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-12 13:51 ` [dpdk-dev] [Bug 826] red_autotest random failures David Marchand
@ 2021-11-12 14:10   ` Lincoln Lavoie
  2021-11-12 14:15     ` David Marchand
  0 siblings, 1 reply; 21+ messages in thread
From: Lincoln Lavoie @ 2021-11-12 14:10 UTC
  To: David Marchand
  Cc: Cristian Dumitrescu, dev, Aaron Conole, Thomas Monjalon, Yigit,
	Ferruh, ci

Hi David,

My understanding is that removing the test would require removing it from
the DPDK unit tests themselves; we are just running the fast-tests suite
for the unit tests. DPDK's unit test structure / framework does not allow
removing or customizing individual tests within a suite; only whole suites
can be selected.

In the lab, Brandon has been investigating and trying different
configurations for running the tests within the containers, along the lines
of the CPU pinning requirements that the unit tests might assume. So far,
everything he has tried still shows similar failures / issues. We are
still looking into it, so the bug is not sitting without action; there is
just no final resolution yet.

Cheers,
Lincoln
-- 
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)


* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-12 14:10   ` Lincoln Lavoie
@ 2021-11-12 14:15     ` David Marchand
  2021-11-15 11:51       ` Dumitrescu, Cristian
  0 siblings, 1 reply; 21+ messages in thread
From: David Marchand @ 2021-11-12 14:15 UTC
  To: Lincoln Lavoie
  Cc: Cristian Dumitrescu, dev, Aaron Conole, Thomas Monjalon, Yigit,
	Ferruh, ci

On Fri, Nov 12, 2021 at 3:11 PM Lincoln Lavoie <lylavoie@iol.unh.edu> wrote:
>> This failure keeps on popping in the CI.
>> The bug report is one month old, with no reply.
>>
>>
>> I sent a proposal of removing red_autotest from the list executed by the CI.
>> https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/
>>
>> It might be the best solution waiting for an analysis.
>>
>>
>> --
>> David Marchand
>>
>
> Hi David,
>
> My understanding is, removing the test would require removing it from the DPDK unit tests, we are just running the fast-tests suite for the unit tests.  DPDK's unit test structure / framework does not allow removing or customizing the suite of tests beyond the suites.

https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/


>
> In the lab, Brandon has been looking into and trying different configurations for running the tests within the containers along the lines of the CPU pinning requirements that might be assumed by the unit tests. So far, everything he has tried has still had the similar failures / issues.  We are still looking into it, so the bug is not sitting without action, just no final resolution.

The mail I sent was not a comment on the investigation on the UNH side.
The request is for Cristian to have a look too.


-- 
David Marchand



* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-12 14:15     ` David Marchand
@ 2021-11-15 11:51       ` Dumitrescu, Cristian
  2021-11-15 17:26         ` Liguzinski, WojciechX
  0 siblings, 1 reply; 21+ messages in thread
From: Dumitrescu, Cristian @ 2021-11-15 11:51 UTC
  To: David Marchand, Lincoln Lavoie, Liguzinski, WojciechX, Ajmera,
	Megha, Singh, Jasvinder
  Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci



Wojciech, Megha,

Are you able to take a look at why the RED autotest is failing, please?

Thanks,
Cristian




* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-15 11:51       ` Dumitrescu, Cristian
@ 2021-11-15 17:26         ` Liguzinski, WojciechX
  2021-11-18 22:10           ` Liguzinski, WojciechX
  0 siblings, 1 reply; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-11-15 17:26 UTC
  To: Dumitrescu, Cristian, David Marchand, Lincoln Lavoie, Ajmera,
	Megha, Singh, Jasvinder
  Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

Hi,

Sure, I will have a look.

Best Regards,
Wojciech



* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-15 17:26         ` Liguzinski, WojciechX
@ 2021-11-18 22:10           ` Liguzinski, WojciechX
  2021-11-19  7:26             ` Thomas Monjalon
  0 siblings, 1 reply; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-11-18 22:10 UTC
  To: Dumitrescu, Cristian, David Marchand, Lincoln Lavoie, Ajmera,
	Megha, Singh, Jasvinder
  Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci, Zegota, AnnaX

Hi,

I was trying to reproduce this test failure, but for me the RED tests are passing.
I ran the same test command as the one described in Bug 826 ('red_autotest') on the current main branch.

Below is an example run with DPDK built without RTE_SCHED_CMAN enabled; with this flag set to true, the tests also pass.

root@silpixa00400629:~/wojtek/dpdk/build/app/test# ./dpdk-test '-l 0-15' --file-prefix=red_autotest
EAL: Detected CPU lcores: 96
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/red_autotest/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>red_autotest

--------------------------------------------------------------------------------
functional test 1 : use one rte_red configuration,
                    increase average queue size to various levels,
                    compare drop rate to drop probability

                avg queue size enqueued       dropped        drop prob %    drop rate %    diff %         tolerance %    
                6              10000          0              0.0000         0.0000         0.0000         50.0000        
                12             10000          0              0.0000         0.0000         0.0000         50.0000        
                18             10000          0              0.0000         0.0000         0.0000         50.0000        
                24             10000          0              0.0000         0.0000         0.0000         50.0000        
                30             10000          0              0.0000         0.0000         0.0000         50.0000        
                36             9961           39             0.4167         0.3900         0.0000         50.0000        
                42             9898           102            1.0417         1.0200         0.0000         50.0000        
                48             9835           165            1.6667         1.6500         0.0000         50.0000        
                54             9785           215            2.2917         2.1500         0.0000         50.0000        
                60             9703           297            2.9167         2.9700         0.0000         50.0000        
                66             9627           373            3.5417         3.7300         0.0000         50.0000        
                72             9580           420            4.1667         4.2000         0.0000         50.0000        
                78             9511           489            4.7917         4.8900         0.0000         50.0000        
                84             9462           538            5.4167         5.3800         0.0000         50.0000        
                90             9398           602            6.0417         6.0200         0.0000         50.0000        
                96             9366           634            6.6667         6.3400         0.0000         50.0000        
                102            9267           733            7.2917         7.3300         0.0000         50.0000        
                108            9212           788            7.9167         7.8800         0.0000         50.0000        
                114            9146           854            8.5417         8.5400         0.0000         50.0000        
                120            9102           898            9.1667         8.9800         0.0000         50.0000        
                126            8984           1016           9.7917         10.1600        0.0000         50.0000        
                132            0              10000          100.0000       100.0000       0.0000         50.0000        
                138            0              10000          100.0000       100.0000       0.0000         50.0000        
                144            0              10000          100.0000       100.0000       0.0000         50.0000        
-------------------------------------<pass>-------------------------------------

--------------------------------------------------------------------------------
functional test 2 : use several RED configurations,
                    increase average queue size to just below maximum threshold,
                    compare drop rate to drop probability

RED config     avg queue size min threshold  max threshold  drop prob %    drop rate %    diff %         tolerance %    
0              127            32             128            9.8958         10.0100        0.0000         50.0000        
1              127            32             128            4.9479         4.9700         0.0000         50.0000        
2              127            32             128            3.2986         2.6800         0.0000         50.0000        
3              127            32             128            2.4740         1.7000         0.0000         50.0000        
4              127            32             128            1.9792         1.2700         0.0000         50.0000        
5              127            32             128            1.6493         1.0500         0.0000         50.0000        
6              127            32             128            1.4137         0.8100         0.0000         50.0000        
7              127            32             128            1.2370         0.7100         0.0000         50.0000        
8              127            32             128            1.0995         0.6200         0.0000         50.0000        
9              127            32             128            0.9896         0.5600         0.0000         50.0000        
-------------------------------------<pass>-------------------------------------

--------------------------------------------------------------------------------
functional test 3 : use one RED configuration,
                    increase average queue size to target level,
                    dequeue all packets until queue is empty,
                    confirm that average queue size is computed correctly while queue is empty

q avg before   q avg after    expected       difference %   tolerance %    result        
1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
-------------------------------------<pass>-------------------------------------

--------------------------------------------------------------------------------
functional test 5 : use several queues (each with its own run-time data),
                    use several RED configurations (such that each configuration is shared by multiple queues),
                    increase average queue size to just below maximum threshold,
                    compare drop rate to drop probability,
                    (this is a larger scale version of functional test 2)

queue          config         avg queue size min threshold  max threshold  drop prob %    drop rate %    diff %         tolerance %    
0              0              127            32             128            9.8958         9.9200         0.0000         50.0000        
1              0              127            32             128            9.8958         9.9700         0.0000         50.0000        
2              1              127            32             128            4.9479         4.8600         0.0000         50.0000        
3              1              127            32             128            4.9479         4.9400         0.0000         50.0000        
-------------------------------------<pass>-------------------------------------

--------------------------------------------------------------------------------
functional test 6 : use several queues (each with its own run-time data),
                    use several RED configurations (such that each configuration is shared by multiple queues),
                    increase average queue size to target level,
                    dequeue all packets until queue is empty,
                    confirm that average queue size is computed correctly while queue is empty
                    (this is a larger scale version of functional test 3)

queue          config         q avg before   q avg after    expected       difference %   tolerance %    result  
0              0              1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
1              0              1022.0000      1022.0000      1016.0627      0.5843         5.0000         pass           
2              1              1022.0000      1022.0000      1010.1483      1.1733         5.0000         pass           
3              1              1022.0000      1022.0000      1010.1483      1.1733         5.0000         pass           
-------------------------------------<pass>-------------------------------------

--------------------------------------------------------------------------------
overflow test 1 : use one RED configuration,
                  increase average queue size to target level,
                  check maximum number of bits required to represent avg_s

avg queue size  wq_log2  fraction bits  max queue avg  num bits  enqueued  dropped   drop prob %  drop rate %  
1023            12       10             0xffc00000     32        0         941366    100.00       100.00       
-------------------------------------<pass>-------------------------------------
[total: 6, pass: 6]
Test OK
RTE>>quit


Kind Regards,
Wojtek



* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-18 22:10           ` Liguzinski, WojciechX
@ 2021-11-19  7:26             ` Thomas Monjalon
  2021-11-19 16:53               ` Dumitrescu, Cristian
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Monjalon @ 2021-11-19  7:26 UTC
  To: Dumitrescu, Cristian, David Marchand, Lincoln Lavoie, Ajmera,
	Megha, Singh, Jasvinder, Liguzinski, WojciechX
  Cc: dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

18/11/2021 23:10, Liguzinski, WojciechX:
> Hi,
> 
> I was trying to reproduce this test failure, but for me RED tests are passing. 
> I was running the exact test command like the one described in Bug 826 - 'red_autotest' on the current main branch.

The test does not always fail.
It fails only under some conditions; please find them.
I think you should try in a container with more limited resources.




* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-19  7:26             ` Thomas Monjalon
@ 2021-11-19 16:53               ` Dumitrescu, Cristian
  2021-11-19 17:25                 ` Lincoln Lavoie
  2021-11-22  8:17                 ` David Marchand
  0 siblings, 2 replies; 21+ messages in thread
From: Dumitrescu, Cristian @ 2021-11-19 16:53 UTC
  To: Thomas Monjalon, David Marchand, Lincoln Lavoie, Ajmera, Megha,
	Singh, Jasvinder, Liguzinski, WojciechX
  Cc: dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 19, 2021 7:26 AM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; David Marchand
> <david.marchand@redhat.com>; Lincoln Lavoie <lylavoie@iol.unh.edu>;
> Ajmera, Megha <megha.ajmera@intel.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>; Liguzinski, WojciechX
> <wojciechx.liguzinski@intel.com>
> Cc: dev <dev@dpdk.org>; Aaron Conole <aconole@redhat.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; ci@dpdk.org; Zegota, AnnaX
> <annax.zegota@intel.com>
> Subject: Re: [dpdk-dev] [Bug 826] red_autotest random failures
> 
> 18/11/2021 23:10, Liguzinski, WojciechX:
> > Hi,
> >
> > I was trying to reproduce this test failure, but for me RED tests are passing.
> > I was running the exact test command like the one described in Bug 826 -
> 'red_autotest' on the current main branch.
> 
> The test is not always failing.
> There are some failing conditions, please find them.
> I think you should try in a container with more limited resources.
> 

Hi Thomas,

This is not a fair request IMO. We want to avoid wasting everybody's time, including Wojciech's time. Can the bug originator provide the details on the setup to reproduce the failure, please? Thank you!

On a different point, we should probably tweak our autotests to differentiate between logical failures and failures caused by resources not being available, and flag the test result accordingly in the report. For example, if memory allocation fails, the test should be flagged as "Not enough resources" instead of simply "Failed". In the first case, the next step should be fixing the test setup, while in the second case the next step should be fixing the code. What do people think about this?

Regards,
Cristian


* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-19 16:53               ` Dumitrescu, Cristian
@ 2021-11-19 17:25                 ` Lincoln Lavoie
       [not found]                   ` <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>
  2021-11-22  8:17                 ` David Marchand
  1 sibling, 1 reply; 21+ messages in thread
From: Lincoln Lavoie @ 2021-11-19 17:25 UTC
  To: Dumitrescu, Cristian
  Cc: Thomas Monjalon, David Marchand, Ajmera, Megha, Singh, Jasvinder,
	Liguzinski, WojciechX, dev, Aaron Conole, Yigit, Ferruh, ci,
	Zegota, AnnaX

Hi All,

I'm not sure if it will help, but this is an example of a failing case in
the CI: https://lab.dpdk.org/results/dashboard/patchsets/20222/

The test is running within a docker container. CI is set up to allow only
one active unit test job at a time, so the host might be running compile
jobs, but not other unit tests. This ensures there is no "competition" for
resources like hugepages between two running unit test jobs. The host is
actually a VM running on VMware vCenter, not a bare-metal host; the VM's
sole purpose is running the docker jobs.

The command to start the unit test run is pretty generic (script is below).

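#!/bin/bash

####################################################
# $1 argument: extra arguments to send to meson test
####################################################

# Exit on first command failure
set -e

# Extract dpdk.tar.gz ("m" = do not restore file modification times)
tar xzfm dpdk.tar.gz

# Compile DPDK
cd dpdk
meson build --werror
ninja -C build install

# Unit test: run the fast-tests suite with each test's timeout multiplied
# by 60 (-t is meson test's --timeout-multiplier). $1 is unquoted, so an
# empty value adds no argument and a multi-word value is split into
# separate meson test arguments.
cd build
meson test --suite fast-tests -t 60 $1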

I think a starting point is to understand whether the unit test expects or
makes assumptions about the system / environment, e.g. whether it needs sole
access to a CPU core, a minimum number of hugepages, etc.  If it would help,
I can also give you the Dockerfile used to build the container (note that
the RHEL images have to be built on a licensed Red Hat server, to be able to
install the required packages).
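To compare the CI container with a local setup, a quick sketch (assuming a Linux host or container with bash, awk and coreutils) that prints what a unit test can see: the CPUs available to the process, the hugepage counters from /proc/meminfo, and the cgroup memory limit. The helper names are hypothetical, and only the common cgroup v2 and v1 file locations are checked.

```shell
#!/bin/bash
# Sketch: print the resources a unit test can see from inside a container.
# hugepage_summary and cgroup_mem_limit are hypothetical helpers, not DPDK tools.

# Summarize hugepage counters from a meminfo-format file (default /proc/meminfo).
hugepage_summary() {
    awk '/^HugePages_Total:/ {t = $2}
         /^HugePages_Free:/  {f = $2}
         /^Hugepagesize:/    {s = $2}
         END {printf "hugepages total=%d free=%d size_kB=%d\n", t, f, s}' "${1:-/proc/meminfo}"
}

# Memory limit imposed by the container's cgroup (v2 path first, then v1).
cgroup_mem_limit() {
    local f
    for f in /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory/memory.limit_in_bytes; do
        if [ -r "$f" ]; then
            echo "cgroup memory limit: $(cat "$f")"
            return 0
        fi
    done
    echo "cgroup memory limit: unknown"
}

echo "CPUs usable by this process: $(nproc)"
hugepage_summary
cgroup_mem_limit
```

Running it both in the lab container and on a machine where the test passes would show whether, for example, the free hugepages or the memory limit differ.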

Cheers,
Lincoln


-- 
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)
<https://www.iol.unh.edu>

^ permalink raw reply	[flat|nested] 21+ messages in thread

[parent not found: <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>]

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
       [not found]                   ` <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>
@ 2021-11-29 17:58                     ` Brandon Lo
  2021-11-30  7:51                       ` Liguzinski, WojciechX
       [not found]                     ` <SA0PR11MB46708D32B6B2EC31D3DCE17F975A9@SA0PR11MB4670.namprd11.prod.outlook.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2021-11-29 17:58 UTC (permalink / raw)
  To: Liguzinski, WojciechX
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Thomas Monjalon,
	David Marchand, Ajmera, Megha, Singh, Jasvinder, dev,
	Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

On Wed, Nov 24, 2021 at 2:48 AM Liguzinski, WojciechX <
wojciechx.liguzinski@intel.com> wrote:

> Hi,
>
>
>
> Thanks Lincoln, I will also give that script a try.
>
>
>
> Cheers,
>
> Wojciech
>
>
Hello Wojciech,

I also recommend trying to run the test with around 4GB of RAM and 2GB of
hugepages to see if it fails. That is roughly the amount of resources per
machine that is reserved exclusively for unit tests. The amount of RAM
available can sometimes be higher, depending on how many other jobs are
running on the machine, but 4GB is the lowest it can go for the unit test job.
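A rough way to reproduce that budget locally, as a sketch only: the 4GB RAM / 2GB hugepages / 2 cores figures come from this thread, while the image name `dpdk-unit-test`, the use of 2MB pages and the `--privileged` flag are assumptions (the lab's exact docker flags are not given here). By default it only prints what it would do; `--apply` reserves hugepages host-wide (root needed) and starts the container.

```shell
#!/bin/bash
# Sketch: approximate the lab's unit-test budget (4GB RAM, 2 cores,
# 2GB of hugepages). "dpdk-unit-test" is a hypothetical image name.
set -euo pipefail

# Number of PAGE_KB-sized hugepages needed to cover TOTAL_MB megabytes.
hugepages_for() {
    local total_mb=$1 page_kb=$2
    echo $(( total_mb * 1024 / page_kb ))
}

NR_2M=$(hugepages_for 2048 2048)   # 2GB of 2MB pages -> 1024 pages

if [ "${1:-}" = "--apply" ]; then
    # Hugepage reservation is a host-wide setting and needs root.
    echo "$NR_2M" | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    # --memory/--cpus cap the container at 4GB of RAM and 2 CPUs' worth of time;
    # --privileged is a simplification, not the lab's known configuration.
    docker run --rm --privileged --memory=4g --cpus=2 \
        -v /dev/hugepages:/dev/hugepages dpdk-unit-test
else
    echo "dry run: would reserve $NR_2M x 2MB hugepages and run with --memory=4g --cpus=2"
fi
```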

Thanks,
Brandon



-- 
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-29 17:58                     ` Brandon Lo
@ 2021-11-30  7:51                       ` Liguzinski, WojciechX
  2021-12-10 13:31                         ` Liguzinski, WojciechX
  0 siblings, 1 reply; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-11-30  7:51 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Thomas Monjalon,
	David Marchand, Ajmera, Megha, Singh, Jasvinder, dev,
	Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

Ok, thanks Brandon for the tip :)
Let’s see if I can set up the machine with such a configuration.

Cheers,
Wojciech

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-30  7:51                       ` Liguzinski, WojciechX
@ 2021-12-10 13:31                         ` Liguzinski, WojciechX
  0 siblings, 0 replies; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-12-10 13:31 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Thomas Monjalon,
	David Marchand, Ajmera, Megha, Singh, Jasvinder, dev,
	Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX, Danilewicz,
	MarcinX

Hi,

Unfortunately, I haven’t been able to move the investigation much further.
I have been running those tests on machines with more than 4GB of RAM, but with hugepages limited to 1GB, using the script provided by Lincoln.
Across several runs, red_autotest did not fail even once, so there is still no clue as to what might be causing the failures on CI.

+Adding Marcin Danilewicz
To let you know, Marcin Danilewicz will be taking over my tasks, so please include him in, or direct to him, any further messages on this topic.

Best Regards,
Wojciech


^ permalink raw reply	[flat|nested] 21+ messages in thread

[parent not found: <SA0PR11MB46708D32B6B2EC31D3DCE17F975A9@SA0PR11MB4670.namprd11.prod.outlook.com>]

[parent not found: <BY5PR11MB3926999DD139D10AD76D177F8F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>]

[parent not found: <BY5PR11MB39261E9379E18C67BB4FB9938F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>]

[parent not found: <BY5PR11MB3926DF1466F5815D5D2FEC798F259@BY5PR11MB3926.namprd11.prod.outlook.com>]

[parent not found: <CAOE1vsPcKAiTMPGH1VYwoTccWi7b=9DJdObdPJZhKQvqNQsFmw@mail.gmail.com>]

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
       [not found]                             ` <CAOE1vsPcKAiTMPGH1VYwoTccWi7b=9DJdObdPJZhKQvqNQsFmw@mail.gmail.com>
@ 2022-02-02 14:51                               ` Brandon Lo
  2022-02-02 17:07                                 ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2022-02-02 14:51 UTC (permalink / raw)
  To: Danilewicz, MarcinX, Lincoln Lavoie
  Cc: Dumitrescu, Cristian, Ajmera, Megha, Singh, Jasvinder, Zegota,
	AnnaX, Yigit, Ferruh, thomas, david.marchand, ci

> On Mon, Jan 31, 2022 at 2:27 PM Danilewicz, MarcinX <marcinx.danilewicz@intel.com> wrote:
>> After some time I did some testing. As you may guess, with real hardware I could not reproduce the error.
>>
>> From what I see, the problem was here:
>>
>> FT2
>> RED config  avg queue size  min threshold  max threshold  drop prob %  drop rate %  diff %  tolerance %
>> 5           127             32             128            1.6493       0.9900       0.0000  50.0000
>> 6           127             32             128            1.4137       0.8500       0.0000  50.0000
>> 7           127             32             128            1.2370       0.7300       0.0000  50.0000
>> 8           127             32             128            1.0995       0.6200       0.0000  50.0000
>> 9           127             32             128            0.9896       0.6300       0.0000  50.0000
>> ------------------------------------------------------------------------
>>
>> The drop rate in line 9 should not be greater than in line 8. However, looking at the other results, the drop rate value in line 9 is about 0.1% greater than expected. Line 8 results look fine to me.
>>
>> How often does anyone see this issue? Is there a chance I could use the existing Docker container for testing?
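Since the configured drop probability decreases from config 5 to config 9, the measured drop rate would be expected to decrease too. A small sketch that applies that ordering check to rows in the FT2 format quoted above (column positions are taken from that table; `check_drop_rate_order` is a hypothetical helper). It is only a triage aid: judging by the diff % and tolerance % columns, the autotest itself checks each row against its own tolerance, not against the previous row.

```shell
#!/bin/bash
# Sketch: scan FT2-style rows ("config avg min max prob% rate% diff% tol%")
# and report configs whose measured drop rate went up although the
# configured drop probability went down compared with the previous row.
check_drop_rate_order() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 6 {
        prob = $5 + 0; rate = $6 + 0
        if (seen && prob < prev_prob && rate > prev_rate)
            printf "config %s: drop rate %.2f%% exceeds %.2f%% of config %s\n", $1, rate, prev_rate, prev_cfg
        prev_cfg = $1; prev_prob = prob; prev_rate = rate; seen = 1
    }'
}

# The five FT2 rows quoted in this message.
check_drop_rate_order <<'EOF'
5              127            32             128            1.6493         0.9900         0.0000         50.0000
6              127            32             128            1.4137         0.8500         0.0000         50.0000
7              127            32             128            1.2370         0.7300         0.0000         50.0000
8              127            32             128            1.0995         0.6200         0.0000         50.0000
9              127            32             128            0.9896         0.6300         0.0000         50.0000
EOF
```

For the five rows above it prints `config 9: drop rate 0.63% exceeds 0.62% of config 8`.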

Hi Marcin,

Attached is an Ubuntu 20.04 Dockerfile that we use in the lab for unit testing.

I don't think we run into the red_autotest failure too often, so it is
hard to debug. My guess is that the test will randomly fail when there
are a lot of processes running on the system, especially since we run
multiple tests per system in the lab.
In a typical worst-case scenario, we can see a machine performing 2 to
3 total compile/ABI tests along with a single unit test. We do
allocate a specific amount of resources (4GB RAM, 2 cores) to each
container, so that can be another factor that affects the frequency of
these failures.
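To mimic that worst case on a local machine, the test can be run while other processes compete for the same CPUs. A minimal sketch, assuming bash; `with_cpu_load` is a hypothetical helper, and busy `yes` loops are only a crude stand-in for compile/ABI jobs (they load the CPUs, not memory or disk).

```shell
#!/bin/bash
# Sketch: run a command while N busy-loop processes compete for the CPUs.
# The "yes" loops are a crude stand-in for concurrent compile jobs.
with_cpu_load() {
    local n=$1; shift
    local pids=() i rc=0
    for (( i = 0; i < n; i++ )); do
        yes > /dev/null &
        pids+=("$!")
    done
    "$@" || rc=$?                                  # keep the command's exit status
    if (( ${#pids[@]} > 0 )); then
        kill "${pids[@]}" 2> /dev/null || true     # stop the load generators
        wait "${pids[@]}" 2> /dev/null || true
    fi
    return "$rc"
}
```

For example, `with_cpu_load 4 meson test -C build --no-rebuild red_autotest` runs the single test next to four busy loops. Launching it under `taskset -c 0,1` starts both the loops and the test on cores 0 and 1, since CPU affinity is inherited by child processes, which roughly resembles a 2-core container.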

If this issue seems too dependent on the current system load and other
situational factors, it might be good to think about running the
red_autotest separately from other tests in the lab so it does not have
to compete for resources.
Any thoughts on this?
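One way to measure how often red_autotest fails on its own, before deciding to isolate it in the lab, is to run it repeatedly and count the failures. A minimal sketch, assuming a configured DPDK build directory named `build` and that meson registers the test as `red_autotest` (as in the CI log in the bug report); `count_failures` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: run a command RUNS times and print how many runs failed.
count_failures() {
    local runs=$1; shift
    local fails=0 i
    for (( i = 0; i < runs; i++ )); do
        "$@" > /dev/null 2>&1 || fails=$(( fails + 1 ))
    done
    echo "$fails"
}
```

For example, `count_failures 100 meson test -C build --no-rebuild red_autotest` runs only red_autotest, 100 times in a row, and prints the number of failed runs. Comparing that count on an idle machine and on a loaded one would indicate how much the failures depend on system load.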

Thanks,
Brandon


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-02 14:51                               ` Brandon Lo
@ 2022-02-02 17:07                                 ` Danilewicz, MarcinX
  2022-02-03 23:31                                   ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-02-02 17:07 UTC (permalink / raw)
  To: Brandon Lo, Lincoln Lavoie
  Cc: Dumitrescu, Cristian, Ajmera, Megha, Singh, Jasvinder, Zegota,
	AnnaX, Yigit, Ferruh, thomas, david.marchand, ci

Hi Brandon,

I will look into this config file to see what I can do with it 😊

" a specific amount of resources (4GB RAM, 2 cores) to each container, so that can be another factor that affects the frequency of these failures.

If this issue seems too dependent on the current system load and other situational factors, it might be good to think about running the red_autotest separate from other tests in the lab so it does not have to compete for resources.
Any thoughts on this?"

The CPU family might be important, from a CPU features perspective. Previously, I had to run some throughput tests on two different machines (different core families) to get the correct (expected) results. Perhaps nothing has changed since then.

I'll let you know about my findings with that Docker-based installation.

Kind Regards,
/Marcin
--------------------------------------------------------------
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263

This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-02 17:07                                 ` Danilewicz, MarcinX
@ 2022-02-03 23:31                                   ` Danilewicz, MarcinX
  2022-02-04  0:11                                     ` Brandon Lo
  0 siblings, 1 reply; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-02-03 23:31 UTC (permalink / raw)
  To: Brandon Lo, Lincoln Lavoie
  Cc: Dumitrescu, Cristian, Ajmera, Megha, Singh, Jasvinder, Zegota,
	AnnaX, Yigit, Ferruh, thomas, david.marchand, ci

Hi Brandon,

It looks like I am missing some local script(s), maybe to generate a VM, or something else for the VM?

It started with this message:
Step 11/12 : COPY scripts /scripts
COPY failed: file not found in build context or excluded by .dockerignore: stat scripts: file does not exist

I'll start searching for this, but perhaps you can tell me what it is and where it can be found, if possible 😊

Kind Regards,
/Marcin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-03 23:31                                   ` Danilewicz, MarcinX
@ 2022-02-04  0:11                                     ` Brandon Lo
  2022-03-09 10:01                                       ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2022-02-04  0:11 UTC (permalink / raw)
  To: Danilewicz, MarcinX
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha, Singh,
	Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand,
	ci

On Thu, Feb 3, 2022 at 6:31 PM Danilewicz, MarcinX
<marcinx.danilewicz@intel.com> wrote:
>
> Hi Brandon,
>
> It looks like I am searching for some local script/s to generate VM? Or something for VM..?
>
> It started from this message:
> Step 11/12 : COPY scripts /scripts
> COPY failed: file not found in build context or excluded by .dockerignore: stat scripts: file does not exist
>
> I'll start searching for this, but perhaps you can enlighten me what is that and where it may be found. If possible 😊

Hi Marcin,

Sorry about that. You can probably remove that line from the
Dockerfile. It is used to copy short bash scripts that run meson/ninja
with DPDK.
For unit testing, we run a small bash script like this:

#!/bin/bash
set -e    # stop at the first failing step
tar xzfm dpdk.tar.gz
cd dpdk
meson build --werror
ninja -C build install
cd build
meson test --suite fast-tests -t 60    # -t: per-test timeout multiplier

Thanks,
Brandon


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-04  0:11                                     ` Brandon Lo
@ 2022-03-09 10:01                                       ` Danilewicz, MarcinX
  2022-03-09 14:48                                         ` Brandon Lo
  0 siblings, 1 reply; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-03-09 10:01 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha, Singh,
	Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand,
	ci

Hi Brandon,

Sorry for the late response, I was busy in the meantime. After your mail, I tried running the DPDK tests from the Docker image, with a few instances of the image in parallel, enough to get the machine fully loaded. Even so, red_autotest never failed.

Could you share some additional details, such as the hardware used for testing, memory sizes, etc.? That might give a hint on how to trigger these failures. I have seen other tests fail consistently, depending on the machine I was running the autotests on. Maybe the tests that run before red_autotest leave the hardware in a state where red_autotest fails. Has anyone tried changing the autotest execution order?
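Changing the execution order could be tried by running the fast-tests one at a time in a shuffled but reproducible order. A minimal sketch, assuming bash (seeding `$RANDOM` makes the same seed reproduce the same order); `shuffled_order` is a hypothetical helper, and the real list of test names would have to come from the local build.

```shell
#!/bin/bash
# Sketch: print the given test names in a shuffled order that is
# reproducible for a given seed (Fisher-Yates shuffle driven by $RANDOM).
shuffled_order() {
    local seed=$1; shift
    local -a items=("$@")
    local i j tmp
    RANDOM=$seed                      # seeding $RANDOM makes the order repeatable
    for (( i = ${#items[@]} - 1; i > 0; i-- )); do
        j=$(( RANDOM % (i + 1) ))     # small modulo bias is acceptable here
        tmp=${items[i]}
        items[i]=${items[j]}
        items[j]=$tmp
    done
    printf '%s\n' "${items[@]}"
}
```

For example, `for t in $(shuffled_order 42 mempool_autotest ring_autotest red_autotest); do meson test -C build --no-rebuild "$t" || echo "FAILED: $t"; done` (the test names are only illustrative). Since meson runs each autotest as a separate entry (see the `50/94 DPDK:fast-tests / red_autotest` line in the CI log), an ordering effect would most likely have to go through system-wide state such as hugepages rather than in-process state.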

Also, I have done almost everything I can to reproduce the error, and perhaps it is better to ignore this random failure for now. It looks like the test passes for you almost all the time, even though it fails from time to time. Right? If it is a real bug, it will show up elsewhere.

Kind Regards,
/Marcin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-03-09 10:01                                       ` Danilewicz, MarcinX
@ 2022-03-09 14:48                                         ` Brandon Lo
  2022-03-10 17:25                                           ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2022-03-09 14:48 UTC (permalink / raw)
  To: Danilewicz, MarcinX
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha, Singh,
	Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand,
	ci

On Wed, Mar 9, 2022 at 5:01 AM Danilewicz, MarcinX
<marcinx.danilewicz@intel.com> wrote:
>
> Hi Brandon,
>
> Sorry for late response, but I was busy in mean time. But after your mail well .. I've tried to run test dpdk from docker image. Few instances of images in parallel, enough to get machine fully loaded. But in turn, red_autotest never failed.
>
> Is it possible for you to share some additional details? About hardware used for testing, memory sizes, etc. To get some hint how to get these failures. I've seen other test failing constantly, depending on machine I was running other autotests. Maybe tests before red_autotest are changing hardware to the state where red_autotest is failing. Anyone tried to change autotests execution order?

Hi Marcin,

Unfortunately, I don't have any more details than the ones we
talked about before. It is possible that the issue is less common
now that we have limited the number of compile jobs that can run on each
machine at the same time.
We have once again limited the amount of RAM that each job can use, so that
the systems do not come under high load.

> Also, I've don’t almost all to reproduce error and perhaps it is better to ignore that random error for now. It looks like you are able to successfully pass that test all the time, even when is failing from time to time. Right? If that is the true error, it will come out elsewhere.

Yes, I think it's ok if you think the test is good for now. I haven't
seen it fail in a while, so it might just be due to the load we put on
the systems.
If the issue comes up, I can contact you again.

Thanks,
Brandon



^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-03-09 14:48                                         ` Brandon Lo
@ 2022-03-10 17:25                                           ` Danilewicz, MarcinX
  0 siblings, 0 replies; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-03-10 17:25 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha, Singh,
	Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand,
	ci

Hi Brandon,

That’s good news.

Regards,
/Marcin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-19 16:53               ` Dumitrescu, Cristian
  2021-11-19 17:25                 ` Lincoln Lavoie
@ 2021-11-22  8:17                 ` David Marchand
  2021-11-22 13:34                   ` Lincoln Lavoie
  1 sibling, 1 reply; 21+ messages in thread
From: David Marchand @ 2021-11-22  8:17 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: Thomas Monjalon, Lincoln Lavoie, Ajmera, Megha, Singh, Jasvinder,
	Liguzinski, WojciechX, dev, Aaron Conole, Yigit, Ferruh, ci,
	Zegota, AnnaX

On Fri, Nov 19, 2021 at 5:54 PM Dumitrescu, Cristian
<cristian.dumitrescu@intel.com> wrote:
> On a different point, we should probably tweak our autotests to differentiate between logical failures and those failures related to resources not being available, and flag the test result accordingly in the report. For example, if memory allocation fails, the test should be flagged as "Not enough resources" instead of simply "Failed". In the first case, the next step should be fixing the test setup, while in the second case the next step should be fixing the code. What do people think on this?

In such case, the test must return TEST_SKIPPED.

I did a pass for cores count / specific hw requirements, some time ago.
See https://git.dpdk.org/dpdk/commit/?id=e0f4a0ed4237
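At the process level, meson's test runner reports exit status 77 as SKIP and 99 as a hard ERROR; presumably this is why DPDK's test harness defines TEST_SKIPPED as 77 (worth double-checking in app/test/test.h). A minimal sketch of the same idea for a wrapper script, assuming the resource in question is free hugepages; `require_free_hugepages` is a hypothetical helper and, as Lincoln points out in his reply, such a check only makes sense when acquiring the resource is not itself what is being tested.

```shell
#!/bin/bash
# Sketch: resource pre-check for a test wrapper, following meson's
# exit-status convention (0 = OK, 77 = SKIP, 99 = ERROR, other = FAIL).

# Label an exit status the way meson's test runner reports it.
meson_status() {
    case $1 in
        0)  echo OK ;;
        77) echo SKIP ;;
        99) echo ERROR ;;
        *)  echo FAIL ;;
    esac
}

# Hypothetical helper: exit 77 (skip) when fewer than MIN hugepages are free,
# instead of letting the test fail later on a memory allocation.
require_free_hugepages() {
    local min=$1 meminfo=${2:-/proc/meminfo} nfree
    nfree=$(awk '/^HugePages_Free:/ {print $2}' "$meminfo")
    nfree=${nfree:-0}
    if [ "$nfree" -lt "$min" ]; then
        echo "only $nfree free hugepages, $min needed: skipping" >&2
        exit 77
    fi
}
```

A wrapper could then call, for example, `require_free_hugepages 512` before starting the test binary; the threshold 512 is a placeholder, not a value taken from DPDK.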


-- 
David Marchand


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-22  8:17                 ` David Marchand
@ 2021-11-22 13:34                   ` Lincoln Lavoie
  0 siblings, 0 replies; 21+ messages in thread
From: Lincoln Lavoie @ 2021-11-22 13:34 UTC (permalink / raw)
  To: David Marchand
  Cc: Dumitrescu, Cristian, Thomas Monjalon, Ajmera, Megha, Singh,
	Jasvinder, Liguzinski, WojciechX, dev, Aaron Conole, Yigit,
	Ferruh, ci, Zegota, AnnaX

On Mon, Nov 22, 2021 at 3:17 AM David Marchand <david.marchand@redhat.com>
wrote:

> On Fri, Nov 19, 2021 at 5:54 PM Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com> wrote:
> > On a different point, we should probably tweak our autotests to
> differentiate between logical failures and those failures related to
> resources not being available, and flag the test result accordingly in the
> report. For example, if memory allocation fails, the test should be flagged
> as "Not enough resources" instead of simply "Failed". In the first case,
> the next step should be fixing the test setup, while in the second case the
> next step should be fixing the code. What do people think on this?
>
> In such a case, the test must return TEST_SKIPPED.
>

If the purpose of the component / function being tested is to get, create,
or reserve the resource(s), then the failure might be valid, so this can't
be applied across the board. But in places where the test is checking other
functionality, this might at least prevent some transient failures (i.e.
failures that depend on what the test could "get" from the system at that
moment in time).
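The distinction above could be encoded as a small helper that decides how to report a failed resource request. This is a hypothetical sketch: resource_failure_result, alloc_fn, failing_alloc, and the two example tests are invented for illustration, and the TEST_* values are assumed to mirror DPDK's app/test/test.h. The caller states whether obtaining the resource is itself the behaviour under test, and an injectable allocator lets the example simulate an exhausted host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed to mirror the values in DPDK's app/test/test.h. */
#define TEST_SUCCESS 0
#define TEST_FAILED (-1)
#define TEST_SKIPPED 77

/* Allocator hook so a test can simulate an exhausted host. */
typedef void *(*alloc_fn)(size_t);

/* Map a failed resource request to a test result. If obtaining the
 * resource is the behaviour under test, the failure is genuine (FAIL);
 * if the resource was only needed for setup, report SKIP instead. */
static int
resource_failure_result(bool resource_is_under_test, const char *what)
{
	assert(what != NULL);
	if (resource_is_under_test) {
		printf("%s failed: reporting FAIL\n", what);
		return TEST_FAILED;
	}
	printf("%s unavailable for setup: reporting SKIP\n", what);
	return TEST_SKIPPED;
}

/* Simulates a host that cannot satisfy any allocation. */
static void *
failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/* Hypothetical test whose purpose is the allocation itself. */
static int
test_allocator_itself(alloc_fn alloc)
{
	void *p = alloc(4096);

	if (p == NULL)
		return resource_failure_result(true, "allocation under test");
	free(p);
	return TEST_SUCCESS;
}

/* Hypothetical test that only needs a scratch buffer in order to
 * check some other functionality. */
static int
test_other_logic(alloc_fn alloc)
{
	void *scratch = alloc(4096);

	if (scratch == NULL)
		return resource_failure_result(false, "scratch buffer");
	/* ... the functionality under test would run here ... */
	free(scratch);
	return TEST_SUCCESS;
}
```

With failing_alloc, test_allocator_itself reports TEST_FAILED while test_other_logic reports TEST_SKIPPED. With malloc, both return TEST_SUCCESS.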



> Some time ago, I did a pass over the tests for core count / specific HW requirements.
> See https://git.dpdk.org/dpdk/commit/?id=e0f4a0ed4237
>
>
> --
> David Marchand
>
>

-- 
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)

Thread overview: 21+ messages
     [not found] <bug-826-3@http.bugs.dpdk.org/>
2021-11-12 13:51 ` [dpdk-dev] [Bug 826] red_autotest random failures David Marchand
2021-11-12 14:10   ` Lincoln Lavoie
2021-11-12 14:15     ` David Marchand
2021-11-15 11:51       ` Dumitrescu, Cristian
2021-11-15 17:26         ` Liguzinski, WojciechX
2021-11-18 22:10           ` Liguzinski, WojciechX
2021-11-19  7:26             ` Thomas Monjalon
2021-11-19 16:53               ` Dumitrescu, Cristian
2021-11-19 17:25                 ` Lincoln Lavoie
     [not found]                   ` <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>
2021-11-29 17:58                     ` Brandon Lo
2021-11-30  7:51                       ` Liguzinski, WojciechX
2021-12-10 13:31                         ` Liguzinski, WojciechX
     [not found]                     ` <SA0PR11MB46708D32B6B2EC31D3DCE17F975A9@SA0PR11MB4670.namprd11.prod.outlook.com>
     [not found]                       ` <BY5PR11MB3926999DD139D10AD76D177F8F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>
     [not found]                         ` <BY5PR11MB39261E9379E18C67BB4FB9938F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>
     [not found]                           ` <BY5PR11MB3926DF1466F5815D5D2FEC798F259@BY5PR11MB3926.namprd11.prod.outlook.com>
     [not found]                             ` <CAOE1vsPcKAiTMPGH1VYwoTccWi7b=9DJdObdPJZhKQvqNQsFmw@mail.gmail.com>
2022-02-02 14:51                               ` Brandon Lo
2022-02-02 17:07                                 ` Danilewicz, MarcinX
2022-02-03 23:31                                   ` Danilewicz, MarcinX
2022-02-04  0:11                                     ` Brandon Lo
2022-03-09 10:01                                       ` Danilewicz, MarcinX
2022-03-09 14:48                                         ` Brandon Lo
2022-03-10 17:25                                           ` Danilewicz, MarcinX
2021-11-22  8:17                 ` David Marchand
2021-11-22 13:34                   ` Lincoln Lavoie
