* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  [not found] <bug-826-3@http.bugs.dpdk.org/>
@ 2021-11-12 13:51 ` David Marchand
  2021-11-12 14:10 ` Lincoln Lavoie
  0 siblings, 1 reply; 21+ messages in thread
From: David Marchand @ 2021-11-12 13:51 UTC (permalink / raw)
  To: Cristian Dumitrescu; +Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

On Fri, Oct 8, 2021 at 9:24 AM <bugzilla@dpdk.org> wrote:
>
> https://bugs.dpdk.org/show_bug.cgi?id=826
>
>           Bug ID: 826
>          Summary: red_autotest random failures
>          Product: DPDK
>          Version: unspecified
>         Hardware: All
>               OS: All
>           Status: UNCONFIRMED
>         Severity: normal
>         Priority: Normal
>        Component: other
>         Assignee: cristian.dumitrescu@intel.com
>         Reporter: david.marchand@redhat.com
>               CC: dev@dpdk.org, jasvinder.singh@intel.com
> Target Milestone: ---
>
> A recent failure can be found at:
> https://lab.dpdk.org/results/dashboard/patchsets/19223/
>
> 50/94 DPDK:fast-tests / red_autotest          FAIL            0.86s
> exit status 1

functional test 6 : use several queues (each with its own run-time data),
                    use several RED configurations (such that each configuration is sharte_red by multiple queues),
                    increase average queue size to target level,
                    dequeue all packets until queue is empty,
                    confirm that average queue size is computed correctly while queue is empty
                    (this is a larger scale version of functional test 3)

queue  config  q avg before  q avg after  expected   difference %  tolerance %  result
0      0       1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
1      0       1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
2      1       1022.0000     1022.0000    1010.1483  1.1733        5.0000       pass
3      1       1022.0000      937.1660    1010.1483  7.2249        5.0000       fail
-------------------------------------<fail>-------------------------------------

This failure keeps on popping in the CI.
The bug report is one month old, with no reply.

I sent a proposal of removing red_autotest from the list executed by the CI.
https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/

It might be the best solution waiting for an analysis.

--
David Marchand
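The "difference %" column in the failing output above appears to be the relative deviation of "q avg after" from "expected". This is inferred from the reported numbers only, not read from the DPDK test source. A minimal Python sketch under that assumption (the helper names `difference_pct` and `evaluate` are hypothetical) recomputes the four rows and shows that only queue 3 exceeds the 5% tolerance:

```python
# Sketch: recompute the "difference %" column of functional test 6.
# Assumption: difference % = |q avg after - expected| / expected * 100,
# inferred from the reported numbers, not taken from the DPDK test code.

def difference_pct(q_avg_after: float, expected: float) -> float:
    """Relative deviation of the measured average from the expected one, in percent."""
    return abs(q_avg_after - expected) / expected * 100.0

# (queue, q avg after, expected, tolerance %) as reported in the failing run
ROWS = [
    (0, 1022.0000, 1016.0627, 5.0),
    (1, 1022.0000, 1016.0627, 5.0),
    (2, 1022.0000, 1010.1483, 5.0),
    (3, 937.1660, 1010.1483, 5.0),
]

def evaluate(rows):
    """Return (queue, difference %, 'pass' or 'fail') for each row."""
    results = []
    for queue, after, expected, tolerance in rows:
        diff = difference_pct(after, expected)
        verdict = "pass" if diff <= tolerance else "fail"
        results.append((queue, round(diff, 4), verdict))
    return results

if __name__ == "__main__":
    for queue, diff, verdict in evaluate(ROWS):
        print(f"queue {queue}: difference {diff:.4f}% -> {verdict}")
```

Under this reading, queue 3 is the only one whose average dropped (to 937.1660) instead of staying at 1022.0000 while the queue was empty, which is what pushes it past the tolerance.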
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-12 13:51 ` [dpdk-dev] [Bug 826] red_autotest random failures David Marchand
@ 2021-11-12 14:10 ` Lincoln Lavoie
  2021-11-12 14:15 ` David Marchand
  0 siblings, 1 reply; 21+ messages in thread
From: Lincoln Lavoie @ 2021-11-12 14:10 UTC (permalink / raw)
  To: David Marchand
  Cc: Cristian Dumitrescu, dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

On Fri, Nov 12, 2021 at 8:52 AM David Marchand <david.marchand@redhat.com> wrote:

> This failure keeps on popping in the CI.
> The bug report is one month old, with no reply.
>
> I sent a proposal of removing red_autotest from the list executed by the CI.
> https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/
>
> It might be the best solution waiting for an analysis.
>
> --
> David Marchand

Hi David,

My understanding is, removing the test would require removing it from the
DPDK unit tests, we are just running the fast-tests suite for the unit tests.
DPDK's unit test structure / framework does not allow removing or customizing
the suite of tests beyond the suites.

In the lab, Brandon has been looking into and trying different configurations
for running the tests within the containers along the lines of the CPU pinning
requirements that might be assumed by the unit tests. So far, everything he
has tried has still had the similar failures / issues. We are still looking
into it, so the bug is not sitting without action, just no final resolution.

Cheers,
Lincoln

--
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-12 14:10 ` Lincoln Lavoie
@ 2021-11-12 14:15 ` David Marchand
  2021-11-15 11:51 ` Dumitrescu, Cristian
  0 siblings, 1 reply; 21+ messages in thread
From: David Marchand @ 2021-11-12 14:15 UTC (permalink / raw)
  To: Lincoln Lavoie
  Cc: Cristian Dumitrescu, dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

On Fri, Nov 12, 2021 at 3:11 PM Lincoln Lavoie <lylavoie@iol.unh.edu> wrote:
>> This failure keeps on popping in the CI.
>> The bug report is one month old, with no reply.
>>
>> I sent a proposal of removing red_autotest from the list executed by the CI.
>> https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/
>>
>> It might be the best solution waiting for an analysis.
>>
>> --
>> David Marchand
>
> Hi David,
>
> My understanding is, removing the test would require removing it from the
> DPDK unit tests, we are just running the fast-tests suite for the unit tests.
> DPDK's unit test structure / framework does not allow removing or customizing
> the suite of tests beyond the suites.

https://patchwork.dpdk.org/project/dpdk/patch/20211027140458.2502-2-david.marchand@redhat.com/

> In the lab, Brandon has been looking into and trying different configurations
> for running the tests within the containers along the lines of the CPU pinning
> requirements that might be assumed by the unit tests. So far, everything he
> has tried has still had the similar failures / issues. We are still looking
> into it, so the bug is not sitting without action, just no final resolution.

The mail I sent was not a comment for the investigation on UNH side.
The ask is for Cristian to have a look too.

--
David Marchand
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-12 14:15 ` David Marchand
@ 2021-11-15 11:51 ` Dumitrescu, Cristian
  2021-11-15 17:26 ` Liguzinski, WojciechX
  0 siblings, 1 reply; 21+ messages in thread
From: Dumitrescu, Cristian @ 2021-11-15 11:51 UTC (permalink / raw)
  To: David Marchand, Lincoln Lavoie, Liguzinski, WojciechX, Ajmera, Megha, Singh, Jasvinder
  Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, November 12, 2021 2:16 PM
> Subject: Re: [dpdk-dev] [Bug 826] red_autotest random failures
>
> The mail I sent was not a comment for the investigation on UNH side.
> The ask is for Cristian to have a look too.
>
> --
> David Marchand

Wojciech, Megha,

Are you able to take a look at why the RED autotest is failing, please?

Thanks,
Cristian
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-15 11:51 ` Dumitrescu, Cristian
@ 2021-11-15 17:26 ` Liguzinski, WojciechX
  2021-11-18 22:10 ` Liguzinski, WojciechX
  0 siblings, 1 reply; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-11-15 17:26 UTC (permalink / raw)
  To: Dumitrescu, Cristian, David Marchand, Lincoln Lavoie, Ajmera, Megha, Singh, Jasvinder
  Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci

Hi,

Sure, I will have a look.

Best Regards,
Wojciech
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-15 17:26 ` Liguzinski, WojciechX
@ 2021-11-18 22:10 ` Liguzinski, WojciechX
  2021-11-19 7:26 ` Thomas Monjalon
  0 siblings, 1 reply; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-11-18 22:10 UTC (permalink / raw)
  To: Dumitrescu, Cristian, David Marchand, Lincoln Lavoie, Ajmera, Megha, Singh, Jasvinder
  Cc: dev, Aaron Conole, Thomas Monjalon, Yigit, Ferruh, ci, Zegota, AnnaX

Hi,

I was trying to reproduce this test failure, but for me RED tests are passing.
I was running the exact test command like the one described in Bug 826 - 'red_autotest' on the current main branch.

Here is an example where DPDK is built without RTE_SCHED_CMAN enabled; with this flag set to true, the tests do not fail either.

root@silpixa00400629:~/wojtek/dpdk/build/app/test# ./dpdk-test '-l 0-15' --file-prefix=red_autotest
EAL: Detected CPU lcores: 96
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/red_autotest/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>red_autotest
--------------------------------------------------------------------------------
functional test 1 : use one rte_red configuration,
                    increase average queue size to various levels,
                    compare drop rate to drop probability

avg queue size  enqueued  dropped  drop prob %  drop rate %  diff %  tolerance %
6               10000     0        0.0000       0.0000       0.0000  50.0000
12              10000     0        0.0000       0.0000       0.0000  50.0000
18              10000     0        0.0000       0.0000       0.0000  50.0000
24              10000     0        0.0000       0.0000       0.0000  50.0000
30              10000     0        0.0000       0.0000       0.0000  50.0000
36              9961      39       0.4167       0.3900       0.0000  50.0000
42              9898      102      1.0417       1.0200       0.0000  50.0000
48              9835      165      1.6667       1.6500       0.0000  50.0000
54              9785      215      2.2917       2.1500       0.0000  50.0000
60              9703      297      2.9167       2.9700       0.0000  50.0000
66              9627      373      3.5417       3.7300       0.0000  50.0000
72              9580      420      4.1667       4.2000       0.0000  50.0000
78              9511      489      4.7917       4.8900       0.0000  50.0000
84              9462      538      5.4167       5.3800       0.0000  50.0000
90              9398      602      6.0417       6.0200       0.0000  50.0000
96              9366      634      6.6667       6.3400       0.0000  50.0000
102             9267      733      7.2917       7.3300       0.0000  50.0000
108             9212      788      7.9167       7.8800       0.0000  50.0000
114             9146      854      8.5417       8.5400       0.0000  50.0000
120             9102      898      9.1667       8.9800       0.0000  50.0000
126             8984      1016     9.7917       10.1600      0.0000  50.0000
132             0         10000    100.0000     100.0000     0.0000  50.0000
138             0         10000    100.0000     100.0000     0.0000  50.0000
144             0         10000    100.0000     100.0000     0.0000  50.0000
-------------------------------------<pass>-------------------------------------
--------------------------------------------------------------------------------
functional test 2 : use several RED configurations,
                    increase average queue size to just below maximum threshold,
                    compare drop rate to drop probability

RED config  avg queue size  min threshold  max threshold  drop prob %  drop rate %  diff %  tolerance %
0           127             32             128            9.8958       10.0100      0.0000  50.0000
1           127             32             128            4.9479       4.9700       0.0000  50.0000
2           127             32             128            3.2986       2.6800       0.0000  50.0000
3           127             32             128            2.4740       1.7000       0.0000  50.0000
4           127             32             128            1.9792       1.2700       0.0000  50.0000
5           127             32             128            1.6493       1.0500       0.0000  50.0000
6           127             32             128            1.4137       0.8100       0.0000  50.0000
7           127             32             128            1.2370       0.7100       0.0000  50.0000
8           127             32             128            1.0995       0.6200       0.0000  50.0000
9           127             32             128            0.9896       0.5600       0.0000  50.0000
-------------------------------------<pass>-------------------------------------
--------------------------------------------------------------------------------
functional test 3 : use one RED configuration,
                    increase average queue size to target level,
                    dequeue all packets until queue is empty,
                    confirm that average queue size is computed correctly while queue is empty

q avg before  q avg after  expected   difference %  tolerance %  result
1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
-------------------------------------<pass>-------------------------------------
--------------------------------------------------------------------------------
functional test 5 : use several queues (each with its own run-time data),
                    use several RED configurations (such that each configuration is shared by multiple queues),
                    increase average queue size to just below maximum threshold,
                    compare drop rate to drop probability,
                    (this is a larger scale version of functional test 2)

queue  config  avg queue size  min threshold  max threshold  drop prob %  drop rate %  diff %  tolerance %
0      0       127             32             128            9.8958       9.9200       0.0000  50.0000
1      0       127             32             128            9.8958       9.9700       0.0000  50.0000
2      1       127             32             128            4.9479       4.8600       0.0000  50.0000
3      1       127             32             128            4.9479       4.9400       0.0000  50.0000
-------------------------------------<pass>-------------------------------------
--------------------------------------------------------------------------------
functional test 6 : use several queues (each with its own run-time data),
                    use several RED configurations (such that each configuration is shared by multiple queues),
                    increase average queue size to target level,
                    dequeue all packets until queue is empty,
                    confirm that average queue size is computed correctly while queue is empty
                    (this is a larger scale version of functional test 3)

queue  config  q avg before  q avg after  expected   difference %  tolerance %  result
0      0       1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
1      0       1022.0000     1022.0000    1016.0627  0.5843        5.0000       pass
2      1       1022.0000     1022.0000    1010.1483  1.1733        5.0000       pass
3      1       1022.0000     1022.0000    1010.1483  1.1733        5.0000       pass
-------------------------------------<pass>-------------------------------------
--------------------------------------------------------------------------------
overflow test 1 : use one RED configuration,
                  increase average queue size to target level,
                  check maximum number of bits requirte_red to represent avg_s

avg queue size  wq_log2  fraction bits  max queue avg  num bits  enqueued  dropped  drop prob %  drop rate %
1023            12       10             0xffc00000     32        0         941366   100.00       100.00
-------------------------------------<pass>-------------------------------------
[total: 6, pass: 6]
Test OK
RTE>>quit

Kind Regards,
Wojtek
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-18 22:10 ` Liguzinski, WojciechX
@ 2021-11-19 7:26 ` Thomas Monjalon
  2021-11-19 16:53 ` Dumitrescu, Cristian
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Monjalon @ 2021-11-19 7:26 UTC (permalink / raw)
  To: Dumitrescu, Cristian, David Marchand, Lincoln Lavoie, Ajmera, Megha, Singh, Jasvinder, Liguzinski, WojciechX
  Cc: dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

18/11/2021 23:10, Liguzinski, WojciechX:
> Hi,
>
> I was trying to reproduce this test failure, but for me RED tests are passing.
> I was running the exact test command like the one described in Bug 826 - 'red_autotest' on the current main branch.

The test is not always failing.
There are some failing conditions, please find them.
I think you should try in a container with more limited resources.
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-19 7:26 ` Thomas Monjalon
@ 2021-11-19 16:53 ` Dumitrescu, Cristian
  2021-11-19 17:25 ` Lincoln Lavoie
  2021-11-22 8:17 ` David Marchand
  0 siblings, 2 replies; 21+ messages in thread
From: Dumitrescu, Cristian @ 2021-11-19 16:53 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand, Lincoln Lavoie, Ajmera, Megha, Singh, Jasvinder, Liguzinski, WojciechX
  Cc: dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 19, 2021 7:26 AM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; David Marchand
> <david.marchand@redhat.com>; Lincoln Lavoie <lylavoie@iol.unh.edu>;
> Ajmera, Megha <megha.ajmera@intel.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>; Liguzinski, WojciechX
> <wojciechx.liguzinski@intel.com>
> Cc: dev <dev@dpdk.org>; Aaron Conole <aconole@redhat.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; ci@dpdk.org; Zegota, AnnaX
> <annax.zegota@intel.com>
> Subject: Re: [dpdk-dev] [Bug 826] red_autotest random failures
>
> 18/11/2021 23:10, Liguzinski, WojciechX:
> > Hi,
> >
> > I was trying to reproduce this test failure, but for me RED tests are passing.
> > I was running the exact test command like the one described in Bug 826 -
> > 'red_autotest' on the current main branch.
>
> The test is not always failing.
> There are some failing conditions, please find them.
> I think you should try in a container with more limited resources.

Hi Thomas,

This is not a fair request IMO. We want to avoid wasting everybody's time,
including Wojciech's time. Can the bug originator provide the details on the
setup to reproduce the failure, please? Thank you!

On a different point, we should probably tweak our autotests to differentiate
between logical failures and those failures related to resources not being
available, and flag the test result accordingly in the report. For example,
if memory allocation fails, the test should be flagged as "Not enough
resources" instead of simply "Failed". For a resource failure, the next step
should be fixing the test setup, while for a logical failure the next step
should be fixing the code. What do people think about this?

Regards,
Cristian
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-19 16:53 ` Dumitrescu, Cristian
@ 2021-11-19 17:25 ` Lincoln Lavoie
  [not found] ` <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>
  2021-11-22 8:17 ` David Marchand
  1 sibling, 1 reply; 21+ messages in thread
From: Lincoln Lavoie @ 2021-11-19 17:25 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: Thomas Monjalon, David Marchand, Ajmera, Megha, Singh, Jasvinder, Liguzinski, WojciechX, dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

Hi All,

I'm not sure if it will help, but this is an example of a failing case in
the CI: https://lab.dpdk.org/results/dashboard/patchsets/20222/

The test is running within a docker container. CI is set up to only allow
one active unit test at a time, so the host might be running compile jobs,
but not other unit tests. This ensures there isn't "competition" for
resources like hugepages between two running unit test jobs. The host is
actually a VM running on VMware vCenter, not a bare-metal host, the VM's sole
purpose is running the docker jobs. The command to start the unit test run is
pretty generic (script is below).

#!/bin/bash

####################################################
# $1 argument: extra arguments to send to meson test
####################################################

# Exit on first command failure
set -e

# Extract dpdk.tar.gz
tar xzfm dpdk.tar.gz

# Compile DPDK
cd dpdk
meson build --werror
ninja -C build install

# Unit test
cd build
meson test --suite fast-tests -t 60 $1

I think a starting point is to understand if the unit test expects or makes
assumptions on the system / environment. If it has sole access to a CPU
core, minimum number of hugepages, etc.

If it would help, I can also give you the DockerFile to build the container
(note the RHEL images have to be built on a licensed Redhat server, based on
being able to install the required packages).

Cheers,
Lincoln
[parent not found: <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>]
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  [not found] ` <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>
@ 2021-11-29 17:58 ` Brandon Lo
  2021-11-30 7:51 ` Liguzinski, WojciechX
  [not found] ` <SA0PR11MB46708D32B6B2EC31D3DCE17F975A9@SA0PR11MB4670.namprd11.prod.outlook.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2021-11-29 17:58 UTC (permalink / raw)
  To: Liguzinski, WojciechX
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Thomas Monjalon, David Marchand, Ajmera, Megha, Singh, Jasvinder, dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

On Wed, Nov 24, 2021 at 2:48 AM Liguzinski, WojciechX <wojciechx.liguzinski@intel.com> wrote:

> Hi,
>
> Thanks Lincoln, I will also have a try with such script.
>
> Cheers,
> Wojciech

Hello Wojciech,

I also recommend trying to run the test with around 4GB of RAM and 2GB of
hugepages to see if it fails. That is roughly the number of resources we have
per machine that is completely dedicated to unit tests. The amount of RAM
available can sometimes increase depending on how many jobs are running per
machine, but 4GB is the lowest it can go for the unit test job.

Thanks,
Brandon

--
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu
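To check whether a reproduction machine matches the 2GB hugepage figure suggested above, the reserved hugepage memory can be computed from the `HugePages_Total` and `Hugepagesize` fields of `/proc/meminfo` on Linux. The Python sketch below is only an illustration: the helper names and the sample text are hypothetical, and it assumes "2GB" means 2 GiB.

```python
# Sketch: compare reserved hugepage memory against a target size.
# Assumptions: Linux /proc/meminfo field names, and "2GB" read as 2 GiB.
# Helper names and the sample text are hypothetical.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def pages_needed(target_bytes: int, page_bytes: int) -> int:
    """Number of hugepages of size page_bytes needed to cover target_bytes."""
    return -(-target_bytes // page_bytes)  # ceiling division

def reserved_hugepage_bytes(meminfo_text: str) -> int:
    """Total bytes reserved as default-size hugepages, from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.split()
    total_pages = int(fields["HugePages_Total"][0])
    page_kib = int(fields["Hugepagesize"][0])  # reported in kB
    return total_pages * page_kib * KIB

# Illustrative sample, not output from the CI machines.
SAMPLE = """MemTotal:        4194304 kB
HugePages_Total:    1024
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    print(pages_needed(2 * GIB, 2 * MIB))          # pages of 2 MiB for 2 GiB
    print(reserved_hugepage_bytes(SAMPLE) // MIB)  # reserved MiB in the sample
```

On a real machine the text would come from reading `/proc/meminfo`; the sample string is used here so the sketch runs anywhere.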
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-29 17:58 ` Brandon Lo
@ 2021-11-30 7:51 ` Liguzinski, WojciechX
  2021-12-10 13:31 ` Liguzinski, WojciechX
  0 siblings, 1 reply; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-11-30 7:51 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Thomas Monjalon, David Marchand, Ajmera, Megha, Singh, Jasvinder, dev, Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX

Ok, thanks Brandon for the tip :)

Let’s see if I can set up the machine with such a configuration.

Cheers,
Wojciech
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-30 7:51 ` Liguzinski, WojciechX
@ 2021-12-10 13:31 ` Liguzinski, WojciechX
  0 siblings, 0 replies; 21+ messages in thread
From: Liguzinski, WojciechX @ 2021-12-10 13:31 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Thomas Monjalon,
      David Marchand, Ajmera, Megha, Singh, Jasvinder, dev,
      Aaron Conole, Yigit, Ferruh, ci, Zegota, AnnaX,
      Danilewicz, MarcinX

Hi,

Unfortunately, I haven’t been able to move the investigation much further.
I have been running those tests on machines with more than 4GB of RAM, but with hugepages set to 1GB, using the script provided by Lincoln.
Over several runs red_autotest didn’t fail even once, so I still have no clue about what is causing the failures on CI.

+Adding Marcin Danilewicz

To let you know, Marcin Danilewicz will be taking over my tasks, so please include him in, or direct to him, any further messages on this topic.

Best Regards,
Wojciech
[parent not found: <SA0PR11MB46708D32B6B2EC31D3DCE17F975A9@SA0PR11MB4670.namprd11.prod.outlook.com>]
[parent not found: <BY5PR11MB3926999DD139D10AD76D177F8F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>]
[parent not found: <BY5PR11MB39261E9379E18C67BB4FB9938F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>]
[parent not found: <BY5PR11MB3926DF1466F5815D5D2FEC798F259@BY5PR11MB3926.namprd11.prod.outlook.com>]
[parent not found: <CAOE1vsPcKAiTMPGH1VYwoTccWi7b=9DJdObdPJZhKQvqNQsFmw@mail.gmail.com>]
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
       [not found] ` <CAOE1vsPcKAiTMPGH1VYwoTccWi7b=9DJdObdPJZhKQvqNQsFmw@mail.gmail.com>
@ 2022-02-02 14:51 ` Brandon Lo
  2022-02-02 17:07 ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2022-02-02 14:51 UTC (permalink / raw)
  To: Danilewicz, MarcinX, Lincoln Lavoie
  Cc: Dumitrescu, Cristian, Ajmera, Megha, Singh, Jasvinder,
      Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand, ci

> On Mon, Jan 31, 2022 at 2:27 PM Danilewicz, MarcinX <marcinx.danilewicz@intel.com> wrote:
>> After some time I did some testing. As you may guess, with real hardware I could not reproduce the error.
>>
>> From what I see, the problem was here:
>>
>> FT2
>> RED config  avg queue size  min threshold  max threshold  drop prob %  drop rate %  diff %  tolerance %
>> 5           127             32             128            1.6493       0.9900       0.0000  50.0000
>> 6           127             32             128            1.4137       0.8500       0.0000  50.0000
>> 7           127             32             128            1.2370       0.7300       0.0000  50.0000
>> 8           127             32             128            1.0995       0.6200       0.0000  50.0000
>> 9           127             32             128            0.9896       0.6300       0.0000  50.0000
>> ------------------------------------------------------------------------
>>
>> Drop_rate in line 9 should not be greater than in line 8. Looking at the other results, the drop_rate value in line 9 is about 0.1% greater than expected. Line 8 results are fine to me.
>>
>> How often does anyone see this issue? Is there a chance I could use an existing Docker container for testing?

Hi Marcin,

Attached is an Ubuntu 20.04 Dockerfile that we use in the lab for unit testing.
I don't think we run into the red_autotest failure too often, so it is hard to debug.
My guess is that the test will randomly fail when there are a lot of processes running on the system, especially since we run multiple tests per system in the lab.
In a typical worst-case scenario, we can see a machine performing 2 to 3 total compile/ABI tests along with a single unit test.
We do allocate a specific amount of resources (4GB RAM, 2 cores) to each container, so that can be another factor that affects the frequency of these failures.

If this issue seems too dependent on the current system load and other situational factors, it might be good to think about running red_autotest separately from other tests in the lab so it does not have to compete for resources.
Any thoughts on this?

Thanks,
Brandon

-- 
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu
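The FT2 rows quoted above show the drop rate falling as the RED config index rises, except for the last row. The observation can be sketched as a small check over the quoted numbers. The function name `check_monotonic` is hypothetical, and this only illustrates the anomaly Marcin points out; it is not how red_autotest itself decides pass or fail.

```shell
#!/bin/bash
# Hypothetical check: flag any row whose drop rate (column 2) is higher than
# the previous row's. Input: "config drop_rate" pairs, one per line.
# Exit status is 1 if at least one row breaks the downward trend.
check_monotonic() {
    awk '
        NR > 1 && ($2 + 0) > (prev + 0) {
            printf "config %s: %s > %s\n", $1, $2, prev
            bad = 1
        }
        { prev = $2 }
        END { exit bad + 0 }
    '
}

# Drop-rate column from the FT2 table quoted in the thread.
check_monotonic <<'EOF' || echo "non-monotonic drop rate found"
5 0.9900
6 0.8500
7 0.7300
8 0.6200
9 0.6300
EOF
# prints: config 9: 0.6300 > 0.6200
#         non-monotonic drop rate found
```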
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-02 14:51 ` Brandon Lo
@ 2022-02-02 17:07 ` Danilewicz, MarcinX
  2022-02-03 23:31 ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-02-02 17:07 UTC (permalink / raw)
  To: Brandon Lo, Lincoln Lavoie
  Cc: Dumitrescu, Cristian, Ajmera, Megha, Singh, Jasvinder,
      Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand, ci

Hi Brandon,

I'll look into this config file to see what I can do about it 😊

"a specific amount of resources (4GB RAM, 2 cores) to each container, so that can be another factor that affects the frequency of these failures.
If this issue seems too dependent on the current system load and other situational factors, it might be good to think about running the red_autotest separate from other tests in the lab so it does not have to compete for resources.
Any thoughts on this?"

The CPU family might be important, from a CPU features perspective. In the past I ran some throughput tests on two different machines (different core families) to get the correct (expected) results. Perhaps nothing has changed since then.

I'll let you know about my findings with that Docker-based installation.

Kind Regards,
/Marcin

--------------------------------------------------------------
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263

This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-02 17:07 ` Danilewicz, MarcinX
@ 2022-02-03 23:31 ` Danilewicz, MarcinX
  2022-02-04 0:11 ` Brandon Lo
  0 siblings, 1 reply; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-02-03 23:31 UTC (permalink / raw)
  To: Brandon Lo, Lincoln Lavoie
  Cc: Dumitrescu, Cristian, Ajmera, Megha, Singh, Jasvinder,
      Zegota, AnnaX, Yigit, Ferruh, thomas, david.marchand, ci

Hi Brandon,

It looks like I am missing some local script(s), perhaps to generate a VM, or something for a VM?

It started with this message:
Step 11/12 : COPY scripts /scripts
COPY failed: file not found in build context or excluded by .dockerignore: stat scripts: file does not exist

I'll start searching for this, but perhaps you can tell me what it is and where it can be found, if possible 😊

Kind Regards,
/Marcin
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-03 23:31 ` Danilewicz, MarcinX
@ 2022-02-04 0:11 ` Brandon Lo
  2022-03-09 10:01 ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2022-02-04 0:11 UTC (permalink / raw)
  To: Danilewicz, MarcinX
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha,
      Singh, Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas,
      david.marchand, ci

On Thu, Feb 3, 2022 at 6:31 PM Danilewicz, MarcinX <marcinx.danilewicz@intel.com> wrote:
>
> Hi Brandon,
>
> It looks like I am searching for some local script/s to generate VM? Or something for VM..?
>
> It started from this message:
> Step 11/12 : COPY scripts /scripts
> COPY failed: file not found in build context or excluded by .dockerignore: stat scripts: file does not exist
>
> I'll start searching for this, but perhaps you can enlighten me what is that and where it may be found. If possible 😊

Hi Marcin,

Sorry about that. You can probably remove that line from the Dockerfile.
It is used to copy short bash scripts that run meson/ninja with DPDK.
For unit testing, we run a small bash script like this:

#!/bin/bash
tar xzfm dpdk.tar.gz
cd dpdk
meson build --werror
ninja -C build install
cd build
meson test --suite fast-tests -t 60

Thanks,
Brandon

-- 
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu
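Because the failure is intermittent, a single run of the script above says little either way. One possible way to chase it is to repeat the test command until it fails. The sketch below assumes nothing beyond a POSIX-style shell; the helper name `repeat_until_fail` is hypothetical, and the meson invocation in the usage comment simply reuses the flags from Brandon's script.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to N times and stop at the first
# failure, reporting which iteration failed.
#   $1 = maximum number of runs, remaining arguments = command to run
repeat_until_fail() {
    local max=$1
    local i=1
    shift
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on run $i of $max"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max runs passed"
}

# Usage from the DPDK build directory, with the same flags as the lab script:
#   repeat_until_fail 100 meson test --suite fast-tests -t 60
```

Running the whole fast-tests suite each time keeps the test ordering the same as in the lab, which matters if an earlier test influences red_autotest.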
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-02-04 0:11 ` Brandon Lo
@ 2022-03-09 10:01 ` Danilewicz, MarcinX
  2022-03-09 14:48 ` Brandon Lo
  0 siblings, 1 reply; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-03-09 10:01 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha,
      Singh, Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas,
      david.marchand, ci

Hi Brandon,

Sorry for the late response; I was busy in the meantime. After your mail I tried running the DPDK tests from the Docker image, with a few instances in parallel, enough to get the machine fully loaded. Even so, red_autotest never failed.

Could you share some additional details about the hardware used for testing, memory sizes, etc., to give me a hint on how to trigger these failures? I have seen other tests fail consistently depending on the machine I ran the autotests on. Maybe the tests that run before red_autotest leave the hardware in a state where red_autotest fails. Has anyone tried changing the autotest execution order?

Also, I have done almost everything I can to reproduce the error, and perhaps it is better to ignore this random failure for now. It looks like the test passes for you most of the time and only fails from time to time, right? If it is a real error, it will come out elsewhere.

Kind Regards,
/Marcin
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-03-09 10:01 ` Danilewicz, MarcinX
@ 2022-03-09 14:48 ` Brandon Lo
  2022-03-10 17:25 ` Danilewicz, MarcinX
  0 siblings, 1 reply; 21+ messages in thread
From: Brandon Lo @ 2022-03-09 14:48 UTC (permalink / raw)
  To: Danilewicz, MarcinX
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha,
      Singh, Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas,
      david.marchand, ci

On Wed, Mar 9, 2022 at 5:01 AM Danilewicz, MarcinX <marcinx.danilewicz@intel.com> wrote:
>
> Hi Brandon,
>
> Sorry for the late response; I was busy in the meantime. After your mail I tried running the DPDK tests from the Docker image, with a few instances in parallel, enough to get the machine fully loaded. Even so, red_autotest never failed.
>
> Could you share some additional details about the hardware used for testing, memory sizes, etc., to give me a hint on how to trigger these failures? I have seen other tests fail consistently depending on the machine I ran the autotests on. Maybe the tests that run before red_autotest leave the hardware in a state where red_autotest fails. Has anyone tried changing the autotest execution order?

Hi Marcin,

Unfortunately, I don't have any more details other than the ones we talked about before.
It is possible that the issue is not as common now that we limited the number of compile jobs that can happen on each machine at the same time.
We again limited the amount of RAM that each job can use so that the systems do not get heavily loaded.

> Also, I have done almost everything I can to reproduce the error, and perhaps it is better to ignore this random failure for now. It looks like the test passes for you most of the time and only fails from time to time, right? If it is a real error, it will come out elsewhere.

Yes, I think it's ok if you think the test is good for now.
I haven't seen it fail in a while, so it might just be due to the load we put on the systems.
If the issue comes up, I can contact you again.

Thanks,
Brandon

-- 
Brandon Lo
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
blo@iol.unh.edu
www.iol.unh.edu
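The per-container limits mentioned earlier in the thread (4GB RAM, 2 cores) map onto standard `docker run` options. The sketch below only builds and prints the command so it can be reviewed before running. The helper name `limited_run_cmd` and the image name `dpdk-unit-test` are placeholders, not the lab's actual names, and hugepage access inside the container would need additional setup that is not shown here.

```shell
#!/bin/bash
# Hypothetical helper: print a `docker run` command line with memory and CPU
# caps. Nothing is executed.
#   $1 = image name
#   $2 = memory limit (assumed default: 4g)
#   $3 = CPU count   (assumed default: 2)
limited_run_cmd() {
    local image=$1
    local mem=${2:-4g}
    local cpus=${3:-2}
    echo "docker run --rm --memory=$mem --cpus=$cpus $image"
}

# "dpdk-unit-test" is a placeholder image name, not the lab's real one.
limited_run_cmd dpdk-unit-test
# prints: docker run --rm --memory=4g --cpus=2 dpdk-unit-test
```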
* RE: [dpdk-dev] [Bug 826] red_autotest random failures
  2022-03-09 14:48 ` Brandon Lo
@ 2022-03-10 17:25 ` Danilewicz, MarcinX
  0 siblings, 0 replies; 21+ messages in thread
From: Danilewicz, MarcinX @ 2022-03-10 17:25 UTC (permalink / raw)
  To: Brandon Lo
  Cc: Lincoln Lavoie, Dumitrescu, Cristian, Ajmera, Megha,
      Singh, Jasvinder, Zegota, AnnaX, Yigit, Ferruh, thomas,
      david.marchand, ci

Hi Brandon,

that’s good news.

Regards,
/Marcin
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-19 16:53 ` Dumitrescu, Cristian
  2021-11-19 17:25 ` Lincoln Lavoie
@ 2021-11-22 8:17 ` David Marchand
  2021-11-22 13:34 ` Lincoln Lavoie
  1 sibling, 1 reply; 21+ messages in thread
From: David Marchand @ 2021-11-22 8:17 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: Thomas Monjalon, Lincoln Lavoie, Ajmera, Megha, Singh, Jasvinder,
      Liguzinski, WojciechX, dev, Aaron Conole, Yigit, Ferruh, ci,
      Zegota, AnnaX

On Fri, Nov 19, 2021 at 5:54 PM Dumitrescu, Cristian <cristian.dumitrescu@intel.com> wrote:
> On a different point, we should probably tweak our autotests to differentiate between logical failures and those failures related to resources not being available, and flag the test result accordingly in the report. For example, if memory allocation fails, the test should be flagged as "Not enough resources" instead of simply "Failed". In the first case, the next step should be fixing the test setup, while in the second case the next step should be fixing the code. What do people think on this?

In such a case, the test must return TEST_SKIPPED.

I did a pass for cores count / specific hw requirements, some time ago.
See https://git.dpdk.org/dpdk/commit/?id=e0f4a0ed4237

-- 
David Marchand
* Re: [dpdk-dev] [Bug 826] red_autotest random failures
  2021-11-22 8:17 ` David Marchand
@ 2021-11-22 13:34 ` Lincoln Lavoie
  0 siblings, 0 replies; 21+ messages in thread
From: Lincoln Lavoie @ 2021-11-22 13:34 UTC (permalink / raw)
  To: David Marchand
  Cc: Dumitrescu, Cristian, Thomas Monjalon, Ajmera, Megha,
      Singh, Jasvinder, Liguzinski, WojciechX, dev, Aaron Conole,
      Yigit, Ferruh, ci, Zegota, AnnaX

On Mon, Nov 22, 2021 at 3:17 AM David Marchand <david.marchand@redhat.com> wrote:

> On Fri, Nov 19, 2021 at 5:54 PM Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com> wrote:
> > On a different point, we should probably tweak our autotests to
> > differentiate between logical failures and those failures related to
> > resources not being available, and flag the test result accordingly in the
> > report. For example, if memory allocation fails, the test should be flagged
> > as "Not enough resources" instead of simply "Failed". In the first case,
> > the next step should be fixing the test setup, while in the second case the
> > next step should be fixing the code. What do people think on this?
>
> In such a case, the test must return TEST_SKIPPED.
>

If the purpose of the component / function being tested is to get / create / reserve the resource(s), the failure might be valid. So it can't be applied across the board. But in places where the test is checking other functionality, this might at least prevent some failures that are transient (i.e. based on what the test could "get" from the system at that moment in time).

> I did a pass for cores count / specific hw requirements, some time ago.
> See https://git.dpdk.org/dpdk/commit/?id=e0f4a0ed4237
>
> --
> David Marchand

-- 
*Lincoln Lavoie*
Principal Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
lylavoie@iol.unh.edu
https://www.iol.unh.edu
+1-603-674-2755 (m)
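The pass / skipped / fail split discussed here is visible from outside the test as an exit status: Meson's test runner treats exit status 77 as "skipped". The sketch below shows that mapping. The helper name `classify_exit` is hypothetical, and that a DPDK test returning TEST_SKIPPED surfaces as exit status 77 is an assumption here, not something stated in the thread.

```shell
#!/bin/bash
# Hypothetical helper: map a test's exit status to a coarse verdict.
# 77 is the exit status Meson interprets as "skipped"; that the DPDK test
# binary reports TEST_SKIPPED this way is assumed, not taken from the thread.
classify_exit() {
    case "$1" in
        0)  echo "pass" ;;
        77) echo "skipped" ;;
        *)  echo "fail" ;;
    esac
}

classify_exit 0     # prints pass
classify_exit 77    # prints skipped
classify_exit 1     # prints fail
```

A report built this way would separate "fix the test setup" cases (skipped for lack of resources) from "fix the code" cases (fail), which is the distinction Cristian asks for.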
end of thread, other threads:[~2022-03-10 17:25 UTC | newest]

Thread overview: 21+ messages
[not found] <bug-826-3@http.bugs.dpdk.org/>
2021-11-12 13:51 ` [dpdk-dev] [Bug 826] red_autotest random failures David Marchand
2021-11-12 14:10 ` Lincoln Lavoie
2021-11-12 14:15 ` David Marchand
2021-11-15 11:51 ` Dumitrescu, Cristian
2021-11-15 17:26 ` Liguzinski, WojciechX
2021-11-18 22:10 ` Liguzinski, WojciechX
2021-11-19 7:26 ` Thomas Monjalon
2021-11-19 16:53 ` Dumitrescu, Cristian
2021-11-19 17:25 ` Lincoln Lavoie
[not found] ` <BN9PR11MB53729251C262EEBB1134A61194619@BN9PR11MB5372.namprd11.prod.outlook.com>
2021-11-29 17:58 ` Brandon Lo
2021-11-30 7:51 ` Liguzinski, WojciechX
2021-12-10 13:31 ` Liguzinski, WojciechX
[not found] ` <SA0PR11MB46708D32B6B2EC31D3DCE17F975A9@SA0PR11MB4670.namprd11.prod.outlook.com>
[not found] ` <BY5PR11MB3926999DD139D10AD76D177F8F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>
[not found] ` <BY5PR11MB39261E9379E18C67BB4FB9938F5B9@BY5PR11MB3926.namprd11.prod.outlook.com>
[not found] ` <BY5PR11MB3926DF1466F5815D5D2FEC798F259@BY5PR11MB3926.namprd11.prod.outlook.com>
[not found] ` <CAOE1vsPcKAiTMPGH1VYwoTccWi7b=9DJdObdPJZhKQvqNQsFmw@mail.gmail.com>
2022-02-02 14:51 ` Brandon Lo
2022-02-02 17:07 ` Danilewicz, MarcinX
2022-02-03 23:31 ` Danilewicz, MarcinX
2022-02-04 0:11 ` Brandon Lo
2022-03-09 10:01 ` Danilewicz, MarcinX
2022-03-09 14:48 ` Brandon Lo
2022-03-10 17:25 ` Danilewicz, MarcinX
2021-11-22 8:17 ` David Marchand
2021-11-22 13:34 ` Lincoln Lavoie