DPDK patches and discussions
* [PATCH 0/1] make file prefix unit test more resilient
@ 2023-09-14 10:42 Bruce Richardson
  2023-09-14 10:42 ` [PATCH 1/1] app/test: skip PCI bus scan when testing prefix flags Bruce Richardson
  2023-09-20 10:00 ` [PATCH 0/1] make file prefix unit test more resilient David Marchand
  0 siblings, 2 replies; 9+ messages in thread
From: Bruce Richardson @ 2023-09-14 10:42 UTC
  To: dev; +Cc: Bruce Richardson

When examining the IOL testing failures for patch series [1], I observed
that the failures reported were in the eal_flags_file_prefix unit test.
I was able to reproduce this on my system by passing an additional
"--on-pci" flag to the test run, since the log to the test has errors
about device availability. Adding the "no-pci" flag to the individual
test commands used by the unit tests fixed the issue thereafter,
allowing the test to pass in all cases for me. Therefore, I am
submitting this patch in the hopes of making the test more robust, since
the observed failures seem unrelated to the original patchset [1] I
submitted.

[1] http://patches.dpdk.org/project/dpdk/list/?series=29406

Bruce Richardson (1):
  app/test: skip PCI bus scan when testing prefix flags

 app/test/test_eal_flags.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

-- 
2.39.2



* [PATCH 1/1] app/test: skip PCI bus scan when testing prefix flags
  2023-09-14 10:42 [PATCH 0/1] make file prefix unit test more resilient Bruce Richardson
@ 2023-09-14 10:42 ` Bruce Richardson
  2023-09-20 10:00 ` [PATCH 0/1] make file prefix unit test more resilient David Marchand
  1 sibling, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2023-09-14 10:42 UTC
  To: dev; +Cc: Bruce Richardson

When testing the file prefix handling, and the memory cleanup for the
various prefixes, we don't need to worry about PCI devices. Therefore
skip the device scan on startup. In my testing, this makes the test runs
more resilient.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_eal_flags.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 6cb4b06757..48d26e8871 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -1203,47 +1203,47 @@ test_file_prefix(void)
 #endif
 
 	/* this should fail unless the test itself is run with "memtest" prefix */
-	const char *argv0[] = {prgname, mp_flag, "-m",
+	const char *argv0[] = {prgname, mp_flag, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest };
 
 	/* primary process with memtest1 and default mem mode */
-	const char *argv1[] = {prgname, "-m",
+	const char *argv1[] = {prgname, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest1 };
 
 	/* primary process with memtest1 and legacy mem mode */
-	const char *argv2[] = {prgname, "-m",
+	const char *argv2[] = {prgname, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest1,
 			"--legacy-mem" };
 
 	/* primary process with memtest2 and legacy mem mode */
-	const char *argv3[] = {prgname, "-m",
+	const char *argv3[] = {prgname, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest2,
 			"--legacy-mem" };
 
 	/* primary process with memtest2 and default mem mode */
-	const char *argv4[] = {prgname, "-m",
+	const char *argv4[] = {prgname, "--no-pci", "-m",
 			DEFAULT_MEM_SIZE, "--file-prefix=" memtest2 };
 
 	/* primary process with --in-memory mode */
-	const char * const argv5[] = {prgname, "-m",
+	const char * const argv5[] = {prgname, "--no-pci", "-m",
 		DEFAULT_MEM_SIZE, "--in-memory" };
 
 	/* primary process with memtest1 and --in-memory mode */
-	const char * const argv6[] = {prgname, "-m",
+	const char * const argv6[] = {prgname, "--no-pci", "-m",
 		DEFAULT_MEM_SIZE, "--in-memory",
 		"--file-prefix=" memtest1 };
 
 	/* primary process with parent file-prefix and --in-memory mode */
-	const char * const argv7[] = {prgname, "-m",
+	const char * const argv7[] = {prgname, "--no-pci", "-m",
 		DEFAULT_MEM_SIZE, "--in-memory", "--file-prefix", prefix };
 
 	/* primary process with memtest1 and --single-file-segments mode */
-	const char * const argv8[] = {prgname, "-m",
+	const char * const argv8[] = {prgname, "--no-pci", "-m",
 		DEFAULT_MEM_SIZE, "--single-file-segments",
 		"--file-prefix=" memtest1 };
 
 	/* primary process with memtest1 and --huge-unlink=never mode */
-	const char * const argv9[] = {prgname, "-m",
+	const char * const argv9[] = {prgname, "--no-pci", "-m",
 		DEFAULT_MEM_SIZE, "--huge-unlink=never",
 		"--file-prefix=" memtest1 };
 
-- 
2.39.2



* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-14 10:42 [PATCH 0/1] make file prefix unit test more resilient Bruce Richardson
  2023-09-14 10:42 ` [PATCH 1/1] app/test: skip PCI bus scan when testing prefix flags Bruce Richardson
@ 2023-09-20 10:00 ` David Marchand
  2023-09-20 10:09   ` Bruce Richardson
  1 sibling, 1 reply; 9+ messages in thread
From: David Marchand @ 2023-09-20 10:00 UTC
  To: Bruce Richardson; +Cc: dev, Aaron Conole, Ferruh Yigit, Thomas Monjalon

On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> When examining the IOL testing failures for patch series [1], I observed
> that the failures reported were in the eal_flags_file_prefix unit test.
> I was able to reproduce this on my system by passing an additional
> "--on-pci" flag to the test run, since the log to the test has errors
> about device availability. Adding the "no-pci" flag to the individual

Something is not clear to me.

While I understand that passing "no-pci" helps avoiding the issue (as
described below), I have some trouble understanding this passage
(above) with "--on-pci".
How did you reproduce the issue?


> test commands used by the unit tests fixed the issue thereafter,
> allowing the test to pass in all cases for me. Therefore, I am
> submitting this patch in the hopes of making the test more robust, since
> the observed failures seem unrelated to the original patchset [1] I
> submitted.
>
> [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
>
> Bruce Richardson (1):
>   app/test: skip PCI bus scan when testing prefix flags
>
>  app/test/test_eal_flags.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)

Iiuc, the problem is that the file_prefix unit test can fail if any
DPDK subsystem forgets to release some memory and some hugepages are
left behind at the cleanup step.
Passing --no-pci as you suggest hides issues coming from PCI drivers.

This is something I tried to fix too, with
https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
fix only handles a part of the issue (here, the ethdev drivers).

Another way to make the file prefix more robust would be to remove the
check on released memory, or move it to another test.
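
The check on leftover hugepage files described above can be pictured as a directory scan for entries that still carry a given file prefix after the child process has exited. The following is only a minimal sketch under that assumption: `count_files_with_prefix` is a hypothetical helper written for illustration, not the code in test_eal_flags.c, and the directory and naming scheme it would be pointed at are left to the caller.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper, not the code in test_eal_flags.c: count the
 * directory entries in `dir` whose name starts with `prefix`.
 * Returns the count, or -1 if the directory cannot be opened.
 */
static int
count_files_with_prefix(const char *dir, const char *prefix)
{
	struct dirent *ent;
	size_t plen;
	int count = 0;
	DIR *d;

	assert(dir != NULL && prefix != NULL);
	plen = strlen(prefix);

	d = opendir(dir);
	if (d == NULL)
		return -1;

	/* Every entry whose name begins with the prefix counts as "left behind". */
	while ((ent = readdir(d)) != NULL) {
		if (strncmp(ent->d_name, prefix, plen) == 0)
			count++;
	}
	closedir(d);
	return count;
}
```

A test built on such a helper would expect a count of zero once cleanup has run. A non-zero count only says that a file with that prefix still exists; it does not say which component left it there.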


-- 
David Marchand



* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-20 10:00 ` [PATCH 0/1] make file prefix unit test more resilient David Marchand
@ 2023-09-20 10:09   ` Bruce Richardson
  2023-09-22 12:57     ` Thomas Monjalon
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce Richardson @ 2023-09-20 10:09 UTC
  To: David Marchand; +Cc: dev, Aaron Conole, Ferruh Yigit, Thomas Monjalon

On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
> On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > When examining the IOL testing failures for patch series [1], I observed
> > that the failures reported were in the eal_flags_file_prefix unit test.
> > I was able to reproduce this on my system by passing an additional
> > "--on-pci" flag to the test run, since the log to the test has errors
> > about device availability. Adding the "no-pci" flag to the individual
> 
> Something is not clear to me.
> 
> While I understand that passing "no-pci" helps avoiding the issue (as
> described below), I have some trouble understanding this passage
> (above) with "--on-pci".

That's a typo for no-pci. When I ran the test on my system with the main
process using no-pci, I was able to reproduce the issue seen in the IOL
lab. Otherwise I couldn't reproduce it.

> How did you reproduce the issue?
> 
> 
> > test commands used by the unit tests fixed the issue thereafter,
> > allowing the test to pass in all cases for me. Therefore, I am
> > submitting this patch in the hopes of making the test more robust, since
> > the observed failures seem unrelated to the original patchset [1] I
> > submitted.
> >
> > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
> >
> > Bruce Richardson (1):
> >   app/test: skip PCI bus scan when testing prefix flags
> >
> >  app/test/test_eal_flags.c | 20 ++++++++++----------
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> Iiuc, the problem is that the file_prefix unit test can fail if any
> DPDK subsystem forgets to release some memory and some hugepages are
> left behind at the cleanup step.
> Passing --no-pci as you suggest hides issues coming from PCI drivers.
> 
> This is something I tried to fix too, with
> https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
> fix only handles a part of the issue (here, the ethdev drivers).
> 
> Another way to make the file prefix more robust would be to remove the
> check on released memory, or move it to another test.
> 
I actually think the test is a good one to have. Also, taking in your patch
to help with the issue is a good idea also.

I'd still suggest that this patch be considered anyway, as there is no need
to do PCI bus scanning as part of this test. Therefore I'd view it as a
harmless addition that may help things.

/Bruce


* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-20 10:09   ` Bruce Richardson
@ 2023-09-22 12:57     ` Thomas Monjalon
  2023-09-22 13:23       ` Bruce Richardson
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Monjalon @ 2023-09-22 12:57 UTC
  To: David Marchand, Bruce Richardson; +Cc: dev, Aaron Conole, Ferruh Yigit

20/09/2023 12:09, Bruce Richardson:
> On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
> > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > When examining the IOL testing failures for patch series [1], I observed
> > > that the failures reported were in the eal_flags_file_prefix unit test.
> > > I was able to reproduce this on my system by passing an additional
> > > "--on-pci" flag to the test run, since the log to the test has errors
> > > about device availability. Adding the "no-pci" flag to the individual
> > 
> > Something is not clear to me.
> > 
> > While I understand that passing "no-pci" helps avoiding the issue (as
> > described below), I have some trouble understanding this passage
> > (above) with "--on-pci".
> 
> That's a typo for no-pci. When I ran the test on my system with the main
> process using no-pci, I was able to reproduce the issue seen in the IOL
> lab. Otherwise I couldn't reproduce it.
> 
> > How did you reproduce the issue?
> > 
> > 
> > > test commands used by the unit tests fixed the issue thereafter,
> > > allowing the test to pass in all cases for me. Therefore, I am
> > > submitting this patch in the hopes of making the test more robust, since
> > > the observed failures seem unrelated to the original patchset [1] I
> > > submitted.
> > >
> > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
> > >
> > > Bruce Richardson (1):
> > >   app/test: skip PCI bus scan when testing prefix flags
> > >
> > >  app/test/test_eal_flags.c | 20 ++++++++++----------
> > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > 
> > Iiuc, the problem is that the file_prefix unit test can fail if any
> > DPDK subsystem forgets to release some memory and some hugepages are
> > left behind at the cleanup step.
> > Passing --no-pci as you suggest hides issues coming from PCI drivers.
> > 
> > This is something I tried to fix too, with
> > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
> > fix only handles a part of the issue (here, the ethdev drivers).
> > 
> > Another way to make the file prefix more robust would be to remove the
> > check on released memory, or move it to another test.
> > 
> I actually think the test is a good one to have. Also, taking in your patch
> to help with the issue is a good idea also.
> 
> I'd still suggest that this patch be considered anyway, as there is no need
> to do PCI bus scanning as part of this test. Therefore I'd view it as a
> harmless addition that may help things.

I'm hesitating.
This test is checking if some memory is left, and I think it is sane.
If we add --no-pci, we reduce the coverage of this check.

Now that the root cause is fixed by David in ethdev
(https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.marchand@redhat.com/)
we could continue checking memory freeing with PCI drivers.
So I tend to reject this patch.

Other opinions?





* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-22 12:57     ` Thomas Monjalon
@ 2023-09-22 13:23       ` Bruce Richardson
  2023-09-23  8:21         ` Thomas Monjalon
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce Richardson @ 2023-09-22 13:23 UTC
  To: Thomas Monjalon
  Cc: David Marchand, dev, Aaron Conole, Ferruh Yigit, anatoly.burakov

On Fri, Sep 22, 2023 at 02:57:32PM +0200, Thomas Monjalon wrote:
> 20/09/2023 12:09, Bruce Richardson:
> > On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
> > > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
> > > <bruce.richardson@intel.com> wrote:
> > > >
> > > > When examining the IOL testing failures for patch series [1], I observed
> > > > that the failures reported were in the eal_flags_file_prefix unit test.
> > > > I was able to reproduce this on my system by passing an additional
> > > > "--on-pci" flag to the test run, since the log to the test has errors
> > > > about device availability. Adding the "no-pci" flag to the individual
> > > 
> > > Something is not clear to me.
> > > 
> > > While I understand that passing "no-pci" helps avoiding the issue (as
> > > described below), I have some trouble understanding this passage
> > > (above) with "--on-pci".
> > 
> > That's a typo for no-pci. When I ran the test on my system with the main
> > process using no-pci, I was able to reproduce the issue seen in the IOL
> > lab. Otherwise I couldn't reproduce it.
> > 
> > > How did you reproduce the issue?
> > > 
> > > 
> > > > test commands used by the unit tests fixed the issue thereafter,
> > > > allowing the test to pass in all cases for me. Therefore, I am
> > > > submitting this patch in the hopes of making the test more robust, since
> > > > the observed failures seem unrelated to the original patchset [1] I
> > > > submitted.
> > > >
> > > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
> > > >
> > > > Bruce Richardson (1):
> > > >   app/test: skip PCI bus scan when testing prefix flags
> > > >
> > > >  app/test/test_eal_flags.c | 20 ++++++++++----------
> > > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > > 
> > > Iiuc, the problem is that the file_prefix unit test can fail if any
> > > DPDK subsystem forgets to release some memory and some hugepages are
> > > left behind at the cleanup step.
> > > Passing --no-pci as you suggest hides issues coming from PCI drivers.
> > > 
> > > This is something I tried to fix too, with
> > > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
> > > fix only handles a part of the issue (here, the ethdev drivers).
> > > 
> > > Another way to make the file prefix more robust would be to remove the
> > > check on released memory, or move it to another test.
> > > 
> > I actually think the test is a good one to have. Also, taking in your patch
> > to help with the issue is a good idea also.
> > 
> > I'd still suggest that this patch be considered anyway, as there is no need
> > to do PCI bus scanning as part of this test. Therefore I'd view it as a
> > harmless addition that may help things.
> 
> I'm hesitating.
> This test is checking if some memory is left, and I think it is sane.
> If we add --no-pci, we reduce the coverage of this check.
> 
> Now that the root cause is fixed by David in ethdev
> (https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.marchand@redhat.com/)
> we could continue checking memory freeing with PCI drivers.
> So I tend to reject this patch.
> 
> Other opinions?
> 
No objection to this patch being rejected if not necessary.

However, I'd question if the normal case is actually checking for freeing
memory in PCI drivers. I suspect that in EAL cleanup we delete all files we
use, irrespective of whether the mappings are still in use. Then when the
process exits the hugepages will be completely freed back - even if some
components leaked memory. I believe this case is checking for correct EAL
cleanup of hugepage files, not for any memory leaks, and in that regard
omitting some components should make no difference.
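
The general POSIX behaviour behind this reasoning can be shown in isolation: a file can be unlinked while it is still mapped, the name disappears at once, and the mapping stays usable until it is unmapped or the process exits. The sketch below assumes an ordinary file rather than a hugetlbfs one, and `unlink_while_mapped` is a hypothetical helper for illustration, not EAL code. It does not show what EAL cleanup actually does.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not DPDK code: create and map a file, unlink it
 * while the mapping is still live, then check that the name is gone but
 * the memory is still usable.
 * Returns 0 if both hold, -1 on any error.
 */
static int
unlink_while_mapped(const char *path, size_t len)
{
	struct stat st;
	char *addr;
	int ret = -1;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		unlink(path);
		return -1;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping keeps its own reference to the file */
	if (addr == MAP_FAILED) {
		unlink(path);
		return -1;
	}

	/* Remove the directory entry while the mapping is still in use. */
	if (unlink(path) == 0 && stat(path, &st) != 0) {
		/* The name is gone, but the mapped memory still works. */
		memset(addr, 0xa5, len);
		if ((unsigned char)addr[len - 1] == 0xa5)
			ret = 0;
	}

	/* Process exit would release a leaked mapping in the same way. */
	munmap(addr, len);
	return ret;
}
```

Calling `unlink_while_mapped("/tmp/demo.bin", 4096)` on a typical Linux system is expected to return 0 and leave no file behind, which is why a scan for leftover files cannot, on its own, tell whether a component leaked memory inside the mapping.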

/Bruce


* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-22 13:23       ` Bruce Richardson
@ 2023-09-23  8:21         ` Thomas Monjalon
  2023-09-25  8:02           ` Bruce Richardson
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Monjalon @ 2023-09-23  8:21 UTC
  To: Bruce Richardson
  Cc: David Marchand, dev, Aaron Conole, Ferruh Yigit, anatoly.burakov

22/09/2023 15:23, Bruce Richardson:
> On Fri, Sep 22, 2023 at 02:57:32PM +0200, Thomas Monjalon wrote:
> > 20/09/2023 12:09, Bruce Richardson:
> > > On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
> > > > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
> > > > <bruce.richardson@intel.com> wrote:
> > > > >
> > > > > When examining the IOL testing failures for patch series [1], I observed
> > > > > that the failures reported were in the eal_flags_file_prefix unit test.
> > > > > I was able to reproduce this on my system by passing an additional
> > > > > "--on-pci" flag to the test run, since the log to the test has errors
> > > > > about device availability. Adding the "no-pci" flag to the individual
> > > > 
> > > > Something is not clear to me.
> > > > 
> > > > While I understand that passing "no-pci" helps avoiding the issue (as
> > > > described below), I have some trouble understanding this passage
> > > > (above) with "--on-pci".
> > > 
> > > That's a typo for no-pci. When I ran the test on my system with the main
> > > process using no-pci, I was able to reproduce the issue seen in the IOL
> > > lab. Otherwise I couldn't reproduce it.
> > > 
> > > > How did you reproduce the issue?
> > > > 
> > > > 
> > > > > test commands used by the unit tests fixed the issue thereafter,
> > > > > allowing the test to pass in all cases for me. Therefore, I am
> > > > > submitting this patch in the hopes of making the test more robust, since
> > > > > the observed failures seem unrelated to the original patchset [1] I
> > > > > submitted.
> > > > >
> > > > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
> > > > >
> > > > > Bruce Richardson (1):
> > > > >   app/test: skip PCI bus scan when testing prefix flags
> > > > >
> > > > >  app/test/test_eal_flags.c | 20 ++++++++++----------
> > > > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > > > 
> > > > Iiuc, the problem is that the file_prefix unit test can fail if any
> > > > DPDK subsystem forgets to release some memory and some hugepages are
> > > > left behind at the cleanup step.
> > > > Passing --no-pci as you suggest hides issues coming from PCI drivers.
> > > > 
> > > > This is something I tried to fix too, with
> > > > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
> > > > fix only handles a part of the issue (here, the ethdev drivers).
> > > > 
> > > > Another way to make the file prefix more robust would be to remove the
> > > > check on released memory, or move it to another test.
> > > > 
> > > I actually think the test is a good one to have. Also, taking in your patch
> > > to help with the issue is a good idea also.
> > > 
> > > I'd still suggest that this patch be considered anyway, as there is no need
> > > to do PCI bus scanning as part of this test. Therefore I'd view it as a
> > > harmless addition that may help things.
> > 
> > I'm hesitating.
> > This test is checking if some memory is left, and I think it is sane.
> > If we add --no-pci, we reduce the coverage of this check.
> > 
> > Now that the root cause is fixed by David in ethdev
> > (https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.marchand@redhat.com/)
> > we could continue checking memory freeing with PCI drivers.
> > So I tend to reject this patch.
> > 
> > Other opinions?
> > 
> No objection to this patch being rejected if not necessary.
> 
> However, I'd question if the normal case is actually checking for freeing
> memory in PCI drivers. I suspect that in EAL cleanup we delete all files we
> use, irrespective of whether the mappings are still in use. Then when the
> process exits the hugepages will be completely freed back - even if some
> components leaked memory. I believe this case is checking for correct EAL
> cleanup of hugepage files, not for any memory leaks, and in that regard
> omitting some components should make no difference.

You're right, that's why I'm hesitating.
Fortunately it helped to discover a memory leak.
Do we want to add a new specific test for memory leaks,
or is it OK to have it in this one?






* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-23  8:21         ` Thomas Monjalon
@ 2023-09-25  8:02           ` Bruce Richardson
  2023-09-26 15:08             ` Aaron Conole
  0 siblings, 1 reply; 9+ messages in thread
From: Bruce Richardson @ 2023-09-25  8:02 UTC
  To: Thomas Monjalon
  Cc: David Marchand, dev, Aaron Conole, Ferruh Yigit, anatoly.burakov

On Sat, Sep 23, 2023 at 10:21:04AM +0200, Thomas Monjalon wrote:
> 22/09/2023 15:23, Bruce Richardson:
> > On Fri, Sep 22, 2023 at 02:57:32PM +0200, Thomas Monjalon wrote:
> > > 20/09/2023 12:09, Bruce Richardson:
> > > > On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
> > > > > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
> > > > > <bruce.richardson@intel.com> wrote:
> > > > > >
> > > > > > When examining the IOL testing failures for patch series [1], I observed
> > > > > > that the failures reported were in the eal_flags_file_prefix unit test.
> > > > > > I was able to reproduce this on my system by passing an additional
> > > > > > "--on-pci" flag to the test run, since the log to the test has errors
> > > > > > about device availability. Adding the "no-pci" flag to the individual
> > > > > 
> > > > > Something is not clear to me.
> > > > > 
> > > > > While I understand that passing "no-pci" helps avoiding the issue (as
> > > > > described below), I have some trouble understanding this passage
> > > > > (above) with "--on-pci".
> > > > 
> > > > That's a typo for no-pci. When I ran the test on my system with the main
> > > > process using no-pci, I was able to reproduce the issue seen in the IOL
> > > > lab. Otherwise I couldn't reproduce it.
> > > > 
> > > > > How did you reproduce the issue?
> > > > > 
> > > > > 
> > > > > > test commands used by the unit tests fixed the issue thereafter,
> > > > > > allowing the test to pass in all cases for me. Therefore, I am
> > > > > > submitting this patch in the hopes of making the test more robust, since
> > > > > > the observed failures seem unrelated to the original patchset [1] I
> > > > > > submitted.
> > > > > >
> > > > > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
> > > > > >
> > > > > > Bruce Richardson (1):
> > > > > >   app/test: skip PCI bus scan when testing prefix flags
> > > > > >
> > > > > >  app/test/test_eal_flags.c | 20 ++++++++++----------
> > > > > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > > > > 
> > > > > Iiuc, the problem is that the file_prefix unit test can fail if any
> > > > > DPDK subsystem forgets to release some memory and some hugepages are
> > > > > left behind at the cleanup step.
> > > > > Passing --no-pci as you suggest hides issues coming from PCI drivers.
> > > > > 
> > > > > This is something I tried to fix too, with
> > > > > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
> > > > > fix only handles a part of the issue (here, the ethdev drivers).
> > > > > 
> > > > > Another way to make the file prefix more robust would be to remove the
> > > > > check on released memory, or move it to another test.
> > > > > 
> > > > I actually think the test is a good one to have. Also, taking in your patch
> > > > to help with the issue is a good idea also.
> > > > 
> > > > I'd still suggest that this patch be considered anyway, as there is no need
> > > > to do PCI bus scanning as part of this test. Therefore I'd view it as a
> > > > harmless addition that may help things.
> > > 
> > > I'm hesitating.
> > > This test is checking if some memory is left, and I think it is sane.
> > > If we add --no-pci, we reduce the coverage of this check.
> > > 
> > > Now that the root cause is fixed by David in ethdev
> > > (https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.marchand@redhat.com/)
> > > we could continue checking memory freeing with PCI drivers.
> > > So I tend to reject this patch.
> > > 
> > > Other opinions?
> > > 
> > No objection to this patch being rejected if not necessary.
> > 
> > However, I'd question if the normal case is actually checking for freeing
> > memory in PCI drivers. I suspect that in EAL cleanup we delete all files we
> > use, irrespective of whether the mappings are still in use. Then when the
> > process exits the hugepages will be completely freed back - even if some
> > components leaked memory. I believe this case is checking for correct EAL
> > cleanup of hugepage files, not for any memory leaks, and in that regard
> > omitting some components should make no difference.
> 
> You're right, that's why I'm hesitating.
> Fortunately it helped to discover a memory leak.
> Do we want to add a new specific test for memory leaks,
> or is it OK to have it in this one?
> 

Not really sure. I'd tend towards saying that special memory leak checkers
like valgrind are better to use than trying to detect them in unit tests
directly. However, not an expert in this area.

/Bruce


* Re: [PATCH 0/1] make file prefix unit test more resilient
  2023-09-25  8:02           ` Bruce Richardson
@ 2023-09-26 15:08             ` Aaron Conole
  0 siblings, 0 replies; 9+ messages in thread
From: Aaron Conole @ 2023-09-26 15:08 UTC
  To: Bruce Richardson
  Cc: Thomas Monjalon, David Marchand, dev, Ferruh Yigit, anatoly.burakov

Bruce Richardson <bruce.richardson@intel.com> writes:

> On Sat, Sep 23, 2023 at 10:21:04AM +0200, Thomas Monjalon wrote:
>> 22/09/2023 15:23, Bruce Richardson:
>> > On Fri, Sep 22, 2023 at 02:57:32PM +0200, Thomas Monjalon wrote:
>> > > 20/09/2023 12:09, Bruce Richardson:
>> > > > On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
>> > > > > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
>> > > > > <bruce.richardson@intel.com> wrote:
>> > > > > >
>> > > > > > When examining the IOL testing failures for patch series [1], I observed
>> > > > > > that the failures reported were in the eal_flags_file_prefix unit test.
>> > > > > > I was able to reproduce this on my system by passing an additional
>> > > > > > "--on-pci" flag to the test run, since the log to the test has errors
>> > > > > > about device availability. Adding the "no-pci" flag to the individual
>> > > > > 
>> > > > > Something is not clear to me.
>> > > > > 
>> > > > > While I understand that passing "no-pci" helps avoiding the issue (as
>> > > > > described below), I have some trouble understanding this passage
>> > > > > (above) with "--on-pci".
>> > > > 
>> > > > That's a typo for no-pci. When I ran the test on my system with the main
>> > > > process using no-pci, I was able to reproduce the issue seen in the IOL
>> > > > lab. Otherwise I couldn't reproduce it.
>> > > > 
>> > > > > How did you reproduce the issue?
>> > > > > 
>> > > > > 
>> > > > > > test commands used by the unit tests fixed the issue thereafter,
>> > > > > > allowing the test to pass in all cases for me. Therefore, I am
>> > > > > > submitting this patch in the hopes of making the test more robust, since
>> > > > > > the observed failures seem unrelated to the original patchset [1] I
>> > > > > > submitted.
>> > > > > >
>> > > > > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
>> > > > > >
>> > > > > > Bruce Richardson (1):
>> > > > > >   app/test: skip PCI bus scan when testing prefix flags
>> > > > > >
>> > > > > >  app/test/test_eal_flags.c | 20 ++++++++++----------
>> > > > > >  1 file changed, 10 insertions(+), 10 deletions(-)
>> > > > > 
>> > > > > Iiuc, the problem is that the file_prefix unit test can fail if any
>> > > > > DPDK subsystem forgets to release some memory and some hugepages are
>> > > > > left behind at the cleanup step.
>> > > > > Passing --no-pci as you suggest hides issues coming from PCI drivers.
>> > > > > 
>> > > > > This is something I tried to fix too, with
>> > > > > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
>> > > > > fix only handles a part of the issue (here, the ethdev drivers).
>> > > > > 
>> > > > > Another way to make the file prefix more robust would be to remove the
>> > > > > check on released memory, or move it to another test.
>> > > > > 
>> > > > I actually think the test is a good one to have. Also, taking in your patch
>> > > > to help with the issue is a good idea also.
>> > > > 
>> > > > I'd still suggest that this patch be considered anyway, as there is no need
>> > > > to do PCI bus scanning as part of this test. Therefore I'd view it as a
>> > > > harmless addition that may help things.
>> > > 
>> > > I'm hesitating.
>> > > This test is checking if some memory is left, and I think it is sane.
>> > > If we add --no-pci, we reduce the coverage of this check.
>> > > 
>> > > Now that the root cause is fixed by David in ethdev
>> > > (https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.marchand@redhat.com/)
>> > > we could continue checking memory freeing with PCI drivers.
>> > > So I tend to reject this patch.
>> > > 
>> > > Other opinions?
>> > > 
>> > No objection to this patch being rejected if not necessary.
>> > 
>> > However, I'd question if the normal case is actually checking for freeing
>> > memory in PCI drivers. I suspect that in EAL cleanup we delete all files we
>> > use, irrespective of whether the mappings are still in use. Then when the
>> > process exits the hugepages will be completely freed back - even if some
>> > components leaked memory. I believe this case is checking for correct EAL
>> > cleanup of hugepage files, not for any memory leaks, and in that regard
>> > omitting some components should make no difference.
>> 
>> You're right, that's why I'm hesitating.
>> Fortunately it helped to discover a memory leak.
>> Do we want to add a new specific test for memory leaks,
>> or is it OK to have it in this one?
>> 
>
> Not really sure. I'd tend towards saying that special memory leak checkers
> like valgrind are better to use than trying to detect them in unit tests
> directly. However, not an expert in this area.

I do tend to agree that we should rely on more generic memory infra like
valgrind.  However, the way we use mempools doesn't always lend itself
to leak checkers like valgrind which usually expect to own all the
individual blocks.  Maybe newer versions can work with our mempools though?

> /Bruce

