From: Aaron Conole
To: Bruce Richardson
Cc: Thomas Monjalon, David Marchand, Ferruh Yigit
Subject: Re: [PATCH 0/1] make file prefix unit test more resilient
Date: Tue, 26 Sep 2023 11:08:14 -0400
In-Reply-To: Bruce Richardson's message of "Mon, 25 Sep 2023 09:02:51 +0100"

Bruce Richardson writes:

> On Sat, Sep 23, 2023 at 10:21:04AM +0200, Thomas Monjalon wrote:
>> 22/09/2023 15:23, Bruce Richardson:
>> > On Fri, Sep 22, 2023 at 02:57:32PM +0200, Thomas Monjalon wrote:
>> > > 20/09/2023 12:09, Bruce Richardson:
>> > > > On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
>> > > > > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
>> > > > > wrote:
>> > > > > >
>> > > > > > When examining the IOL testing failures for patch series [1], I observed
>> > > > > > that the failures reported were in the eal_flags_file_prefix unit test.
>> > > > > > I was able to reproduce this on my system by passing an additional
>> > > > > > "--on-pci" flag to the test run, since the log to the test has errors
>> > > > > > about device availability. Adding the "no-pci" flag to the individual
>> > > > >
>> > > > > Something is not clear to me.
>> > > > >
>> > > > > While I understand that passing "no-pci" helps avoiding the issue (as
>> > > > > described below), I have some trouble understanding this passage
>> > > > > (above) with "--on-pci".
>> > > >
>> > > > That's a typo for no-pci.
>> > > > When I ran the test on my system with the main
>> > > > process using no-pci, I was able to reproduce the issue seen in the IOL
>> > > > lab. Otherwise I couldn't reproduce it.
>> > > >
>> > > > > How did you reproduce the issue?
>> > > > >
>> > > > >
>> > > > > > test commands used by the unit tests fixed the issue thereafter,
>> > > > > > allowing the test to pass in all cases for me. Therefore, I am
>> > > > > > submitting this patch in the hopes of making the test more robust, since
>> > > > > > the observed failures seem unrelated to the original patchset [1] I
>> > > > > > submitted.
>> > > > > >
>> > > > > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
>> > > > > >
>> > > > > > Bruce Richardson (1):
>> > > > > >   app/test: skip PCI bus scan when testing prefix flags
>> > > > > >
>> > > > > >  app/test/test_eal_flags.c | 20 ++++++++++----------
>> > > > > >  1 file changed, 10 insertions(+), 10 deletions(-)
>> > > > >
>> > > > > Iiuc, the problem is that the file_prefix unit test can fail if any
>> > > > > DPDK subsystem forgets to release some memory and some hugepages are
>> > > > > left behind at the cleanup step.
>> > > > > Passing --no-pci as you suggest hides issues coming from PCI drivers.
>> > > > >
>> > > > > This is something I tried to fix too, with
>> > > > > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
>> > > > > fix only handles a part of the issue (here, the ethdev drivers).
>> > > > >
>> > > > > Another way to make the file prefix more robust would be to remove the
>> > > > > check on released memory, or move it to another test.
>> > > > >
>> > > > I actually think the test is a good one to have. Also, taking in your patch
>> > > > to help with the issue is a good idea also.
>> > > >
>> > > > I'd still suggest that this patch be considered anyway, as there is no need
>> > > > to do PCI bus scanning as part of this test.
>> > > > Therefore I'd view it as a harmless addition that may help things.
>> > >
>> > > I'm hesitating.
>> > > This test is checking if some memory is left, and I think it is sane.
>> > > If we add --no-pci, we reduce the coverage of this check.
>> > >
>> > > Now that the root cause is fixed by David in ethdev
>> > > (https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.marchand@redhat.com/)
>> > > we could continue checking memory freeing with PCI drivers.
>> > > So I tend to reject this patch.
>> > >
>> > > Other opinions?
>> > >
>> > No objection to this patch being rejected if not necessary.
>> >
>> > However, I'd question if the normal case is actually checking for freeing
>> > memory in PCI drivers. I suspect that in EAL cleanup we delete all files we
>> > use, irrespective of whether the mappings are still in use. Then when the
>> > process exits the hugepages will be completely freed back - even if some
>> > components leaked memory. I believe this case is checking for correct EAL
>> > cleanup of hugepage files, not for any memory leaks, and in that regard
>> > omitting some components should make no difference.
>>
>> You're right, that's why I'm hesitating.
>> Fortunately it helped to discover a memory leak.
>> Do we want to add a new specific test for memory leaks,
>> or is it OK to have it in this one?
>>
>
> Not really sure. I'd tend towards saying that special memory leak checkers
> like valgrind are better to use than trying to detect them in unit tests
> directly. However, not an expert in this area.

I do tend to agree that we should rely on more generic memory-checking
infrastructure like valgrind. However, the way we use mempools doesn't
always lend itself to leak checkers like valgrind, which usually expect
to own all the individual blocks. Maybe newer versions can work with our
mempools though?

> /Bruce
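Bruce's description of what the file-prefix test really checks (hugepage files removed at EAL cleanup, not memory leaks) can be made concrete with a minimal C sketch. It assumes that each hugepage backing file is named `<prefix>map_<N>` inside a hugetlbfs mount, matching the familiar `rtemap_N` files seen with the default prefix. The function names are hypothetical, and this is not the code in `app/test/test_eal_flags.c`. Because a check like this only looks for leftover files, a component that leaked memory but whose files were still unlinked at cleanup would pass it, which is the distinction Bruce draws.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does `name` look like "<prefix>map_<N>"?
 * This naming pattern is an assumption made for the sketch. */
static bool is_hugefile_name(const char *name, const char *prefix)
{
    size_t plen = strlen(prefix);

    if (strncmp(name, prefix, plen) != 0)
        return false;
    name += plen;
    if (strncmp(name, "map_", 4) != 0)
        return false;
    name += 4;
    if (*name == '\0')
        return false; /* require at least one digit after "map_" */
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return false;
    }
    return true;
}

/* Hypothetical helper: count files in `dir` that match the pattern above.
 * Returns -1 if the directory cannot be opened. */
static int count_leftover_hugefiles(const char *dir, const char *prefix)
{
    assert(dir != NULL && prefix != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    const struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (is_hugefile_name(ent->d_name, prefix))
            count++;
    }
    closedir(d);
    return count;
}
```

Under the same assumptions, once a process started with `--file-prefix=test` has exited, `count_leftover_hugefiles("/dev/hugepages", "test")` would be expected to return 0; the mount point is itself an assumption and varies between systems.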
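Aaron's point about mempools and leak checkers can be illustrated with a toy fixed-size pool. It is written from scratch for this sketch and is not modelled on `rte_mempool` internals; the `toy_pool_*` names are hypothetical. All objects are carved out of a single `malloc()` block, so valgrind's memcheck only tracks that one block. An object taken from the pool and never returned is therefore not reported as a leak once the backing block is freed. The sketch instead counts outstanding objects inside the pool, which is one possible shape for an in-test leak check. Valgrind also provides client requests (`VALGRIND_CREATE_MEMPOOL`, `VALGRIND_MEMPOOL_ALLOC`, `VALGRIND_MEMPOOL_FREE`) that let a custom allocator describe its objects to memcheck; the sketch does not use them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy fixed-size object pool (hypothetical, not rte_mempool). All objects
 * live inside one malloc() block, which is all a leak checker sees.
 * obj_size is assumed to be a multiple of the alignment callers need. */
struct toy_pool {
    unsigned char *base;   /* single backing allocation */
    size_t obj_size;
    size_t n_objs;
    void **free_list;      /* stack of free object pointers */
    size_t n_free;
};

static int toy_pool_init(struct toy_pool *p, size_t obj_size, size_t n_objs)
{
    p->base = malloc(obj_size * n_objs);
    p->free_list = malloc(n_objs * sizeof(void *));
    if (p->base == NULL || p->free_list == NULL) {
        free(p->base);
        free(p->free_list);
        return -1;
    }
    p->obj_size = obj_size;
    p->n_objs = n_objs;
    for (size_t i = 0; i < n_objs; i++)
        p->free_list[i] = p->base + i * obj_size;
    p->n_free = n_objs;
    return 0;
}

/* Take an object from the pool, or NULL if the pool is empty. */
static void *toy_pool_get(struct toy_pool *p)
{
    return p->n_free > 0 ? p->free_list[--p->n_free] : NULL;
}

/* Return an object; it must belong to this pool's backing block. */
static void toy_pool_put(struct toy_pool *p, void *obj)
{
    assert((unsigned char *)obj >= p->base &&
           (unsigned char *)obj < p->base + p->obj_size * p->n_objs);
    assert(p->n_free < p->n_objs);
    p->free_list[p->n_free++] = obj;
}

/* Objects handed out but not yet returned: the per-object "leak" count
 * that memcheck cannot report, since it only knows about p->base. */
static size_t toy_pool_outstanding(const struct toy_pool *p)
{
    return p->n_objs - p->n_free;
}

/* Freeing the backing block also releases any leaked objects, so memcheck
 * reports no leak even if toy_pool_outstanding() is non-zero here. */
static void toy_pool_destroy(struct toy_pool *p)
{
    free(p->base);
    free(p->free_list);
    p->base = NULL;
    p->free_list = NULL;
}
```

In a unit test, asserting `toy_pool_outstanding(&p) == 0` just before `toy_pool_destroy()` would catch an object that was never returned, which a checker watching only the backing allocation would miss.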