From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C5A18A0547; Sun, 26 Sep 2021 14:21:00 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 959644003D; Sun, 26 Sep 2021 14:21:00 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id 0D8534003C for ; Sun, 26 Sep 2021 14:20:58 +0200 (CEST) Received: from dggemv703-chm.china.huawei.com (unknown [172.30.72.53]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4HHPrN3vv3z8ymw; Sun, 26 Sep 2021 20:16:20 +0800 (CST) Received: from dggema767-chm.china.huawei.com (10.1.198.209) by dggemv703-chm.china.huawei.com (10.3.19.46) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2308.8; Sun, 26 Sep 2021 20:20:56 +0800 Received: from [10.66.74.184] (10.66.74.184) by dggema767-chm.china.huawei.com (10.1.198.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2308.8; Sun, 26 Sep 2021 20:20:55 +0800 To: Thomas Monjalon CC: , , References: <20210907034108.58763-1-lihuisong@huawei.com> <2969231.2FXvsJDIr2@thomas> <4f4e5d5f-3e52-6974-fa07-c1c587d92e80@huawei.com> <2004569.RrOHqjGOaX@thomas> From: Huisong Li Message-ID: <430246ab-36ce-402f-8570-d305ada9d720@huawei.com> Date: Sun, 26 Sep 2021 20:20:27 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0 MIME-Version: 1.0 In-Reply-To: <2004569.RrOHqjGOaX@thomas> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.66.74.184] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To dggema767-chm.china.huawei.com (10.1.198.209) X-CFilter-Loop: Reflected Subject: Re: [dpdk-dev] [RFC V1] examples/l3fwd-power: fix memory leak for rte_pci_device X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 在 2021/9/18 16:46, Thomas Monjalon 写道: > 18/09/2021 05:24, Huisong Li: >> 在 2021/9/17 20:50, Thomas Monjalon 写道: >>> 17/09/2021 04:13, Huisong Li: >>>> 在 2021/9/16 18:36, Thomas Monjalon 写道: >>>>> 16/09/2021 10:01, Huisong Li: >>>>>> 在 2021/9/8 15:20, Thomas Monjalon 写道: >>>>>>> 08/09/2021 04:01, Huisong Li: >>>>>>>> 在 2021/9/7 16:53, Thomas Monjalon 写道: >>>>>>>>> 07/09/2021 05:41, Huisong Li: >>>>>>>>>> Calling rte_eth_dev_close() will release resources of eth device and close >>>>>>>>>> it. But rte_pci_device struct isn't released when app exit, which will lead >>>>>>>>>> to memory leak. >>>>>>>>> That's a PMD issue. >>>>>>>>> When the last port of a PCI device is closed, the device should be freed. >>>>>>>> Why is this a PMD problem? I don't understand. >>>>>>> In the PMD close function, freeing of PCI device must be managed, >>>>>>> so the app doesn't have to bother. >>>>>> I know what you mean. Currently, there are two ways to close PMD device >>>>>> (rte_eth_dev_close() and rte_dev_remove()). >>>>>> >>>>>> For rte_dev_remove(), eth device can be closed and rte_pci_device also >>>>>> can be freed, so it can make app not care about that. >>>>>> >>>>>> But dev_close() is only used to close eth device, and nothing about >>>>>> rte_pci_device is involved in the framework layer >>>>>> >>>>>> call stack of dev_close(). The rte_pci_device is allocated and >>>>>> initialized when the rte_pci_bus scans "/sys/bus/pci/devices" directory. >>>>>> >>>>>> Generally, the PMD of eth devices operates on the basis of eth devices, >>>>>> and rarely on rte_pci_device. >>>>> No. The PMD is doing the relation between the PCI device and the ethdev port. >>>> It seems that the ethdev layer can create eth devices based on >>>> rte_pci_device, but does not release rte_pci_device. >>> No, the ethdev layer does not manage any bus. >>> Only the PMD does that. >> I don't mean that the ethdev layer manages the bus. >> >> I mean, it neither allocate rte_pci_device nor free it. >> >>>>>> And the rte_pci_device corresponding to the eth devices managed and >>>>>> processed by rte_pci_bus. >>>>>> >>>>>> So, PMD is closed only based on the port ID of the eth device, whilch >>>>>> only shuts down eth devices, not frees rte_pci_device >>>>>> and remove it from rte_pci_bus. >>>>> Not really. >>>> I do not see any PMD driver releasing rte_pci_device in dev_close(). >>> Maybe not but we should. >> I'm sure. >> >> As far as I know, the PMD does not free rte_pci_device for devices under >> the PCI bus, whether ethdev or dmadev. >> >>>>> If there is no port using the PCI device, it should be released. >>>> Yes. >>>>>>>> As far as I know, most apps or examples in the DPDK project have only >>>>>>>> one port for a pci device. >>>>>>> The number of ports per PCI device is driver-specific. >>>>>>> >>>>>>>> When the port is closed, the rte_pci_device should be freed. But none of >>>>>>>> the apps seem to do this. >>>>>>> That's because from the app point of view, only ports should be managed. >>>>>>> The hardware device is managed by the PMD. >>>>>>> Only drivers (PMDs) have to do the relation between class ports >>>>>>> and hardware devices. >>>>>> Yes. But the current app only closes the port to disable the PMD, and >>>>>> the rte_pci_device cannot be freed. >>>>> Why not? >>>> Because most apps in DPDK call dev_close() to close the eth device >>>> corresponding to a port. >>> You don't say why the underlying PCI device could not be freed. >> From the current implementation, rte_eth_dev_close() in ethdev layer >> and dev_close() in PMD both do not free it. >>>>>> Because rte_pci_device cannot be released in dev_close() of PMD, and is >>>>>> managed by framework layer. >>>>> No >>>>> >>>>>> Btw. Excluding rte_dev_probe() and rte_dev_remove(), it seems that the >>>>>> DPDK framework only automatically >>>>>> scans PCI devices, but does not automatically release PCI devices when >>>>>> the process exits. >>>>> Indeed, because such freeing is the responsibility of the PMD. >>>> Do you mean to free rte_pci_device in the dev_close() API? >>> I mean free the PCI device in the PMD implementation of dev_close. >> I don't think it's reasonable. > What is not reasonable, is to not free a device which is closed. > >> In the normal process, the rte_pci_device is allocated rte_eal_init() >> when pci bus scan "/sys/bus/pci/devices" >> >> by calling rte_bus_scan() and insert to rte_pci_bus.device_list. >> >> Then, calling rte_bus_probe() in rte_eal_init to match rte_pci_device >> and rte_pci_driver registered under rte_pci_bus >> >> to generate an eth device. >> >> From this point of view, the rte_pci_device should be managed and >> released by the rte_pci_bus. >> >> Generally, the uninstallation operation should be reversed. Release the >> eth device first and then release the rte_pci_device. > Same for mbuf in mempool: allocation is done by the app, > free is done by the PMD. That doesn't seem to be the case. In the rx direction, the mbuf is allocated by PMD. So it should be freed by the PMD. > Not everything is symmetrical. > >> Therefore the rte_pci_device does not be freed in the PMD >> implementation of dev_close. >> >>>> How should PMD free it? What should we do? Any good suggestions? >>> Check that there is no other port sharing the same PCI device, >>> then call the PMD callback for rte_pci_remove_t. >> For primary and secondary processes, their rte_pci_device is independent. > Yes it requires to free on both primary and secondary. > >> Is this for a scenario where there are multiple representor ports under >> the same PCI address in the same processe? > A PCI device can have multiple physical or representor ports. Got it. > >>>> Would it be more appropriate to do this in rte_eal_cleanup() if it >>>> cann't be done in the API above? >>> rte_eal_cleanup is a last cleanup for what was not done earlier. >>> We could do that but first we should properly free devices when closed. >>> >> Totally, it is appropriate that rte_eal_cleanup is responsible for >> releasing devices under the pci bus. > Yes, but if a device is closed while the rest of the app keep running, > we should not wait to free it. From this point of view, it seems to make sense. However, according to the OVS-DPDK usage, it calls dev_close() first, and then check whether all ports under the PCI address are closed to free rte_pci_device by calling rte_dev_remove(). If we do not want the user to be aware of this, and we want rte_pci_device to be freed in a timely manner. Can we add a code logic calculating the number of ports under a PCI address and calling rte_dev_remove() to rte_eth_dev_close() to free rte_pci_device and delete it from rte_pci_bus? If we do, we may need to make some extra work, otherwise some applications, such as OVS-DPDK, will fail due to a second call to rte_dev_remove(). > > > .