From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0C471A0C47; Sat, 18 Sep 2021 05:31:10 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id C5DA640041; Sat, 18 Sep 2021 05:31:09 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id 45ADD4003D for ; Sat, 18 Sep 2021 05:31:08 +0200 (CEST) Received: from dggemv704-chm.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4HBGSs0ttxz8ybV; Sat, 18 Sep 2021 11:26:37 +0800 (CST) Received: from dggema767-chm.china.huawei.com (10.1.198.209) by dggemv704-chm.china.huawei.com (10.3.19.47) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2308.8; Sat, 18 Sep 2021 11:31:05 +0800 Received: from [10.66.74.184] (10.66.74.184) by dggema767-chm.china.huawei.com (10.1.198.209) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2308.8; Sat, 18 Sep 2021 11:31:05 +0800 From: Huisong Li To: Ferruh Yigit , Thomas Monjalon CC: Andrew Rybchenko , "dev@dpdk.org" , Anatoly Burakov , David Marchand References: <1627908397-51565-1-git-send-email-lihuisong@huawei.com> <1627957839-38279-1-git-send-email-lihuisong@huawei.com> <9670389d-ebbc-9d9c-0cac-c7e8826ecb6f@huawei.com> <21383486.34YfpWhNxb@thomas> Message-ID: <1c1bf5d6-dc33-28cd-8345-f8e1ed4d3213@huawei.com> Date: Sat, 18 Sep 2021 11:31:05 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.66.74.184] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggema767-chm.china.huawei.com (10.1.198.209) X-CFilter-Loop: Reflected Subject: Re: [dpdk-dev] [RFC V2] ethdev: fix issue that dev close in PMD calls twice X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi, All Can you take a look at this problem? Thanks. 在 2021/8/25 17:53, Huisong Li 写道: > > 在 2021/8/24 22:42, Ferruh Yigit 写道: >> On 8/19/2021 4:45 AM, Huisong Li wrote: >>> 在 2021/8/18 19:24, Ferruh Yigit 写道: >>>> On 8/13/2021 9:16 AM, Huisong Li wrote: >>>>> 在 2021/8/13 14:12, Thomas Monjalon 写道: >>>>>> 13/08/2021 04:11, Huisong Li: >>>>>>> Hi, all >>>>>>> >>>>>>> This patch can enhance the security of device uninstallation to >>>>>>> eliminate dependency on user usage methods. >>>>>>> >>>>>>> Can you check this patch? >>>>>>> >>>>>>> >>>>>>> 在 2021/8/3 10:30, Huisong Li 写道: >>>>>>>> Ethernet devices in DPDK can be released by rte_eth_dev_close() >>>>>>>> and >>>>>>>> rte_dev_remove(). These APIs both call xxx_dev_close() in PMD >>>>>>>> layer >>>>>>>> to uninstall hardware. However, the two APIs do not have explicit >>>>>>>> invocation restrictions. In other words, at the ethdev layer, >>>>>>>> it is >>>>>>>> possible to call rte_eth_dev_close() before calling >>>>>>>> rte_dev_remove() >>>>>>>> or rte_eal_hotplug_remove(). In such a bad scenario, >>>>>> It is not a bad scenario. >>>>>> If there is no more port for the device after calling close, >>>>>> the device should be removed automatically. >>>>>> Keep in mind "close" is for one port, "remove" is for the entire >>>>>> device >>>>>> which can have more than one port. >>>>> I know. >>>>> >>>>> dev_close() is for removing an eth device. And rte_dev_remove() >>>>> can be used >>>>> >>>>> for removing the rte device and all its eth devices belonging to >>>>> the rte device. >>>>> >>>>> In rte_dev_remove(), "remove" is executed in primary or one of >>>>> secondary, >>>>> >>>>> all eth devices having same pci address will be closed and removed. >>>>> >>>>>>>> the primary >>>>>>>> process may be fine, but it may cause that xxx_dev_close() in >>>>>>>> the PMD >>>>>>>> layer will be called twice in the secondary process. So this patch >>>>>>>> fixes it. >>>>>> If a port is closed in primary, it should be the same in secondary. >>>>>> >>>>>> >>>>>>>> +    /* >>>>>>>> +     * The eth_dev->data->name doesn't be cleared by the >>>>>>>> secondary process, >>>>>>>> +     * so above "eth_dev" isn't NULL after rte_eth_dev_close() >>>>>>>> called. >>>>>> This assumption is not clear. All should be closed together. >>>>> However, dev_close() does not have the feature similar to >>>>> rte_dev_remove(). >>>>> >>>>> Namely, it is not guaranteed that all eth devices are closed >>>>> together in ethdev >>>>> layer. It depends on app or user. >>>>> >>>>> If the app does not close together, the operation of repeatedly >>>>> uninstalling an >>>>> eth device in the secondary process >>>>> >>>>> will be triggered when dev_close() is first called by one >>>>> secondary process, and >>>>> then rte_dev_remove() is called. >>>>> >>>>> So I think it should be avoided. >>>> First of all, I am not sure about calling 'rte_eth_dev_close()' or >>>> 'rte_dev_remove()' from the secondary process. >>>> There are explicit checks in various locations to prevent clearing >>>> resources >>>> completely from secondary process. >>> There's no denying that. >>> >>> Generally, hardware resources of eth device and shared data of the >>> primary and >>> secondary process >>> >>> are cleared by primary, which are controled by ethdev layer or PMD >>> layer. >>> >>> But there may be some private data or resources of each process >>> (primary or >>> secondary ), such as mp action >>> >>> registered by rte_mp_action_register() or others.  For these >>> resources, the >>> secondary process still needs to clear. >>> >>> Namely, both primary and secondary processes need to prevent >>> repeated offloading >>> of resources. >>> >>>> Calling 'rte_eth_dev_close()' or 'rte_dev_remove()' by secondary is >>>> technically >>>> can be done but application needs to be extra cautious and should >>>> take extra >>>> measures and synchronization to make it work. >>>> Regular use-case is secondary processes do the packet processing >>>> and all control >>>> commands run by primary. >>> You are right. We have a consensus that 'rte_eth_dev_close()' or >>> 'rte_dev_remove()' >>> >>> can be called by primary and secondary processes. >>> >>> But DPDK framework cannot assume user behavior.😁 >>> >>> We need to make it more secure and reliable for both primary and >>> secondary >>> processes. >>> >>>> In primary, if you call 'rte_eth_dev_close()' it will clear all >>>> ethdev resources >>>> and further 'rte_dev_remove()' call will detect missing ethdev >>>> resources and >>>> won't try to clear them again. >>>> >>>> In secondary, if you call 'rte_eth_dev_close()', it WON'T clear all >>>> resources >>>> and further 'rte_dev_remove()' call (either from primary or >>>> secondary) will try >>>> to clean ethdev resources again. You are trying to prevent this >>>> retry in remove >>>> happening for secondary process. >>> Right. However, if secondary process in PMD layer has its own >>> private resources >>> to be >>> >>> cleared, it still need to do it by calling 'rte_eth_dev_close()' or >>> 'rte_dev_remove()'. >>> >>>> In secondary it won't free ethdev resources anyway if you let it >>>> continue, but I >>>> guess here you are trying to prevent the PMD dev_close() called >>>> again. Why? Is >>>> it just for optimization or does it cause unexpected behavior in >>>> the PMD? >>>> >>>> >>>> Overall, to free resources you need to do the 'rte_eth_dev_close()' or >>>> 'rte_dev_remove()' in the primary anyway. So instead of this >>>> workaround, I would >>>> suggest making PMD dev_close() safe to be called multiple times (if >>>> this is the >>>> problem.) >>> In conclusion,  primary and secondary processes in PMD layer may >>> have their own >>> >>> private data and resources, which need to be processed and released. >>> >>> Currently,  these for PMD are either handled and cleaned up in >>> dev_close() or >>> remove(). >>> >>> However, code streams in rte_dev_remove() cannot ensure that the >>> uninstallation >>> >>> from secondary process will not be repeated if rte_eth_dev_close() >>> is first >>> called by >>> >>> secondary(primary is ok, plese review this patch). >>> >>> I think, this is the same for each PMD and is better suited to doing >>> it in >>> ethdev layer. >>> >> This patch prevents to call dev_close() twice in the secondary >> process, is this >> fixing a theoretical problem or an actual problem? >> >> If it is an actual problem can you please provide details, callstack >> of the >> problematic case? > > We analyzed the code when modifying the bug and found that the problem > did exist. > > The ethdev layer did not guarantee the security. > > The general function of the two interfaces is as follows: > > rte_eth_dev_close() --> release eth device > > rte_dev_remove() -->   release eth device + remove and free > rte_pci_device(primary andsecondary). > > According to the OVS application scenario, first call dev_close() and > then call > > remove(), which is possible. > > We constructed this scenario using testpmd to start the secondary > process(the multi-process > > patch of testpmd is being uploaded.). It is proved that the ethdev > layer cannot guarantee > > this security. The callstack is as follows: > > ************************************************************************************************************************************************** > > > gdb) set args -a 0000:7d:00.0 -l 2-3 --file-prefix=rte_lee > --proc-type=auto -- -i --rxq 4 --txq 4 --num-procs=2 --proc-id=1 > (gdb) b hns3_dev_uninit > Breakpoint 1 at 0xbca7d0: file ../drivers/net/hns3/hns3_ethdev.c, line > 7806. > (gdb) b hns3_dev_close > Breakpoint 2 at 0xbc6d44: file ../drivers/net/hns3/hns3_ethdev.c, line > 6189. > (gdb) r > Starting program: > /home/lihuisong/v500/gcov-test/dpdk/build/app/dpdk-testpmd -a > 0000:7d:00.0 -l 2-3 --file-prefix=rte_lee --proc-type=auto -- -i --rxq > 4 --txq 4 --num-procs=2 --proc-id=1 > Missing separate debuginfo for /root/lib/libnuma.so.1 > Try: yum --enablerepo='*debug*' install > /usr/lib/debug/.build-id/ce/4eea0b0f2150f70a080bbd8835e43e78373096.debug > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > EAL: Detected 128 lcore(s) > EAL: Detected 4 NUMA nodes > EAL: Auto-detected process type: SECONDARY > EAL: Detected static linkage of DPDK > [New Thread 0xfffff7a0ad10 (LWP 17717)] > EAL: Multi-process socket > /var/run/dpdk/rte_lee/mp_socket_17714_66aa8d3a60a9 > [New Thread 0xfffff7209d10 (LWP 17718)] > EAL: Selected IOVA mode 'VA' > EAL: VFIO support initialized > [New Thread 0xfffff69f8d10 (LWP 17719)] > EAL: Using IOMMU type 1 (Type 1) > EAL: Probe PCI driver: net_hns3 (19e5:a222) device: 0000:7d:00.0 > (socket 0) > [New Thread 0xfffff61f7d10 (LWP 17720)] > TELEMETRY: No legacy callbacks, legacy socket not created > Interactive-mode selected > 0000:7d:00.0 hns3_dev_mtu_set(): Failed to set mtu, port 0 must be > stopped before configuration > Failed to set MTU to 1500 for port 0 > testpmd: create a new mbuf pool : n=155456, size=2176, > socket=0 > testpmd: preferred mempool ops selected: ring_mp_mc > > Warning! port-topology=paired and odd forward ports number, the last > port will pair with itself. > > Configuring Port 0 (socket 0) > Port 0: 00:18:2D:00:00:9E > Checking link statuses... > Done > testpmd> port close 0 > Closing ports... > > Breakpoint 2, hns3_dev_close (eth_dev=0x21cdb80 ) >     at ../drivers/net/hns3/hns3_ethdev.c:6189 > 6189        struct hns3_adapter *hns = eth_dev->data->dev_private; > Missing separate debuginfos, use: debuginfo-install > glibc-2.17-260.el7.aarch64 libpcap-1.5.3-11.el7.aarch64 > openssl-libs-1.0.2k-16.el7.aarch64 zlib-1.2.7-18.el7.aarch64 > (gdb) bt > #0  hns3_dev_close (eth_dev=0x21cdb80 ) at > ../drivers/net/hns3/hns3_ethdev.c:6189 > #1  0x0000000000742eac in rte_eth_dev_close (port_id=0) at > ../lib/ethdev/rte_ethdev.c:1870 > #2  0x0000000000542a8c in close_port (pid=0) at > ../app/test-pmd/testpmd.c:2895 > #3  0x00000000004df82c in cmd_operate_specific_port_parsed > (parsed_result=0xffffffffb0f0, >     cl=0x2475080, data=0x0) at ../app/test-pmd/cmdline.c:1272 > #4  0x0000000000738ce4 in cmdline_parse (cl=0x2475080, buf=0x24750c8 > "port close 0\n") >     at ../lib/cmdline/cmdline_parse.c:290 > #5  0x0000000000736b7c in cmdline_valid_buffer (rdl=0x2475090, > buf=0x24750c8 "port close 0\n", >     size=14) at ../lib/cmdline/cmdline.c:26 > #6  0x000000000073c078 in rdline_char_in (rdl=0x2475090, c=10 '\n') >     at ../lib/cmdline/cmdline_rdline.c:421 > #7  0x0000000000736fcc in cmdline_in (cl=0x2475080, buf=0xfffffffff25f > "\n\200\362\377\377\377\377", >     size=1) at ../lib/cmdline/cmdline.c:149 > #8  0x0000000000737270 in cmdline_interact (cl=0x2475080) at > ../lib/cmdline/cmdline.c:223 > #9  0x00000000004f0ccc in prompt () at ../app/test-pmd/cmdline.c:17882 > #10 0x0000000000545528 in main (argc=8, argv=0xfffffffff470) at > ../app/test-pmd/testpmd.c:3998 > (gdb) c > Continuing. > Port 0 is closed > Done > testpmd> device detach 0000:7d:00.0 > Removing a device... > [Switching to Thread 0xfffff7a0ad10 (LWP 17717)] > > Breakpoint 1, hns3_dev_uninit (eth_dev=0x21cdb80 ) >     at ../drivers/net/hns3/hns3_ethdev.c:7806 > 7806        struct hns3_adapter *hns = eth_dev->data->dev_private; > (gdb) bt > #0  hns3_dev_uninit (eth_dev=0x21cdb80 ) at > ../drivers/net/hns3/hns3_ethdev.c:7806 > #1  0x0000000000bb8668 in rte_eth_dev_pci_generic_remove > (pci_dev=0x247a600, >     dev_uninit=0xbca7c4 ) at > ../lib/ethdev/ethdev_pci.h:155 > #2  0x0000000000bca89c in eth_hns3_pci_remove (pci_dev=0x247a600) >     at ../drivers/net/hns3/hns3_ethdev.c:7833 > #3  0x00000000007e85f4 in rte_pci_detach_dev (dev=0x247a600) at > ../drivers/bus/pci/pci_common.c:287 > #4  0x00000000007e8f14 in pci_unplug (dev=0x247a610) at > ../drivers/bus/pci/pci_common.c:570 > #5  0x0000000000775678 in local_dev_remove (dev=0x247a610) at > ../lib/eal/common/eal_common_dev.c:319 > #6  0x000000000078f114 in __handle_primary_request (param=0xfffff00008c0) >     at ../lib/eal/common/hotplug_mp.c:284 > #7  0x000000000079ebf4 in eal_alarm_callback (arg=0x0) at > ../lib/eal/linux/eal_alarm.c:92 > #8  0x00000000007a3f30 in eal_intr_process_interrupts > (events=0xfffff7a0a3e0, nfds=1) >     at ../lib/eal/linux/eal_interrupts.c:998 > #9  0x00000000007a4224 in eal_intr_handle_interrupts (pfd=10, totalfds=2) >     at ../lib/eal/linux/eal_interrupts.c:1071 > #10 0x00000000007a442c in eal_intr_thread_main (arg=0x0) at > ../lib/eal/linux/eal_interrupts.c:1142 > #11 0x0000000000789388 in ctrl_thread_init (arg=0x246ef40) >     at ../lib/eal/common/eal_common_thread.c:202 > #12 0x0000fffff7bf3c48 in start_thread () from /lib64/libpthread.so.0 > #13 0x0000fffff7b45600 in thread_start () from /lib64/libc.so.6 > > ************************************************************************************************************************************************** > > > Note: above log is from the secondary process. > >>>> And again, please re-consider calling 'rte_eth_dev_close()' or >>>> 'rte_dev_remove()' from the secondary process. >>>> >>>>>>>> +     * Namely, whether "eth_dev" is NULL cannot be used to >>>>>>>> determine whether >>>>>>>> +     * an ethdev port has been released. >>>>>>>> +     * For both primary process and secondary process, >>>>>>>> eth_dev->state is >>>>>>>> +     * RTE_ETH_DEV_UNUSED, which means the ethdev port has >>>>>>>> been released. >>>>>>>> +     */ >>>>>>>> +    if (eth_dev->state == RTE_ETH_DEV_UNUSED) { >>>>>>>> +        RTE_ETHDEV_LOG(INFO, "The ethdev port has been >>>>>>>> released."); >>>>>>>> +        return 0; >>>>>>>> +    } >>>>>> . >>>> . >> .