From: Thomas Monjalon
To: David Marchand, dpdk stable
Cc: Matan Azrad, dev
Date: Wed, 20 Nov 2019 14:51:36 +0100
Message-ID: <2607286.WWoMLU17V0@xps>
References: <1573548459-6931-1-git-send-email-matan@mellanox.com> <1574243271-27734-1-git-send-email-matan@mellanox.com>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v2] bus/pci: fix driver detach clear
List-Id: DPDK patches and discussions

20/11/2019 14:03, David Marchand:
> On Wed, Nov 20, 2019 at 10:48 AM Matan Azrad wrote:
> >
> > When a rte_device is unplugged, the driver should be detached from the
> > device.
> >
> > The PCI detach driver operation wrongly didn't clear the driver from the
> > device structure what remain the device in probe state from the EAL
> > point of view.
> >
> > For example, when a device is removed twice using rte_dev_remove, it
> > cause a crash in EAL.
>
> I can see a crash when using port detach in testpmd with a virtio pci device.
>
> testpmd> port attach 0000:07:00.0
> Attaching a new port...
> EAL: PCI device 0000:07:00.0 on NUMA socket -1
> EAL: Invalid NUMA socket, default to 0
> EAL: probe driver: 1af4:1041 net_virtio
> Port 1 is attached. Now total ports is 2
> Done
> testpmd> port close 1
> Closing ports...
> EAL: Releasing pci mapped resource for 0000:07:00.0
> EAL: Calling pci_unmap_resource for 0000:07:00.0 at 0x2200006000
> Done
> testpmd> port detach 1
> Removing a device...
>
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315 if (dev->bus->unplug == NULL) {
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-292.el7.x86_64 libgcc-4.8.5-39.el7.x86_64
> libpcap-1.5.3-11.el7.x86_64 numactl-libs-2.0.12-3.el7.x86_64
> (gdb) p *dev
> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x1cf8078
> "0000:07:00.0", driver = 0x16c68f0 , bus =
> 0x16b2640 , numa_node = 0, devargs = 0x1cf8060}
> (gdb) c
> Continuing.
> Device of port 1 is detached
> Now total ports is 1
> Done
>
>
> On the first detach, the pci bus frees the rte_pci_device which embeds
> the rte_device object.
>
> static int
> pci_unplug(struct rte_device *dev)
> {
>         struct rte_pci_device *pdev;
>         int ret;
>
>         pdev = RTE_DEV_TO_PCI(dev);
>         ret = rte_pci_detach_dev(pdev);
>         if (ret == 0) {
>                 rte_pci_remove_device(pdev);
>                 rte_devargs_remove(dev->devargs);
>                 free(pdev);
>         }
>         return ret;
> }
>
>
>
> testpmd> port detach 1
> Removing a device...
>
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315 if (dev->bus->unplug == NULL) {
> (gdb) p *dev
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0xa <Address 0xa
> out of bounds>, driver = 0x0, bus = 0x4637, numa_node = 1, devargs =
> 0x40000002e040018}
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000007c1ddd in local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315 if (dev->bus->unplug == NULL) {
>
>
> On the second detach, testpmd passes the same rte_device pointer it
> extracts from rte_eth_devices, but the malloc'd location has been
> reused (with watchpoint on the location, I found somewhere around
> rte_mp_request_sync/opendir()), and then *crunch* on dev->bus.
>
>
> From my pov:
> - testpmd is wrongly reusing a pointer coming from rte_eth_devices[],
> without caring about the port state (this is what your second patch
> fixes),
> - testpmd is directly kicking pointers in rte_eth_devices[] (setting
> ->device = NULL for its own logic), which is bad too,
> - this patch just hides the reuse of a freed pointer,

I agree with most of your analysis.
So we agree that patch 2 is a real fix.
We agree that testpmd should be fixed in the next release so that it no
longer updates the .device pointer. But keep it for now, as it may be a
workaround for some drivers (this needs deeper analysis).

But about this patch 1: it resets rte_device.driver, which is used by
the function rte_dev_is_probed(). Resetting it says that the rte_device
has no rte_driver attached anymore.
This patch is the same idea as 391797f04208 ("drivers/bus: move driver
assignment to end of probing").
So I consider this a real fix.
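
To make the point about rte_device.driver concrete, here is a minimal
sketch of the idea, not the actual EAL or PCI bus code. All sketch_*
names are hypothetical stand-ins, and it assumes that the probed check
is simply "driver pointer is not NULL", which is how I read
rte_dev_is_probed(). It shows that once detach clears the driver
pointer, a second detach on a still-valid device object is rejected
instead of being processed again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for rte_driver and rte_device (not the DPDK definitions). */
struct sketch_driver {
	const char *name;
};

struct sketch_device {
	const char *name;
	const struct sketch_driver *driver; /* non-NULL only while probed */
};

/* Assumed probed check: a device counts as probed while a driver is attached. */
static int
sketch_dev_is_probed(const struct sketch_device *dev)
{
	assert(dev != NULL);
	return dev->driver != NULL;
}

/* Detach step: refuse an unprobed device, else clear the driver pointer. */
static int
sketch_detach(struct sketch_device *dev)
{
	if (!sketch_dev_is_probed(dev))
		return -ENOENT;
	dev->driver = NULL;
	return 0;
}
```

Calling sketch_detach() twice on the same object returns 0 and then
-ENOENT. This only helps while the device object is still allocated:
in the pci_unplug() path quoted above the object is freed, so the caller
must also stop reusing the stale pointer, which is what patch 2
addresses.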