From: Thomas Monjalon <thomas@monjalon.net>
To: David Marchand <david.marchand@redhat.com>,
dpdk stable <stable@dpdk.org>
Cc: Matan Azrad <matan@mellanox.com>, dev <dev@dpdk.org>
Subject: Re: [dpdk-stable] [PATCH v2] bus/pci: fix driver detach clear
Date: Wed, 20 Nov 2019 14:51:36 +0100
Message-ID: <2607286.WWoMLU17V0@xps>
In-Reply-To: <CAJFAV8wL=zv7kV+86jdzYT+c340iZAwFcPs1RJvrVoZ66FTFjg@mail.gmail.com>
20/11/2019 14:03, David Marchand:
> On Wed, Nov 20, 2019 at 10:48 AM Matan Azrad <matan@mellanox.com> wrote:
> >
> > When a rte_device is unplugged, the driver should be detached from the
> > device.
> >
> > The PCI detach driver operation wrongly didn't clear the driver from the
> > device structure, which leaves the device in the probed state from the
> > EAL point of view.
> >
> > For example, when a device is removed twice using rte_dev_remove, it
> > causes a crash in EAL.
>
> I can see a crash when using port detach in testpmd with a virtio pci device.
>
> testpmd> port attach 0000:07:00.0
> Attaching a new port...
> EAL: PCI device 0000:07:00.0 on NUMA socket -1
> EAL: Invalid NUMA socket, default to 0
> EAL: probe driver: 1af4:1041 net_virtio
> Port 1 is attached. Now total ports is 2
> Done
> testpmd> port close 1
> Closing ports...
> EAL: Releasing pci mapped resource for 0000:07:00.0
> EAL: Calling pci_unmap_resource for 0000:07:00.0 at 0x2200006000
> Done
> testpmd> port detach 1
> Removing a device...
>
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315 if (dev->bus->unplug == NULL) {
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-292.el7.x86_64 libgcc-4.8.5-39.el7.x86_64
> libpcap-1.5.3-11.el7.x86_64 numactl-libs-2.0.12-3.el7.x86_64
> (gdb) p *dev
> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x1cf8078
> "0000:07:00.0", driver = 0x16c68f0 <rte_virtio_pmd+16>, bus =
> 0x16b2640 <rte_pci_bus>, numa_node = 0, devargs = 0x1cf8060}
> (gdb) c
> Continuing.
> Device of port 1 is detached
> Now total ports is 1
> Done
>
>
> On the first detach, the pci bus frees the rte_pci_device which embeds
> the rte_device object.
>
> static int
> pci_unplug(struct rte_device *dev)
> {
>         struct rte_pci_device *pdev;
>         int ret;
>
>         pdev = RTE_DEV_TO_PCI(dev);
>         ret = rte_pci_detach_dev(pdev);
>         if (ret == 0) {
>                 rte_pci_remove_device(pdev);
>                 rte_devargs_remove(dev->devargs);
>                 free(pdev);
>         }
>         return ret;
> }
>
>
>
> testpmd> port detach 1
> Removing a device...
>
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315 if (dev->bus->unplug == NULL) {
> (gdb) p *dev
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0xa <Address 0xa
> out of bounds>, driver = 0x0, bus = 0x4637, numa_node = 1, devargs =
> 0x40000002e040018}
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000007c1ddd in local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315 if (dev->bus->unplug == NULL) {
>
>
> On the second detach, testpmd passes the same rte_device pointer it
> extracts from rte_eth_devices, but the malloc'd location has been
> reused (with watchpoint on the location, I found somewhere around
> rte_mp_request_sync/opendir()), and then *crunch* on dev->bus.
>
>
> From my pov:
> - testpmd is wrongly reusing a pointer coming from rte_eth_devices[],
> without caring about the port state (this is what your second patch
> fixes),
> - testpmd is directly kicking pointers in rte_eth_devices[] (setting
> ->device = NULL for its own logic), which is bad too,
> - this patch just hides the reuse of a freed pointer,

I agree with most of your analysis.
So we agree that patch 2 is a real fix.
We agree that testpmd should be fixed in the next release so that it no
longer updates the .device pointer. But keep it for now, as it may be a
workaround for some drivers (this needs deeper analysis).

But about this patch 1, it resets rte_device.driver,
which is used by the function rte_dev_is_probed().
It says that the rte_device has no rte_driver attached anymore.
This patch follows the same idea as
391797f04208 ("drivers/bus: move driver assignment to end of probing")
So I consider this a real fix.
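
To illustrate the point about rte_device.driver, here is a minimal
sketch. It does not use the real DPDK structures: the sketch_* types and
functions are hypothetical stand-ins, and the -ENOENT return value is an
assumption chosen for the example. It only shows the idea that clearing
the driver pointer on detach lets a "is probed" check reject a second
removal. The device object is kept alive here, so the sketch does not
model the reuse of a freed pointer, which is what patch 2 addresses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for illustration, not the real DPDK definitions. */
struct sketch_driver {
	const char *name;
};

struct sketch_device {
	const char *name;
	const struct sketch_driver *driver; /* NULL when no driver is attached */
};

/* Same idea as rte_dev_is_probed(): probed means a driver is attached. */
static int
sketch_dev_is_probed(const struct sketch_device *dev)
{
	return dev->driver != NULL;
}

/* Detach that clears the driver pointer, as patch 1 proposes to do. */
static int
sketch_detach(struct sketch_device *dev)
{
	if (!sketch_dev_is_probed(dev))
		return -ENOENT; /* assumed error code for "not probed" */
	dev->driver = NULL;
	return 0;
}

/* Detach the same (still allocated) device twice; return the second result. */
static int
sketch_double_detach(void)
{
	static const struct sketch_driver drv = { "net_virtio" };
	struct sketch_device dev = { "0000:07:00.0", &drv };

	assert(sketch_dev_is_probed(&dev));
	assert(sketch_detach(&dev) == 0);
	assert(!sketch_dev_is_probed(&dev));
	return sketch_detach(&dev);
}
```

With the driver pointer cleared by the first detach, the second call
fails cleanly instead of acting on a device that is no longer probed.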