DPDK patches and discussions
From: Matan Azrad <matan@mellanox.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>,
	"Yigit, Ferruh" <ferruh.yigit@linux.intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Bernard Iremonger <bernard.iremonger@intel.com>
Cc: Gaetan Rivet <gaetan.rivet@6wind.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	"stable@dpdk.org" <stable@dpdk.org>,
	David Marchand <david.marchand@redhat.com>,
	Jeff Guo <jia.guo@intel.com>, Qi Zhang <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH 2/2] app/testpmd: fix invalid port detaching
Date: Thu, 23 Jan 2020 19:25:52 +0000
Message-ID: <AM0PR0502MB401948ACB5364A9EE97E1402D20F0@AM0PR0502MB4019.eurprd05.prod.outlook.com>
In-Reply-To: <4df501fa-06d8-744d-27cd-f5742992e109@intel.com>

Hi

From: Ferruh Yigit
> On 1/23/2020 3:29 PM, Matan Azrad wrote:
> >
> > Hi
> >
> > From: Ferruh Yigit
> >> On 1/23/2020 2:05 PM, Matan Azrad wrote:
> >>> Hi
> >>>
> >>> From: Yigit, Ferruh
> >>>> On 11/12/2019 8:47 AM, Matan Azrad wrote:
> >>>>> The port was not validated before detaching.
> >>>>>
> >>>>> Ignore port detach operation when the port is not valid.
> >>>>>
> >>>>> Fixes: f8e5baa2662d ("app/testpmd: check not detaching device
> >>>>> twice")
> >>>>> Cc: thomas@monjalon.net
> >>>>> Cc: stable@dpdk.org
> >>>>>
> >>>>> Signed-off-by: Matan Azrad <matan@mellanox.com>
> >>>>> ---
> >>>>>  app/test-pmd/testpmd.c | 3 +++
> >>>>>  1 file changed, 3 insertions(+)
> >>>>>
> >>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>>>> 4444346..370eefe 100644
> >>>>> --- a/app/test-pmd/testpmd.c
> >>>>> +++ b/app/test-pmd/testpmd.c
> >>>>> @@ -2545,6 +2545,9 @@ struct extmem_param {
> >>>>>
> >>>>>  	printf("Removing a device...\n");
> >>>>>
> >>>>> +	if (port_id_is_invalid(port_id, ENABLED_WARN))
> >>>>> +		return;
> >>>>> +
> >>>>>  	dev = rte_eth_devices[port_id].device;
> >>>>>  	if (dev == NULL) {
> >>>>>  		printf("Device already removed\n");
> >>>>>
> >>>>
> >>>> The patch is already in 19.11 [1] but it is breaking the testpmd
> >>>> hotplug support.
> >>>> Before 'detach_port_device()' called, the port has been stopped and
> >>>> closed [2], which will make port fail from 'port_id_is_invalid()'
> >>>> check and the device removal path never fully called.
> >>>> The implication is, since device not detached, vfio request
> >>>> interrupt keeps triggered continuously and re-starts the detach
> >>>> path, but because of the half cleaned device it fails and app gets
> >>>> stuck with a
> >> continuous log [3].
> >>>>
> >>>> I wonder if the actual hotplug has been tested with this patch, the
> >>>> commit log is not clear about the motivation and implication of the
> >>>> patch, I am not clear why this check is added but I am sending a
> >>>> patch soon to remove it back.
> >>>
> >>> The motivation of this patch was to prevent double detach on same
> >>> port,
> >> so the user cannot call detach of invalid port.
> >>
> >> What is the definition of the 'invalid port', if you mean device
> >> already detached case, in the second call of the function "if (dev ==
> >> NULL)" check should prevent it going forward.
> >
> > No, ethdev doesn't zero the device pointer when it release a port.
> 
> As far as I can see it does, please see below.

The code below is problematic because:

1. It is very bad that the application changes an ethdev structure directly.
2. The code below runs only on a valid port, not on an invalid port (UNUSED state).

So, the device pointer will still be non-NULL even if the port is invalid.

All of this shows that this function tries to detach only a valid port (probably mainly because it is called by the Testpmd detach command).

> > So even if the port is in unused state already - means invalid, the device
> pointer still may be valid and point to the last port that used the same id.
> 
> If the port is closed, it is unused state, and ethdev layer resources freed but
> as you said device related structures are still there, device pointer is still valid
> and it is still in probed device list etc.. We need to able to detach the device
> even after it is unused state.

Yes, but detach is for a device, not for a port.
The device pointer must be taken only while the port is in a valid state.
Why?
Because a port in the UNUSED state is free to be allocated again by the ethdev layer for another device, and then the device pointer may point to that other device.

> "stop -> close -> detach" is a normal order, we shouldn't prevent it, but your
> check does prevent it.

Yes, this is a good order, but the device pointer should be taken before close.
My patch prevents access to an invalid structure.
And yes, Testpmd detach stays broken after my patch and after this patch too.


> 
> I am not very clear about your concern here, "point to the last port that used
> the same id", can you please clarify?

Yes, when the ethdev layer allocates a port ID for a new device, it looks for an UNUSED port.
When one is found, the port moves to ATTACHED after the PMD finishes its probing function.

So, any UNUSED port may be allocated to another device, and then the device pointer points to that other device.
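
To make the reallocation concern concrete, here is a minimal standalone sketch. It is a toy model, not ethdev code: struct port, alloc_port(), release_port() and the PCI addresses are hypothetical names chosen for illustration, and the only assumption carried over from the discussion is that releasing a port changes its state without clearing the device pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT the real ethdev code: every name here is hypothetical. */
enum port_state { PORT_UNUSED = 0, PORT_ATTACHED };

struct device { const char *name; };

struct port {
	enum port_state state;
	struct device *device; /* assumed to be left as-is on release */
};

#define MAX_PORTS 4
static struct port ports[MAX_PORTS];

/* Allocate the first UNUSED port ID for dev; return -1 if none is free. */
static int
alloc_port(struct device *dev)
{
	for (int id = 0; id < MAX_PORTS; id++) {
		if (ports[id].state == PORT_UNUSED) {
			ports[id].device = dev;
			ports[id].state = PORT_ATTACHED;
			return id;
		}
	}
	return -1;
}

/* Release a port: only the state changes, the device pointer stays stale. */
static void
release_port(int id)
{
	ports[id].state = PORT_UNUSED;
}

/* Returns 1 if reading ports[id].device after close yields another device. */
static int
stale_lookup_hits_other_device(void)
{
	static struct device a = { "0000:00:05.0" };
	static struct device b = { "0000:00:06.0" };

	int id = alloc_port(&a);    /* device A gets a free port ID */
	release_port(id);           /* "close": that port becomes UNUSED */
	int id_b = alloc_port(&b);  /* device B reuses the same port ID */

	/* A late "detach by port id" meant for A now resolves to B. */
	return id_b == id && ports[id].device == &b;
}
```

In this model, a detach that looks up the device through the port ID after close would act on device B even though the caller meant device A.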

> 
> >
> >
> >> But according the 'port_id_is_invalid()' API, a closed port is an
> >> invalid port, I think that is wrong in this context.
> >
> > Why?
> 
> Closed port is 'invalid' for using it, because ethdev resources are freed. But it
> is not 'invalid' to detach it, why a port being closed should prevent freeing its
> device layer resources?

I didn't say that; I said that the device pointer should be taken while the port is valid.


> 
> >
> > You are going to look on ethdev portid structure, don't you think we should
> valid the port before using its structure?
> 
> Is your main concern "rte_eth_devices[port_id].device" can be dangling
> pointer?
> 
> 1) It is not.
> 2) The check you added to replace it is not correct check.
> 
I didn't say that.

It just may point to another device.
It is not correct to take information from an invalid structure.

Don't you agree that the structure is not valid when the port is not valid?

> >
> >>>
> >>> I agree this patch is not good and we need a fix but I think the bug
> >>> is
> >> conceptual.
> >>>
> >>> Testpmd tries to do detach by port_id which is derived by ethdev
> >>> port id
> >> while detach work with rte_device.
> >>>
> >>> For example:
> >>> you can see in the line above after +++: dev =
> >>> rte_eth_devices[port_id].device, Testpmd may access invalid  or
> >> reallocated ethdev structure to get the device name and may even
> >> detach unwanted rte_device.
> >>
> >> I thinks whichever function calling 'detach_port_device()' should
> >> check the port validity.
> >> 'detach_port_device()' doesn't know if port reallocated or not, it
> >> will free the given port_id, and when freeing done
> >> 'rte_eth_devices[port_id].device' will be NULL, this looks to me a valid
> check.
> >
> > Please validate me, check ethdev, I don't think so,
> 'rte_eth_devices[port_id].device still valid after detach.
> 
> This is a long stack trace, but what happens is:
> 
> rte_dev_remove
>   bus unplug
>     driver remove
>       rte_eth_dev_pci_release
>         eth_dev->device = NULL;

The last line doesn't happen here because rte_eth_dev_pci_release moves the port to UNUSED.
And it is bad that the application is trying to do it.

> 
> Please check the driver you are testing remove() ops
> (rte_pci_driver.remove()) does cleans the ethdev fields.
> 
> A little more detailed stack trace for my environment:
> #0  rte_eth_dev_pci_release (eth_dev=..) at  rte_ethdev_pci.h:143
> #1  rte_eth_dev_pci_generic_remove (pci_dev=.., dev_uninit=..) at
> rte_ethdev_pci.h:199
> #2  eth_i40e_pci_remove (pci_dev=..) at i40e_ethdev.c:710
> #3  rte_pci_detach_dev (dev=..) at pci_common.c:243
> #4  pci_unplug (dev=..) at pci_common.c:537
> #5  local_dev_remove (dev=..) at eal_common_dev.c:321
> #6  rte_dev_remove (dev=..) at eal_common_dev.c:402
> #7  detach_port_device (port_id=0) at testpmd.c:2663
> #8  cmd_operate_detach_port_parsed (parsed_result=.., cl=.., data=0x0) at
> cmdline.c:1501
> #9  cmdline_parse (cl=.., buf=.."port detach 0\n") at cmdline_parse.c:295
> #10 cmdline_valid_buffer (rdl=.., buf="port detach 0\n", size=15) at
> cmdline.c:31
> #11 rdline_char_in (rdl=.., c=10 '\n') at  cmdline_rdline.c:421
> #12 cmdline_in (cl=.., buf=.."\n", size=1) at cmdline.c:148
> #13 cmdline_interact (cl=..) at cmdline.c:227
> #14 prompt () at cmdline.c:19644
> #15 main (argc=3, argv=..) at testpmd.c:3617
> 
Not all drivers do it.
I think it would be good to do it in the ethdev release function.


> >
> >> The caller of the 'detach_port_device()' should ensure correct
> >> port_id passed to the function.
> >
> > What is correct port id, if the port was released , is it correct?
> 
> You are right, there is no good answer for it, I was thinking application state
> information can be used but no ethdev should able to provide this
> information, we need 'is_freed' kind of check for it, currently
> 'rte_eth_devices[port_id].device' is used for that purpose.

It is wrong to take the device from an invalid structure (I explained this at length above).
A better way is to save the rte_device at the start (before close) and call detach by rte_device once we are sure that all the ports of this rte_device are released (mlx4 can manage 2 ports on one rte_device, as can any device that supports representors).

Let's do a correct fix.
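
A rough sketch of the ordering I have in mind, written as a standalone toy model rather than Testpmd or ethdev code. All names here (struct device, open_ports, close_port_get_device(), detach_device()) are hypothetical; the sketch only assumes that the device pointer is read while the port is still valid and that detach is refused until every port of the device is released.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model with hypothetical names; it only mirrors the ordering argued above. */
struct device {
	const char *name;
	int open_ports;   /* ports still attached to this device */
	int detached;     /* set once the device-level detach ran */
};

struct port {
	int valid;              /* 1 while the port is usable */
	struct device *device;
};

/* Close a port and hand back the device it belonged to.
 * The pointer is read while the port is still valid. */
static struct device *
close_port_get_device(struct port *p)
{
	if (!p->valid)
		return NULL;    /* never read the struct of an invalid port */
	struct device *dev = p->device;
	p->valid = 0;
	p->device = NULL;       /* clear on release so stale reads are visible */
	dev->open_ports--;
	return dev;
}

/* Detach by device, and only once every port of it has been released.
 * Returns 0 on success, -1 if ports are still open or dev is NULL. */
static int
detach_device(struct device *dev)
{
	if (dev == NULL || dev->open_ports > 0)
		return -1;
	dev->detached = 1;
	return 0;
}

/* Two ports on one device, as with mlx4 or representors. */
static int
demo_two_ports_one_device(void)
{
	struct device dev = { "0000:00:05.0", 2, 0 };
	struct port p0 = { 1, &dev }, p1 = { 1, &dev };

	struct device *saved = close_port_get_device(&p0);
	int early = detach_device(saved);   /* refused: p1 is still open */
	close_port_get_device(&p1);
	int late = detach_device(saved);    /* allowed now */

	return early == -1 && late == 0 && dev.detached == 1;
}
```

The point of the design is that the detach step never goes back through a port ID: it works on the device pointer saved before close, so a reused port ID cannot redirect it.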


> 
> >
> >>>
> >>> So, detach is broken with and without this patch.
> >>
> >> I can't see how it is broken without the check, how the problem you
> >> mentioned can be reproduced? Or is it a theoretical issue?
> >> But with this check hotplug support is %100 reproducible broken.
> >>
> >>>
> >>>
> >>> I think Testpmd should change the concept of rte_device mapping and
> >>> put
> >> attention to next:
> >>> 1. Don't detach by ethdev port ID.
> >>> 2. Multiple ethdev port IDs may related to the same rte_device.
> >>>
> >>> The Testpmd user should be sure that all the port IDs of the
> >>> rte_device are
> >> released before the detach call and Testpmd maybe need to validate it.
> >>> And like attach, detach should be triggered by PCI address \
> >>> rte_device
> >> name.
> >>>
> >>
> >> We need to know about port_id too to be able to stop/close it.
> >> And sure no objection to improve the hotplug support but it is broken
> >> now, lets fix it first.
> >>
> >>>
> >>>> Regards,
> >>>> ferruh
> >>>>
> >>>>
> >>>> [1]
> >>>> https://git.dpdk.org/dpdk/commit/?id=43d0e304980a1527bcac92dc679057b189e2545a
> >>>>
> >>>> [2]
> >>>> rmv_port_callback
> >>>>   stop_port(port_id);
> >>>>   close_port(port_id);
> >>>>   detach_port_device(port_id);
> >>>>
> >>>> [3]
> >>>> EAL: can not get port by device 0000:00:05.0!
> >>>> EAL: can not get port by device 0000:00:05.0!
> >>>> EAL: can not get port by device 0000:00:05.0!
> >>>> EAL: can not get port by device 0000:00:05.0!
> >>>> EAL: can not get port by device 0000:00:05.0!
> >>>> EAL: can not get port by device 0000:00:05.0!
> >>>> ...
> >


