DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug()
       [not found] <748FDEF7-6BB0-4D27-B630-4C991D06B6F8@intel.com>
@ 2020-09-29  1:15 ` Niu, Yawei
  2020-10-09 16:09   ` Gaëtan Rivet
  0 siblings, 1 reply; 4+ messages in thread
From: Niu, Yawei @ 2020-09-29  1:15 UTC (permalink / raw)
  To: dev; +Cc: Harris, James R, Vanda, Sydney M



On 2020/9/28, 11:44 PM, "Harris, James R" <james.r.harris@intel.com> wrote:

    Hi Niu,
    
    I agree, this doesn't look right.  Could you file an SPDK issue for this to make sure we track it?  And then try sending an e-mail to the DPDK mailing list?  I'm open to submitting a patch to our DPDK submodule short-term, but only if we get agreement with DPDK community that this fix is correct.
    
    Thanks,
    
    Jim
    
    
    On 9/28/20, 12:17 AM, "Niu, Yawei" <yawei.niu@intel.com> wrote:
    
        Hi,
    
        In the pci_unplug() (dpdk/drivers/bus/pci/pci_common.c), why do we call rte_devargs_remove() to remove the saved device args? That looks a defect to me, since that’ll result in the hot removed device failed to be detected when it’s plugged back (when white list option was provided), the situation is described following:
    
          1.  App starts with white list option provided, let’s suppose the device in white list is: 0000:81:00.0;
          2.  rte_devargs_add() is called to add this device arg on rte_eal_init();
          3.  Issue “echo 1 > /sys/bus/pci/devices/0000\:81\:00.0/remove” to hot remove the device;
          4.  pci_unplug() is called to remove the device, and rte_devargs_remove() is called, so this device arg for white list is remove too;
          5.  Issue “echo 1 > /sys/bus/pci/rescan” then “PCI_WHITELIST=”0000:81:00.0” spdk/script/setup.sh” to do hot plug;
          6.  In pci_scan_one(), new dev is generated, however, the dev->device.devargs is set NULL since the device args has been removed on device unplug;
          7.  rte_pci_probe() does white list scan, however, it unexpectedly skips the device because the devargs of the device is NULL now;
    
        I don’t fully understand the DPDK code, but this rte_devargs_remove() in pci_unplug() doesn’t make sense to me (when I commented it out, seems everything will work as expected). Is this a glitch or I missed anything?
    
        Thanks
        -Niu
    
    
        _______________________________________________
        SPDK mailing list -- spdk@lists.01.org
        To unsubscribe send an email to spdk-leave@lists.01.org
    
    _______________________________________________
    SPDK mailing list -- spdk@lists.01.org
    To unsubscribe send an email to spdk-leave@lists.01.org
    


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug()
  2020-09-29  1:15 ` [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug() Niu, Yawei
@ 2020-10-09 16:09   ` Gaëtan Rivet
  2020-10-13  4:31     ` Niu, Yawei
  0 siblings, 1 reply; 4+ messages in thread
From: Gaëtan Rivet @ 2020-10-09 16:09 UTC (permalink / raw)
  To: Niu, Yawei; +Cc: dev, spdk, Harris, James R, Vanda, Sydney M

Hello,

On 29/09/20 01:15 +0000, Niu, Yawei wrote:
> 
> 
> On 2020/9/28, 11:44 PM, "Harris, James R" <james.r.harris@intel.com> wrote:
> 
>     Hi Niu,
>     
>     I agree, this doesn't look right.  Could you file an SPDK issue for this to make sure we track it?  And then try sending an e-mail to the DPDK mailing list?  I'm open to submitting a patch to our DPDK submodule short-term, but only if we get agreement with DPDK community that this fix is correct.
>     
>     Thanks,
>     
>     Jim
>     
>     
>     On 9/28/20, 12:17 AM, "Niu, Yawei" <yawei.niu@intel.com> wrote:
>     
>         Hi,
>     
>         In the pci_unplug() (dpdk/drivers/bus/pci/pci_common.c), why do we call rte_devargs_remove() to remove the saved device args? That looks a defect to me, since that’ll result in the hot removed device failed to be detected when it’s plugged back (when white list option was provided), the situation is described following:
>     
>           1.  App starts with white list option provided, let’s suppose the device in white list is: 0000:81:00.0;
>           2.  rte_devargs_add() is called to add this device arg on rte_eal_init();
>           3.  Issue “echo 1 > /sys/bus/pci/devices/0000\:81\:00.0/remove” to hot remove the device;
>           4.  pci_unplug() is called to remove the device, and rte_devargs_remove() is called, so this device arg for white list is remove too;
>           5.  Issue “echo 1 > /sys/bus/pci/rescan” then “PCI_WHITELIST=”0000:81:00.0” spdk/script/setup.sh” to do hot plug;
>           6.  In pci_scan_one(), new dev is generated, however, the dev->device.devargs is set NULL since the device args has been removed on device unplug;
>           7.  rte_pci_probe() does white list scan, however, it unexpectedly skips the device because the devargs of the device is NULL now;
>     
>         I don’t fully understand the DPDK code, but this rte_devargs_remove() in pci_unplug() doesn’t make sense to me (when I commented it out, seems everything will work as expected). Is this a glitch or I missed anything?

This is not a glitch, the behavior is intended.
When a device is unplugged from the system, there is no expectation that
it will reappear at the same PCI address.

When an application receives the RMV event for the device, it is alerted
that the device was removed. It is its responsibility then to detect
when a device it wants to use is added back. It can do so via udev on
linux for example, but it is tightly coupled with the operating system
or even the distribution. BSD and windows won't work that way.

This is the reason this responsibility was left to the application.

The failsafe PMD attempts to offer transparent hotplug capability to
existing applications. It had to deal with this exact issue. The solution
used there has been to use either:

  * the exec() sub-device type, that takes a shell command as
    parameter, which should return a valid device after successful
    execution. Intention is for the user to declare a script that will
    detect device hotplug on the system using other matches than PCI
    address (as it is unreliable), such as MAC address.

  * the fd() sub-device type, which is a pipe where another piece of code
    will send device ID into. This is what vdev_netvsc does currently:
    it detects on its own valid NetVSC devices and when one is suitable,
    it sends its PCI id in this pipe to failsafe driver.

See https://doc.dpdk.org/guides/nics/fail_safe.html
    section 55.3.1. Fail-safe command line parameters for documentation.

When a new device is hotplugged into DPDK, it is described by an
rte_devargs. This new devargs is the one that will be used for
identification during scan and probe. As it replaces the old, the old
one needs to be removed as it is stale.

Regards,
-- 
Gaëtan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug()
  2020-10-09 16:09   ` Gaëtan Rivet
@ 2020-10-13  4:31     ` Niu, Yawei
  2020-10-13 23:05       ` Harris, James R
  0 siblings, 1 reply; 4+ messages in thread
From: Niu, Yawei @ 2020-10-13  4:31 UTC (permalink / raw)
  To: Gaëtan Rivet; +Cc: dev, spdk, Harris, James R, Vanda, Sydney M

Thanks a lot for the reply, Gaetan!

So seems we have to explicitly use failsafe PMD to deal with this re-plug case? I'm not quite sure if that'll cooperate with current SPDK hotplug code seamlessly.

I think it would be great if SPDK/DPDK can fix it internally (reconstruct the devargs on re-plug somehow?) and provide a transparent hotplug capability to apps.

Thanks
-Niu

On 2020/10/10, 12:10 AM, "Gaëtan Rivet" <grive@u256.net> wrote:

    Hello,
    
    On 29/09/20 01:15 +0000, Niu, Yawei wrote:
    > 
    > 
    > On 2020/9/28, 11:44 PM, "Harris, James R" <james.r.harris@intel.com> wrote:
    > 
    >     Hi Niu,
    >     
    >     I agree, this doesn't look right.  Could you file an SPDK issue for this to make sure we track it?  And then try sending an e-mail to the DPDK mailing list?  I'm open to submitting a patch to our DPDK submodule short-term, but only if we get agreement with DPDK community that this fix is correct.
    >     
    >     Thanks,
    >     
    >     Jim
    >     
    >     
    >     On 9/28/20, 12:17 AM, "Niu, Yawei" <yawei.niu@intel.com> wrote:
    >     
    >         Hi,
    >     
    >         In the pci_unplug() (dpdk/drivers/bus/pci/pci_common.c), why do we call rte_devargs_remove() to remove the saved device args? That looks a defect to me, since that’ll result in the hot removed device failed to be detected when it’s plugged back (when white list option was provided), the situation is described following:
    >     
    >           1.  App starts with white list option provided, let’s suppose the device in white list is: 0000:81:00.0;
    >           2.  rte_devargs_add() is called to add this device arg on rte_eal_init();
    >           3.  Issue “echo 1 > /sys/bus/pci/devices/0000\:81\:00.0/remove” to hot remove the device;
    >           4.  pci_unplug() is called to remove the device, and rte_devargs_remove() is called, so this device arg for white list is remove too;
    >           5.  Issue “echo 1 > /sys/bus/pci/rescan” then “PCI_WHITELIST=”0000:81:00.0” spdk/script/setup.sh” to do hot plug;
    >           6.  In pci_scan_one(), new dev is generated, however, the dev->device.devargs is set NULL since the device args has been removed on device unplug;
    >           7.  rte_pci_probe() does white list scan, however, it unexpectedly skips the device because the devargs of the device is NULL now;
    >     
    >         I don’t fully understand the DPDK code, but this rte_devargs_remove() in pci_unplug() doesn’t make sense to me (when I commented it out, seems everything will work as expected). Is this a glitch or I missed anything?
    
    This is not a glitch, the behavior is intended.
    When a device is unplugged from the system, there is no expectation that
    it will reappear at the same PCI address.
    
    When an application receives the RMV event for the device, it is alerted
    that the device was removed. It is its responsibility then to detect
    when a device it wants to use is added back. It can do so via udev on
    linux for example, but it is tightly coupled with the operating system
    or even the distribution. BSD and windows won't work that way.
    
    This is the reason this responsibility was left to the application.
    
    The failsafe PMD attempts to offer transparent hotplug capability to
    existing applications. It had to deal with this exact issue. The solution
    used there has been to use either:
    
      * the exec() sub-device type, that takes a shell command as
        parameter, which should return a valid device after successful
        execution. Intention is for the user to declare a script that will
        detect device hotplug on the system using other matches than PCI
        address (as it is unreliable), such as MAC address.
    
      * the fd() sub-device type, which is a pipe where another piece of code
        will send device ID into. This is what vdev_netvsc does currently:
        it detects on its own valid NetVSC devices and when one is suitable,
        it sends its PCI id in this pipe to failsafe driver.
    
    See https://doc.dpdk.org/guides/nics/fail_safe.html
        section 55.3.1. Fail-safe command line parameters for documentation.
    
    When a new device is hotplugged into DPDK, it is described by an
    rte_devargs. This new devargs is the one that will be used for
    identification during scan and probe. As it replaces the old, the old
    one needs to be removed as it is stale.
    
    Regards,
    -- 
    Gaëtan
    


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug()
  2020-10-13  4:31     ` Niu, Yawei
@ 2020-10-13 23:05       ` Harris, James R
  0 siblings, 0 replies; 4+ messages in thread
From: Harris, James R @ 2020-10-13 23:05 UTC (permalink / raw)
  To: Niu, Yawei, Gaëtan Rivet; +Cc: dev, spdk, Vanda, Sydney M


On 10/12/20, 9:31 PM, "Niu, Yawei" <yawei.niu@intel.com> wrote:

    Thanks a lot for the reply, Gaetan!

    So seems we have to explicitly use failsafe PMD to deal with this re-plug case? I'm not quite sure if that'll cooperate with current SPDK hotplug code seamlessly.

    I think it would be great if SPDK/DPDK can fix it internally (reconstruct the devargs on re-plug somehow?) and provide a transparent hotplug capability to apps.

    Thanks
    -Niu

Yes, thanks Gaëtan.  It makes sense the DPDK only honors the PCI whitelist for the specific devices that were present at those BDFs at init time.

Niu - the failsafe PMD isn't needed here.  I think you can get what you need by avoiding the whitelist completely.  Let's continue that part of the discussion on the SPDK mailing list.

Thanks,

-Jim
 

    On 2020/10/10, 12:10 AM, "Gaëtan Rivet" <grive@u256.net> wrote:

        Hello,

        On 29/09/20 01:15 +0000, Niu, Yawei wrote:
        >
        >
        > On 2020/9/28, 11:44 PM, "Harris, James R" <james.r.harris@intel.com> wrote:
        >
        >     Hi Niu,
        >
        >     I agree, this doesn't look right.  Could you file an SPDK issue for this to make sure we track it?  And then try sending an e-mail to the DPDK mailing list?  I'm open to submitting a patch to our DPDK submodule short-term, but only if we get agreement with DPDK community that this fix is correct.
        >
        >     Thanks,
        >
        >     Jim
        >
        >
        >     On 9/28/20, 12:17 AM, "Niu, Yawei" <yawei.niu@intel.com> wrote:
        >
        >         Hi,
        >
        >         In the pci_unplug() (dpdk/drivers/bus/pci/pci_common.c), why do we call rte_devargs_remove() to remove the saved device args? That looks a defect to me, since that’ll result in the hot removed device failed to be detected when it’s plugged back (when white list option was provided), the situation is described following:
        >
        >           1.  App starts with white list option provided, let’s suppose the device in white list is: 0000:81:00.0;
        >           2.  rte_devargs_add() is called to add this device arg on rte_eal_init();
        >           3.  Issue “echo 1 > /sys/bus/pci/devices/0000\:81\:00.0/remove” to hot remove the device;
        >           4.  pci_unplug() is called to remove the device, and rte_devargs_remove() is called, so this device arg for white list is remove too;
        >           5.  Issue “echo 1 > /sys/bus/pci/rescan” then “PCI_WHITELIST=”0000:81:00.0” spdk/script/setup.sh” to do hot plug;
        >           6.  In pci_scan_one(), new dev is generated, however, the dev->device.devargs is set NULL since the device args has been removed on device unplug;
        >           7.  rte_pci_probe() does white list scan, however, it unexpectedly skips the device because the devargs of the device is NULL now;
        >
        >         I don’t fully understand the DPDK code, but this rte_devargs_remove() in pci_unplug() doesn’t make sense to me (when I commented it out, seems everything will work as expected). Is this a glitch or I missed anything?

        This is not a glitch, the behavior is intended.
        When a device is unplugged from the system, there is no expectation that
        it will reappear at the same PCI address.

        When an application receives the RMV event for the device, it is alerted
        that the device was removed. It is its responsibility then to detect
        when a device it wants to use is added back. It can do so via udev on
        linux for example, but it is tightly coupled with the operating system
        or even the distribution. BSD and windows won't work that way.

        This is the reason this responsibility was left to the application.

        The failsafe PMD attempts to offer transparent hotplug capability to
        existing applications. It had to deal with this exact issue. The solution
        used there has been to use either:

          * the exec() sub-device type, that takes a shell command as
            parameter, which should return a valid device after successful
            execution. Intention is for the user to declare a script that will
            detect device hotplug on the system using other matches than PCI
            address (as it is unreliable), such as MAC address.

          * the fd() sub-device type, which is a pipe where another piece of code
            will send device ID into. This is what vdev_netvsc does currently:
            it detects on its own valid NetVSC devices and when one is suitable,
            it sends its PCI id in this pipe to failsafe driver.

        See https://doc.dpdk.org/guides/nics/fail_safe.html
            section 55.3.1. Fail-safe command line parameters for documentation.

        When a new device is hotplugged into DPDK, it is described by an
        rte_devargs. This new devargs is the one that will be used for
        identification during scan and probe. As it replaces the old, the old
        one needs to be removed as it is stale.

        Regards,
        --
        Gaëtan




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-10-13 23:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <748FDEF7-6BB0-4D27-B630-4C991D06B6F8@intel.com>
2020-09-29  1:15 ` [dpdk-dev] FW: [SPDK] Re: Potential defect in pci_unplug() Niu, Yawei
2020-10-09 16:09   ` Gaëtan Rivet
2020-10-13  4:31     ` Niu, Yawei
2020-10-13 23:05       ` Harris, James R

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).