From: Long Li <longli@microsoft.com>
To: Matan Azrad <matan@nvidia.com>,
Stephen Hemminger <stephen@networkplumber.org>
Cc: "matan@mellanox.com" <matan@mellanox.com>,
"grive@u246.net" <grive@u246.net>, "dev@dpdk.org" <dev@dpdk.org>,
Raslan Darawsheh <rasland@nvidia.com>
Subject: Re: [dpdk-dev] [PATCH] net/vdev_netvsc: handle removal of associated pci device
Date: Fri, 25 Sep 2020 20:30:50 +0000
Message-ID: <BN8PR21MB11557A4DE46111E83DC59EB0CE360@BN8PR21MB1155.namprd21.prod.outlook.com>
In-Reply-To: <MW2PR12MB2492734AD35B95CEE2FDE1A7DF200@MW2PR12MB2492.namprd12.prod.outlook.com>
Hi Matan,
While troubleshooting a DPDK failure on device removal, where the VF device briefly disappears and comes back, I noticed that the failsafe driver repeatedly tries to start a sub device (one that was successfully configured but later hot-removed from the kernel). This is caused by repeated alarms calling fs_dev_start(). I traced it to this commit:
ae80146 net/failsafe: fix removed device handling
The implementation of fs_err() is interesting:
+fs_err(struct sub_device *sdev, int err)
+{
+ /* A device removal shouldn't be reported as an error. */
+ if (sdev->remove == 1 || err == -EIO)
+ return rte_errno = 0;
+ return err;
+}
If I change this function to:
@@ -497,7 +497,7 @@ int failsafe_eth_new_event_callback(uint16_t port_id
fs_err(struct sub_device *sdev, int err)
{
/* A device removal shouldn't be reported as an error. */
- if (sdev->remove == 1 || err == -EIO)
+ if (sdev->remove == 1 && err == -EIO)
return rte_errno = 0;
return err;
}
With this change the hang goes away. I don't know why the if() uses ||. If a call to rte_eth_dev_start() returns -EIO (as is the case in fs_dev_start()), the best choice would be to bail out and fail this sub device.
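To make the difference between the two predicates concrete, here is a minimal standalone sketch. It is not the failsafe code: struct sub_device is reduced to the one field fs_err() reads, rte_errno is replaced by a plain variable, and the names fs_err_or(), fs_err_and() and check_fs_err_variants() are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for failsafe's struct sub_device; only the
 * field that fs_err() reads is modelled here. */
struct sub_device {
	unsigned int remove;
};

/* Plain variable standing in for DPDK's rte_errno (assumption). */
static int rte_errno;

/* Predicate as in the commit: either condition alone masks the error. */
static int
fs_err_or(struct sub_device *sdev, int err)
{
	if (sdev->remove == 1 || err == -EIO)
		return rte_errno = 0;
	return err;
}

/* Predicate as proposed: both conditions are needed to mask the error. */
static int
fs_err_and(struct sub_device *sdev, int err)
{
	if (sdev->remove == 1 && err == -EIO)
		return rte_errno = 0;
	return err;
}

static void
check_fs_err_variants(void)
{
	struct sub_device present = { .remove = 0 };
	struct sub_device removed = { .remove = 1 };

	/* -EIO from a sub device that is not flagged as removed:
	 * "||" hides it, "&&" lets the caller see the failure. */
	assert(fs_err_or(&present, -EIO) == 0);
	assert(fs_err_and(&present, -EIO) == -EIO);

	/* Flagged as removed, but a different error code:
	 * "||" hides it, "&&" reports it. */
	assert(fs_err_or(&removed, -ENODEV) == 0);
	assert(fs_err_and(&removed, -ENODEV) == -ENODEV);

	/* Flagged as removed and -EIO: both variants mask the error. */
	assert(fs_err_or(&removed, -EIO) == 0);
	assert(fs_err_and(&removed, -EIO) == 0);

	/* Any other error on a present device passes through unchanged. */
	assert(fs_err_or(&present, -EINVAL) == -EINVAL);
	assert(fs_err_and(&present, -EINVAL) == -EINVAL);
}
```

The first case is the one seen in the hang: with "||", a start that fails with -EIO on a sub device not yet marked as removed is reported as success. Note that "&&" also changes the second case, which may or may not be desirable.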
Can you please take a look?
Thanks,
Long
________________________________
From: Matan Azrad <matan@nvidia.com>
Sent: Tuesday, September 15, 2020 12:00 AM
To: Long Li <longli@microsoft.com>; Stephen Hemminger <stephen@networkplumber.org>
Cc: matan@mellanox.com <matan@mellanox.com>; grive@u246.net <grive@u246.net>; dev@dpdk.org <dev@dpdk.org>; Raslan Darawsheh <rasland@nvidia.com>
Subject: RE: [dpdk-dev] [PATCH] net/vdev_netvsc: handle removal of associated pci device
Hi Li
From: Long Li <longli@microsoft.com>
> >Subject: Re: [dpdk-dev] [PATCH] net/vdev_netvsc: handle removal of
> >associated pci device
> >
> >Hi Stephen
> >
> >From: Stephen Hemminger:
> >> On Sun, 6 Sep 2020 12:38:18 +0000
> >> Matan Azrad <matan@nvidia.com> wrote:
> >>
> >> > Hi Stephen
> >> >
> >> > From: Stephen Hemminger:
> >> > > The vdev_netvsc was not detecting when the associated PCI device
> >> > > (SRIOV) was removed. Because of that it would keep feeding the
> >> > > same
> >> > > (removed) device to failsafe PMD which would then unsuccessfully
> >> > > try and probe for it.
> >> > >
> >> > > Change to use a mark/sweep method to detect that PCI device was
> >> > > removed, and also only tell failsafe about new PCI devices.
> >> > > Vdev_netvsc does not have to keep stuffing the pipe with the same
> >> > > already existing PCI device.
> >> >
> >> > As I know, the vdev_netvsc driver doesn't call to failsafe if the
> >> > PCI device is not detected by the readlink command (considered as
> >> > removed)...
> >> > Am I missing something?
> >>
> >> The original code is broken because ctx_yield is not cleared, it
> >> keeps sending the same value.
> >
> >Looking on the code again, It looks like ctx->yield has no effect on
> >the next pipe write, It is just used for log.
> >
> >After the PCI interface matching to the netvsc interface, the pipe
> >write is triggered only if the readlink commands success to see the
> >plugged-in PCI
> >device:
> >readlink /sys/class/net/[iface]/device/subsystem shows "pci"
> >readlink /sys/class/net/[iface]/device shows the pci device ID.
> >
> >So, the assumption is when the above readlink failed on the interface
> >the device is removed(plugged-out) and the fd write will not happen.
> >
> >The code will continue to retry probe again and again until success
> >only for plugged-in pci device matched the netvsc device.
>
> Hi Matan,
>
> The original code keeps writing to pipe even it's the same PCI device.
Yes, the vdev_netvsc writes any plugged-in device to the associated netvsc device fd.
> The
> new code writes to pipe for a new device, only once. See the following code:
>
> + /* Skip if this is same device already sent to failsafe */
> + if (strcmp(addr, ctx->yield) == 0)
> + return 0;
>
I understand you want to optimize the pipe writes so that a write happens only after a plug-in hot event.
The current solution suffers from a race: the PCI device may be plugged out and plugged back in within a time shorter than the driver alarm delay, in which case the plug-in detection will be lost.
My suggestion:
Add a check of the plugged-in device's probing state and of whether it is owned by failsafe (using the ownership API); don't write to the pipe if so.
Matan
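The race described above can be illustrated with a small standalone simulation. This is only a sketch under assumptions: struct poll_ctx, poll_once() and the PCI address are invented for the example, and it assumes that a poll which finds no device clears the remembered address. Only the strcmp() check is taken from the quoted patch.

```c
#include <assert.h>
#include <string.h>

#define ADDR_LEN 32

/* Hypothetical, simplified context: keeps only the last address that
 * was reported, in the spirit of ctx->yield in the quoted patch. */
struct poll_ctx {
	char yield[ADDR_LEN];
	int writes; /* number of simulated pipe writes */
};

/* One alarm tick. 'addr' is the PCI address visible at poll time, or
 * NULL when the device is absent. Returns 1 if a pipe write happens. */
static int
poll_once(struct poll_ctx *ctx, const char *addr)
{
	if (addr == NULL) {
		/* Assumption: an observed removal forgets the address. */
		ctx->yield[0] = '\0';
		return 0;
	}
	/* Skip if this is same device already sent to failsafe */
	if (strcmp(addr, ctx->yield) == 0)
		return 0;
	strncpy(ctx->yield, addr, ADDR_LEN - 1);
	ctx->yield[ADDR_LEN - 1] = '\0';
	ctx->writes++;
	return 1;
}

static void
check_replug_detection(void)
{
	const char *vf = "0002:00:02.0"; /* made-up PCI address */
	struct poll_ctx slow = { .yield = "", .writes = 0 };
	struct poll_ctx fast = { .yield = "", .writes = 0 };

	/* Slow re-plug: one tick observes the device as absent. */
	assert(poll_once(&slow, vf) == 1);
	assert(poll_once(&slow, NULL) == 0);
	assert(poll_once(&slow, vf) == 1);
	assert(slow.writes == 2);

	/* Fast re-plug: the device is removed and re-added between two
	 * ticks, so no tick sees it absent and the second tick is skipped. */
	assert(poll_once(&fast, vf) == 1);
	assert(poll_once(&fast, vf) == 0);
	assert(fast.writes == 1);
}
```

In the fast case the second plug-in is never written to the pipe, which is the lost detection mentioned above.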
> This patch also saves lots of CPU since it no longer writes to pipe all the time.
> You are correct about the code will continue to probe on a new PCI device.
> But someone has to do it to handle hot-add.
>
> Thanks,
> Long
>
>
> >
> >> It looks like device removal and add was never tested.
> >
> >This is basic test we have to test plug-in plug-out and it passed every
> >day in the last years.
> >
> >Maybe something new and special in your setup?
> >
> >> If you test removal you will see that vdev_netvsc:
> >> 1. Sends same PCI device repeatedly to failsafe (every alarm call)
> >> This is harmless, but useless.
> >> 2. When device is removed, keeps doing #1