From: "Gaëtan Rivet" <grive@u256.net>
To: Long Li <longli@linuxonhyperv.com>
Cc: dev@dpdk.org, Long Li <longli@microsoft.com>, stable@dpdk.org
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] net/failsafe: check correct error code while handling sub-device add
Date: Fri, 9 Oct 2020 18:20:03 +0200
Message-ID: <20201009162003.5ucroctwjpwhv64f@u256.net>
In-Reply-To: <20201005094215.u4kt64ycbk35kbeg@u256.net>

On 05/10/20 11:42 +0200, Gaëtan Rivet wrote:
> Hi,
> 
> On 02/10/20 17:01 -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> > 
> > When adding a sub-device, it's possible that the sub-device is configured
> > successfully but later fails to start. This error should not be masked.
> 
> Some of those errors are meant to be masked: -EIO, when the device is
> marked as removed at the ethdev level (see eth_err() in rte_ethdev.c:819).
> 
> > The driver needs to check the error status to prevent endless loop of
> > trying to start the sub-device.
> 
> If the ethdev layer error is due to the device being removed, and
> failsafe loops on trying to sync the eth device to its own state, then
> an RMV event should have been emitted but wasn't or it was missed by
> failsafe.
> 
> If the ethdev layer error is *not* due to the device being removed, the
> error should be != -EIO, and sdev->remove should not be set, so fs_err()
> should not mask it and it should be seen by the app.
> 
> Can you provide the following details:
> 
>  * What is the return code of rte_eth_dev_start() that is masked in your
>    start loop?
> 
>  * Is the device marked as removed in failsafe?
> 
>  * Is the device marked as removed in ethdev?
> 
>  * Was there an RMV event generated for the device? Whether yes or no,
>    is it correct?
> 
> Thanks,
> 

Hello Li,

I've found the previous mail thread [1] where you described how you got this
error. In your description, you say that you unplug the device and then
quickly replug it, before any event is processed?

If that's the case, it looks like a clear race condition: the removal
event of the device is being missed. I would not yet say that the bug is
in failsafe; it could be in ethdev.

Can you please check whether the device removal event was properly
generated in rte_ethdev? Failsafe (and any other hotplug support layer,
actually) depends on it, so it should be verified to work first.
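
For reference, the masking rule described in the quoted message can be
modeled in a few lines of standalone C. This is only a sketch under stated
assumptions: sdev_model, fs_err_model() and fs_err_model_selfcheck() are
hypothetical names, and the condition is taken from the description above
rather than copied from the failsafe source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for failsafe's sub-device state; only the
 * removal flag matters for this model. */
struct sdev_model {
	bool remove; /* set once failsafe has seen the removal */
};

/* Assumed masking rule: an error is hidden from the application when
 * the sub-device is flagged as removed, or when ethdev reported -EIO
 * (the code it returns for a removed device). Anything else is
 * passed through unchanged. */
static inline int
fs_err_model(const struct sdev_model *sdev, int err)
{
	if (sdev->remove || err == -EIO)
		return 0;
	return err;
}

/* Walk through the cases discussed in this thread. */
void
fs_err_model_selfcheck(void)
{
	struct sdev_model present = { .remove = false };
	struct sdev_model removed = { .remove = true };

	/* Removal-related failures are masked. */
	assert(fs_err_model(&present, -EIO) == 0);
	assert(fs_err_model(&removed, -ENODEV) == 0);
	/* A genuine start failure on a present device reaches the app. */
	assert(fs_err_model(&present, -EINVAL) == -EINVAL);
	/* Success stays success. */
	assert(fs_err_model(&present, 0) == 0);
}
```

Under this model, a start failure is only hidden when the removal is known
on at least one side (the failsafe flag or the ethdev -EIO), which is why
the answers to the questions above matter.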

Thanks,

[1]: http://mails.dpdk.org/archives/dev/2020-September/182977.html

-- 
Gaëtan
