DPDK patches and discussions
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>
Cc: Matan Azrad <matan@mellanox.com>,
	Gaetan Rivet <gaetan.rivet@6wind.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v6 1/6] ethdev: add devop to check removal status
Date: Thu, 18 Jan 2018 18:57:57 +0100
Message-ID: <20180118175757.GT4256@6wind.com>
In-Reply-To: <2ffac5e9-6be5-0c82-18c4-8b72710630ae@intel.com>

On Thu, Jan 18, 2018 at 05:18:22PM +0000, Ferruh Yigit wrote:
> On 1/18/2018 11:27 AM, Matan Azrad wrote:
> > There is time between the physical removal of the device until PMDs get
> > a RMV interrupt. At this time DPDK PMDs and applications still don't
> > know about the removal.
> > 
> > Current removal detection is achieved only by registration to device RMV
> > event and the notification comes asynchronously. So, there is no option
> > to detect a device removal synchronously.
> > Applications and other DPDK entities may want to check a device removal
> > synchronously and to take an immediate decision accordingly.
> 
> So we will have two methods to detect device removal. One is asynchronous, as
> you mentioned: device removal causes an interrupt, which triggers the user
> callback.
> 
> The new method is synchronous, but still triggered from the application. I mean
> the application has to call rte_eth_dev_is_removed() to learn the status. What
> is the use case here, polling continuously? Won't this also cause some latency
> unless you dedicate a core just to polling device status?

They are complementary. The use case is when devices get suddenly pulled out
of their chassis physically (you need to picture a raging sysadmin for
that), or logically in the case of a hypervisor removing an SR-IOV device
from a VM. Either way, this happens without prior notice.

It takes time, up to several seconds, for the PCI unplug notification to
travel from the kernel to DPDK. During that window the DPDK application may
execute control path operations on the device. These may fail because the
device no longer exists (e.g. no ACK will be returned by the device after
adding a new MAC), and these failures may be misinterpreted (e.g. as
permission denied, invalid argument and so on).

To address this problem, PMDs that support physical hotplug must have all
their devops internally check for device removal before returning any other
error, in order to possibly convert the original error code to EIO.

Patching each and every devop in each PMD with basically the same code would
be counterproductive, so this series puts this check at a higher level,
inside rte_ethdev. Since this results in a new devop, it can be exposed to
applications for free, as these may find a use for it as well.
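
To illustrate the idea of a single higher-level check, here is a minimal
standalone sketch. It is not the code from this series: port_is_removed(),
convert_err(), port_removed[] and MAX_PORTS are hypothetical stand-ins, and
only the "convert a failure to EIO when the device is gone" behavior comes
from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical table size, for illustration only */

/* Hypothetical stand-in for the per-port removal status. */
bool port_removed[MAX_PORTS];

/* Hypothetical stand-in for the removal query; nonzero means removed. */
int
port_is_removed(uint16_t port_id)
{
	return port_id < MAX_PORTS && port_removed[port_id];
}

/*
 * Hypothetical wrapper applied to the return value of any devop:
 * success is passed through untouched, and any failure on a removed
 * device is reported as -EIO instead of the original error code.
 */
int
convert_err(uint16_t port_id, int ret)
{
	if (ret == 0)
		return 0;
	if (port_is_removed(port_id))
		return -EIO;
	return ret;
}
```

With something like this in the generic layer, a control path function could
end with "return convert_err(port_id, ret);" rather than each PMD repeating
the check in every devop.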

> > Add new dev op called is_removed to allow DPDK entities to check an
> > Ethernet device removal status immediately.
> > 
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> >  lib/librte_ether/rte_ethdev.c           | 28 +++++++++++++++++++++++++---
> >  lib/librte_ether/rte_ethdev.h           | 20 ++++++++++++++++++++
> >  lib/librte_ether/rte_ethdev_version.map |  1 +
> >  3 files changed, 46 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > index b349599..c93cec1 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -114,7 +114,8 @@ enum {
> >  rte_eth_find_next(uint16_t port_id)
> >  {
> >  	while (port_id < RTE_MAX_ETHPORTS &&
> > -	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
> > +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> > +	       rte_eth_devices[port_id].state != RTE_ETH_DEV_REMOVED)
> 
> If the device is removed, why are we not allowed to re-use the port_id assigned
> to it? Overall the RTE_ETH_DEV_REMOVED state is not clear to me; why are we not
> directly setting RTE_ETH_DEV_UNUSED?
> 
> Also, the RTE_ETH_DEV_REMOVED state is set in the ethdev layer, and the ethdev
> layer won't allow reusing the port, so what changes the state of the dev? Will
> it stay as it is during the lifetime of the application?

Although the entry has switched to the REMOVED state, the underlying PMD
still holds it at this point; data is still allocated and so on. It will
switch to UNUSED after the PMD instance is fully de-initialized. In the
meantime the entry still needs to be skipped.
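
A simplified model of that lifecycle, as a sketch only: the enum, the
port_states[] table, find_free_port() and release_port() below are
hypothetical and merely mirror the state names discussed here. The point is
that a REMOVED slot is not handed out again until it has gone back to UNUSED.

```c
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical table size, for illustration only */

/* Simplified port states; names mirror the ones discussed above. */
enum port_state {
	PORT_UNUSED = 0,
	PORT_ATTACHED,
	PORT_REMOVED,
};

enum port_state port_states[MAX_PORTS];

/*
 * Hypothetical allocator: return the first slot that can host a new
 * device, or MAX_PORTS when none is available. REMOVED slots are
 * skipped because the PMD still owns their data.
 */
uint16_t
find_free_port(void)
{
	uint16_t i;

	for (i = 0; i < MAX_PORTS; i++)
		if (port_states[i] == PORT_UNUSED)
			return i;
	return MAX_PORTS;
}

/* Hypothetical final step once the PMD instance is de-initialized. */
void
release_port(uint16_t port_id)
{
	if (port_id < MAX_PORTS)
		port_states[port_id] = PORT_UNUSED;
}
```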

> >  		port_id++;
> >  
> >  	if (port_id >= RTE_MAX_ETHPORTS)
> > @@ -262,8 +263,7 @@ struct rte_eth_dev *
> >  rte_eth_dev_is_valid_port(uint16_t port_id)
> >  {
> >  	if (port_id >= RTE_MAX_ETHPORTS ||
> > -	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
> > -	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
> > +	    (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
> >  		return 0;
> >  	else
> >  		return 1;
> > @@ -1094,6 +1094,28 @@ struct rte_eth_dev *
> >  }
> >  
> >  int
> > +rte_eth_dev_is_removed(uint16_t port_id)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	int ret;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> > +
> > +	if (dev->state == RTE_ETH_DEV_REMOVED)
> > +		return 1;
> 
> Doesn't this conflict with the API documentation below:
> 
> "
>  * @return
>  *   - 0 when the Ethernet device is removed, otherwise 1.
> "

Documentation is indeed wrong here. Matan?

> 
> > +
> > +	ret = dev->dev_ops->is_removed(dev);
> > +	if (ret != 0)
> > +		dev->state = RTE_ETH_DEV_REMOVED;
> 
> It isn't clear what "dev_ops->is_removed(dev)" should return, and this may
> lead to incompatible usages in PMDs over time.
> Please add some documentation about expected return values for dev_ops.

It should be clarified as a boolean value (yes = nonzero, no = zero), like
most is*() functions (isalpha(), isblank() and so on).
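
For illustration, a standalone sketch of that boolean convention, loosely
modeled on the flow quoted above. struct mock_dev, mock_pmd_is_removed() and
dev_is_removed() are hypothetical names, not the ethdev API: the callback
may return any nonzero value for "removed", and the generic side normalizes
it to 0 or 1 and remembers a positive answer.

```c
#include <stddef.h>

struct mock_dev;

/* Hypothetical callback type: zero = still present, nonzero = removed. */
typedef int (*is_removed_t)(struct mock_dev *dev);

struct mock_dev {
	int removed;             /* latched result, 0 or 1 */
	is_removed_t is_removed; /* NULL if the driver lacks support */
	int hw_gone;             /* hypothetical hardware status for the demo */
};

/* Hypothetical driver callback returning an arbitrary nonzero value. */
int
mock_pmd_is_removed(struct mock_dev *dev)
{
	return dev->hw_gone ? 42 : 0;
}

/*
 * Generic check: normalizes the callback result to 0 or 1 and latches
 * a positive answer so the driver is not queried again.
 */
int
dev_is_removed(struct mock_dev *dev)
{
	if (dev->removed)
		return 1;
	if (dev->is_removed == NULL)
		return 0;
	if (dev->is_removed(dev) != 0)
		dev->removed = 1;
	return dev->removed;
}
```

Callers can then simply write "if (dev_is_removed(dev))" without caring
which nonzero value a given driver happens to return.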

> And this is not a real removal: the PMD signals us and we stop using that
> device, but the device can still be there, right?

This is "removal" in the sense of "device removal", not "PMD removal", which
is usually described as "unbinding". The term was chosen for consistency
with the similarly-named "removal" (RMV) event.

> If there is a real removal, would it be possible to use EAL hotplug?

Possibly, although I think it doesn't remove the case for this devop, right?

-- 
Adrien Mazarguil
6WIND
