From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gaetan.rivet@6wind.com>
Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com [74.125.82.68])
 by dpdk.org (Postfix) with ESMTP id A24047D06
 for <stable@dpdk.org>; Wed, 13 Dec 2017 17:09:29 +0100 (CET)
Received: by mail-wm0-f68.google.com with SMTP id f140so6144726wmd.2
 for <stable@dpdk.org>; Wed, 13 Dec 2017 08:09:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:content-transfer-encoding:in-reply-to
 :user-agent; bh=ekAYmRKgGJNjtvsyDpuR86rwNzJZ/py0AI89kK1njfg=;
 b=bGUBwgedkzYslAi4bsIZxti0xBUEkyreBiPHO88LBjHlwe+1nPx0Isy3SPEGsF69Rw
 fDwKtslEjpqHUU1Zo6HQNwXqNSGx6lW8qoPfy7NofC+6OGTuJUfaXmk8VYjB7P7BjjHt
 6luv634O7pzhiodZBXVQnm4TR9gmkxteNtcNt4w3JXV9dweCVMmjNlvnI/LDHoUKd26T
 y4KIV9rvWKsuMl1ICV0WwxlbLjgFZfDL31eA8BAXjgLBs8H8gtKYQ3LW8K8xTzXBTEOk
 6m2l8DLilBEgCR7BYQT48sEmX/hMIz8rkk92YHPxAyidKlupmi4yyC0rQQxn/oEEeU2U
 x1ow==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:content-transfer-encoding
 :in-reply-to:user-agent;
 bh=ekAYmRKgGJNjtvsyDpuR86rwNzJZ/py0AI89kK1njfg=;
 b=XDOWksqPMOLyxzyfwRGRlxTeEUaDB6mBVKLkydclrz8bjlVKZY404/7U2EXULCgBe+
 +Blbky0T8xFzO7jeT7Tx6tNVnADrBR48hdl4zKCyU4x3sjyOK1/m1c+axciKMSSiCiQp
 ubI15IL10WA0wW/FEL5X97A6JuK33nxAsndmTtzUWSUFyGTWZ6nhBUEYX48nvZU4d+dP
 bnSahyIUEeoDGcf7yNebRdxEBxf0tHf28r29T+ZJMa9sbWAhrl5Pi6j38FyxjT+qStmj
 hGenYPP7DXviDRLB5KoSc/Ay0g7xaTew/S90Uvr9bpF63bKPbOkFX8NFJPVD8vdpKv80
 /oKA==
X-Gm-Message-State: AKGB3mKawt72Qeklcl8KGzq8e6fuyWRmnAdp1alIlTJBWgMSkobDHMl9
 kDC+tiNPX4uPQPLA/rDVaiutcg==
X-Google-Smtp-Source: ACJfBotBN9FiGOnOw9mXPzZqxWISazS5IuyHIvl0BNkWAQx+mDBbfV4Ry3QsKj55EJoQGbHRxjR/3Q==
X-Received: by 10.80.245.26 with SMTP id t26mr8207533edm.301.1513181369225;
 Wed, 13 Dec 2017 08:09:29 -0800 (PST)
Received: from bidouze.vm.6wind.com (host.78.145.23.62.rev.coltfrance.com.
 [62.23.145.78])
 by smtp.gmail.com with ESMTPSA id z56sm1647258edb.72.2017.12.13.08.09.28
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Wed, 13 Dec 2017 08:09:28 -0800 (PST)
Date: Wed, 13 Dec 2017 17:09:16 +0100
From: =?iso-8859-1?Q?Ga=EBtan?= Rivet <gaetan.rivet@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>,
 Thomas Monjalon <thomas@monjalon.net>,
 "dev@dpdk.org" <dev@dpdk.org>, "stable@dpdk.org" <stable@dpdk.org>
Message-ID: <20171213160916.e3rmxmhfhqz72wco@bidouze.vm.6wind.com>
References: <1509637324-13525-1-git-send-email-matan@mellanox.com>
 <1513175370-16583-1-git-send-email-matan@mellanox.com>
 <1513175370-16583-5-git-send-email-matan@mellanox.com>
 <20171213151641.g42zr7zupbsdgxsv@bidouze.vm.6wind.com>
 <HE1PR0502MB3659B4A6797E3A7A2FC78834D2350@HE1PR0502MB3659.eurprd05.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <HE1PR0502MB3659B4A6797E3A7A2FC78834D2350@HE1PR0502MB3659.eurprd05.prod.outlook.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Subject: Re: [dpdk-stable] [PATCH v2 4/4] net/failsafe: fix removed device
	handling
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Dec 2017 16:09:29 -0000

On Wed, Dec 13, 2017 at 03:48:46PM +0000, Matan Azrad wrote:
> Hi Gaetan
> Thanks for the review.
> Some comments..
> 
> > -----Original Message-----
> > From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> > Sent: Wednesday, December 13, 2017 5:17 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Thomas Monjalon
> > <thomas@monjalon.net>; dev@dpdk.org; stable@dpdk.org
> > Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
> > 
> > Hi Matan,
> > 
> > On Wed, Dec 13, 2017 at 02:29:30PM +0000, Matan Azrad wrote:
> > > There is time between the physical removal of the device until
> > > sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
> > > applications still don't know about the removal and may call
> > > sub-device control operation which should return an error.
> > >
> > > In previous code this error is reported to the application contrary to
> > > fail-safe principle that the app should not be aware of device removal.
> > >
> > > Add an removal check in each relevant control command error flow and
> > > prevent an error report to application when the sub-device is removed.
> > >
> > > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > > Fixes: b737a1e ("net/failsafe: support flow API")
> > > Cc: stable@dpdk.org
> > >
> > 
> > This patch is not a fix.
> > It relies on an eth_dev API evolution. Without this evolution, this patch is
> > meaningless and would break compilation if backported in stable branch.
> > 
> 
> It is a fix because the bug is finally solved by this patch.
> I agree that it cannot be backported itself, but maybe all the series should be backported.
> Other idea:
> Add new patch which documents the bug and backport it.
> Remove it in this patch and remove cc stable from it.
> What do you think?
> 

I think you could write a crude version that would not rely on the
ethdev evolution (checking sdev->remove only), which would be incomplete
but still better than nothing.
And why not in this patch document the issue.
Without any dependency outside failsafe, this could be backported.

Then complete the fix with the API evolution if the new devops is
accepted.

> > Please remove those tags.
> > 
> > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > ---
> > >  drivers/net/failsafe/failsafe_flow.c    | 18 ++++++++++-------
> > >  drivers/net/failsafe/failsafe_ops.c     | 34 ++++++++++++++++++++++-----
> > ------
> > >  drivers/net/failsafe/failsafe_private.h | 10 ++++++++++
> > >  3 files changed, 44 insertions(+), 18 deletions(-)
> > 
> > < ... >
> > 
> > > +/*
> > > + * Check if sub device was removed.
> > > + */
> > > +static inline int
> > > +fs_is_removed(struct sub_device *sdev) {
> > > +	if (sdev->remove == 1 || rte_eth_dev_is_removed(PORT_ID(sdev))
> > != 0)
> > > +		return 1;
> > > +	return 0;
> > > +}
> > 
> > Have you considered adding this check within the subdev iterator itself?
> > I think it would prevent you from having to add it to each return value
> > checks.
> > 
> > It is still MT-unsafe anyway.
> >
> 
> This fix doesn't come to solve the MT issue, It comes to solve the error report to application because of removal.
> Adding the check in subdev iterator doesn't make sense for this issue.
> 
> Matan. 

If you add this check in the iterator itself, you would skip removed
devices before attempting operating upon them, right?

Then it should probably help with your issue, unless you tested it and
verified that it didnt?

Something like this:

---8<---

diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d81cc3ca6..62ddc0689 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -316,8 +316,12 @@ fs_find_next(struct rte_eth_dev *dev,
        subs = PRIV(dev)->subs;
        tail = PRIV(dev)->subs_tail;
        while (sid < tail) {
+               if (min_state > DEV_PROBED &&
+                   fs_is_removed(&sub[sid]))
+                       goto next;
                if (subs[sid].state >= min_state)
                        break;
+next:
                sid++;
        }
        *sid_out = sid;

--->8---

Only issue being that it is completely racy, but as this MT-unsafe property is
inescapable we might as well ignore it and go for KISS.

If that's enough, I would prefer instead of having this additional check
added to all rte_eth operations.

-- 
Gaëtan Rivet
6WIND