From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gaetan.rivet@6wind.com>
Received: from mail-wm0-f66.google.com (mail-wm0-f66.google.com [74.125.82.66])
 by dpdk.org (Postfix) with ESMTP id 8FF311B3E7
 for <dev@dpdk.org>; Thu,  8 Feb 2018 18:19:45 +0100 (CET)
Received: by mail-wm0-f66.google.com with SMTP id x4so367464wmc.0
 for <dev@dpdk.org>; Thu, 08 Feb 2018 09:19:45 -0800 (PST)
Received: from bidouze.vm.6wind.com (host.78.145.23.62.rev.coltfrance.com.
 [62.23.145.78])
 by smtp.gmail.com with ESMTPSA id a139sm414167wme.1.2018.02.08.09.19.43
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Thu, 08 Feb 2018 09:19:44 -0800 (PST)
Date: Thu, 8 Feb 2018 18:19:31 +0100
From: =?iso-8859-1?Q?Ga=EBtan?= Rivet <gaetan.rivet@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: dev@dpdk.org, stable@dpdk.org
Message-ID: <20180208171931.oxiqp6433pmft36m@bidouze.vm.6wind.com>
References: <1518092427-4333-1-git-send-email-matan@mellanox.com>
 <1518107653-15466-1-git-send-email-matan@mellanox.com>
 <1518107653-15466-3-git-send-email-matan@mellanox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1518107653-15466-3-git-send-email-matan@mellanox.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Subject: Re: [dpdk-dev] [PATCH v5 2/3] net/failsafe: fix removal scope

Hi Matan,

Thanks for dealing with this.

On Thu, Feb 08, 2018 at 04:34:12PM +0000, Matan Azrad wrote:
> Fail-safe PMD uses per sub-device flag called "remove" to indicate the
> scope where the sub-device isn't synchronized with the fail-safe state.
> 
> This flag is set when fail-safe gets RMV notification about the
> physical removal of the sub-device and should be unset when the
> sub-device completes all the configurations cause it to arrive to the
> fail-safe state.
> 
> The previous code wrongly unsets the flag after calling to the
> sub-device PMD dev_configure() operation and before all the
> configurations were done.
> 
> Change the remove flag unsetting to be only after the sub-device
> successes to arrive to the fail-safe state.
> 

I'm not sure this is the right way to do this.
I think it's clear that it was a mistake to set sdev->remove to 0
only during fs_dev_configure.

The flag itself only means "there is something to be done on this
device, please clean up".

Once the clean-up has happened, then the flag is not necessary anymore
and should be reset.

So I thought that this fix would actually put the flag reset within
fs_dev_remove, right before reinstalling the hotplug alarm.

At this point, the device state would have been set back to
DEV_UNDEFINED, so the remove flag is unnecessary for any operation
trying to avoid unplugged slaves.

The "remove" flag is initialized to 0 when sub-devices are allocated
(during fail-safe init). If the flag is only reset later, as this patch
does, the state of the slave would differ between its first
initialization and any subsequent init after a successful plugout.
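
To make the suggestion concrete, here is a rough sketch of what I have in
mind. It is not the actual failsafe code: the struct, the state enum, the
sketch_* helpers and install_hotplug_alarm() are simplified stand-ins made
up for illustration, and the real teardown steps are elided.

```c
#include <assert.h>

/* Simplified stand-ins, NOT the real DPDK fail-safe definitions. */
enum dev_state {
	DEV_UNDEFINED,
	DEV_PARSED,
	DEV_PROBED,
	DEV_ACTIVE,
	DEV_STARTED,
};

struct sub_device {
	enum dev_state state;
	unsigned int remove:1; /* "clean-up pending" marker */
};

/* Hypothetical placeholder for re-arming the hotplug alarm. */
static int hotplug_alarm_armed;

static void
install_hotplug_alarm(void)
{
	hotplug_alarm_armed = 1;
}

/* Allocation time: the flag starts at 0 (fail-safe init). */
static void
sub_device_init(struct sub_device *sdev)
{
	sdev->state = DEV_UNDEFINED;
	sdev->remove = 0;
}

/* RMV notification: only mark that clean-up is needed. */
static void
on_rmv_event(struct sub_device *sdev)
{
	sdev->remove = 1;
}

/* Clean-up path: reset the flag right before re-arming the alarm. */
static void
sketch_dev_remove(struct sub_device *sdev)
{
	if (!sdev->remove)
		return;
	/* Real code would stop, close and detach the port here (elided). */
	sdev->state = DEV_UNDEFINED;
	sdev->remove = 0; /* clean-up done, marker no longer needed */
	install_hotplug_alarm();
}

/* An unplugged, cleaned-up slave should look like a fresh one. */
static void
sketch_assert_pristine(const struct sub_device *sdev)
{
	assert(sdev->state == DEV_UNDEFINED);
	assert(sdev->remove == 0);
}
```

With the reset placed there, a slave that went through one plugout passes
the same sketch_assert_pristine() check as a freshly allocated one, so the
later configure/start paths would not need to touch the flag at all.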

> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c | 2 ++
>  drivers/net/failsafe/failsafe_ops.c   | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index 4c6e938..ca42376 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -377,6 +377,8 @@
>  				      i);
>  				goto err_remove;
>  			}
> +			if (PRIV(dev)->state < DEV_STARTED)
> +				sdev->remove = 0;

Here the remove flag should already be 0. If it isn't, this is a
(logical) bug, which should be properly addressed instead of patched
in this way.

>  		}
>  	}
>  	/*
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index 7a67e16..a7c2dba 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -131,7 +131,6 @@
>  			dev->data->dev_conf.intr_conf.lsc = 0;
>  		}
>  		DEBUG("Configuring sub-device %d", i);
> -		sdev->remove = 0;

This is correct.

>  		ret = rte_eth_dev_configure(PORT_ID(sdev),
>  					dev->data->nb_rx_queues,
>  					dev->data->nb_tx_queues,
> @@ -197,6 +196,7 @@
>  			return ret;
>  		}
>  		sdev->state = DEV_STARTED;
> +		sdev->remove = 0;

This seems unnecessary if the flag was already reset once the
device had been properly removed.

>  	}
>  	if (PRIV(dev)->state < DEV_STARTED)
>  		PRIV(dev)->state = DEV_STARTED;
> -- 
> 1.8.3.1
> 

-- 
Gaëtan Rivet
6WIND