From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com [74.125.82.68]) by dpdk.org (Postfix) with ESMTP id EEE681B1C3 for ; Thu, 26 Oct 2017 18:20:43 +0200 (CEST) Received: by mail-wm0-f68.google.com with SMTP id q124so9066155wmb.0 for ; Thu, 26 Oct 2017 09:20:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=ejmuJctQCkl8UInul8tVcgqf+QPc/OQwW2dkdLxCgOI=; b=g4ierjdeb5guwJnIhoesQX/dde+3EUnMAp1DQtcgID2XO/drVD+L7T/wywMWvl6n+x 03yO4w2/x1cJqxPABIOlXsIaYq4hcFhQfvnWsEMEy0qtPN3JQIQlmWzn3eQtJtkZr1+B hv1uCwHRhmB06uOOxO25AogaS35CXZhq7OXEyvukwiJypIImxwaf+LwnV+s+Icc1GXIU XzTFZbqlNrsaAHibrFakMKXriRvOO4iGylHm32ojscKdK6kzAkBICYdn0kphwVilLAMS wulrTmSDH2EKNYk9a2ciu62nCL6jhpM9EXrptuChSv5pYb7wggRMUcnscFozVxXSjV3I OZwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=ejmuJctQCkl8UInul8tVcgqf+QPc/OQwW2dkdLxCgOI=; b=WvlgMUDypRYOdtiWPDepg0pecBLsEi9/bRIKlBriUwQa66rYO9jrbnjHgDikHkn455 PXntAMo4IQ5oGSAhEudENSey1KptCcG5THYwU/aw2Udx/8LlfIzB+znC4WIMM39paNt8 vup/f524qQ7w9OaG6rEX3Ss4X3KOLstq1XuaOzSd4u+5g1zv/PSRdn6o50B/RqMW69bj O9RFpma6Allhyc+wFiUJFICHR5/gkn31HgCvp6EjrKV9tpyEXS8zF+U70NTx17VnGnS3 hJ5AYgPFdP0efItG24ipWi2RFsFwLo7Dp+vdkcMSRdZlCp9tSRg3evm+w1JGMPAzrq2R T8dw== X-Gm-Message-State: AMCzsaUJde7bZcIjKGMb9ZSoq1lKipZlf7JK1H6O4i3htnOdNmjfhYyx Bnlvt0mMm6UfslxID7su6NFreA== X-Google-Smtp-Source: ABhQp+RjurLyugTL4EJkq0k8HXzXH0Nz4+XLCuieChdZtLdZ0dpjeoVHBEiYDssBCcRHBX7CH2mL/A== X-Received: by 10.28.132.141 with SMTP id g135mr1929511wmd.37.1509034843338; Thu, 26 Oct 2017 09:20:43 -0700 (PDT) Received: from bidouze.vm.6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id f19sm1271734wmf.4.2017.10.26.09.20.42 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Oct 2017 09:20:42 -0700 (PDT) Date: Thu, 26 Oct 2017 18:20:29 +0200 From: =?iso-8859-1?Q?Ga=EBtan?= Rivet To: Matan Azrad Cc: dev@dpdk.org, stable@dpdk.org Message-ID: <20171026162029.GC10890@bidouze.vm.6wind.com> References: <1508651468-31866-1-git-send-email-matan@mellanox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1508651468-31866-1-git-send-email-matan@mellanox.com> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [dpdk-stable] [PATCH] net/failsafe: fix Rx clean race X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Oct 2017 16:20:44 -0000 Hello Matan, I think the commit log could be shorter. Proposing this, feel free to expand it if you prefer. ---8<--- When removing a device, the fail-safe checks that it is not within its datapath before cleaning it. When checking whether an Rx burst should be performed on a device, the remove flag is not checked. Thus the port could still enter its datapath and miss a removal round. Furthermore, there is a race between the thread removing the device and the polling thread. Check the remove flag before entering a sub-device Rx burst when in safe mode. This check mitigates the aforementioned race condition. --->8--- Otherwise, On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote: > In case of plug out, the RMV interrupt callback sets the remove flag of > the removed sub-device. The next hotplug alarm cycle should read this > flag and if the data path are clean it should remove the sub-device. > > In case of fail-safe RX burst calling from application, fail-afe tries > to call to all STARTED sub-device rx_burst functions. The remove flag > is not checked here and fail-safe may call to the removed sub-device > rx_burst function. > > The above 2 cases run in different threads and there is a race between > the removed sub-device RX clean check to the removed sub-device > rx_burst call makes the sub device RX unclean. > > If the application calls to rx_burst in loop, the probability to get RX > clean is not enough, especially when there are few sub-devices or if the > rx_burst function of the removed sub-device takes a lot of time. > > Each time the sub-device data path is unclean, the second oportunity to > check it again should be only in the hotplug alarm next cycle; the > default time between cycles is 2 seconds. > > In this loop when fail-safe tries to remove the sub-device, the > sub-device may appear back and fail-safe cannot plug it in back until > the removal process is completted. In this time fail-safe may lose the > primary sub-device services and may hurt application performance. > > This patch adds a remove flag check in safe rx_burst function. > By this way, at most one more hotplug alarm cycle is necessary > to get the sub-device clean for actual removal. > > Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions") > Cc: stable@dpdk.org > > Signed-off-by: Matan Azrad Acked-by: Gaetan Rivet > --- > drivers/net/failsafe/failsafe_rxtx.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c > index 7311421..70157c8 100644 > --- a/drivers/net/failsafe/failsafe_rxtx.c > +++ b/drivers/net/failsafe/failsafe_rxtx.c > @@ -43,7 +43,8 @@ > { > return (ETH(sdev) == NULL) || > (ETH(sdev)->rx_pkt_burst == NULL) || > - (sdev->state != DEV_STARTED); > + (sdev->state != DEV_STARTED) || > + (sdev->remove != 0); > } > > static inline int > -- > 1.8.3.1 > -- Gaëtan Rivet 6WIND