From: Ferruh Yigit
To: Gaëtan Rivet, Matan Azrad
Cc: dev@dpdk.org, stable@dpdk.org
Date: Thu, 26 Oct 2017 12:10:21 -0700
Subject: Re: [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
Message-ID: <39b75f3c-7631-7824-587b-4fd3549b1741@intel.com>
In-Reply-To: <20171026162029.GC10890@bidouze.vm.6wind.com>
References: <1508651468-31866-1-git-send-email-matan@mellanox.com> <20171026162029.GC10890@bidouze.vm.6wind.com>

On 10/26/2017 9:20 AM, Gaëtan Rivet wrote:
> Hello Matan,
>
> I think the commit log could be shorter.
> Proposing this, feel free to expand it if you prefer.
>
> ---8<---
>
> When removing a device, the fail-safe checks that it is not within its
> datapath before cleaning it.
>
> When checking whether an Rx burst should be performed on a device, the
> remove flag is not checked. Thus the port could still enter its datapath
> and miss a removal round. Furthermore, there is a race between the
> thread removing the device and the polling thread.
>
> Check the remove flag before entering a sub-device Rx burst when in safe
> mode. This check mitigates the aforementioned race condition.
>
> --->8---
>
> Otherwise,
>
> On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
>> On plug-out, the RMV interrupt callback sets the remove flag of the
>> removed sub-device. The next hotplug alarm cycle should read this flag
>> and, if the data path is clean, remove the sub-device.
>>
>> When the application calls the fail-safe Rx burst, fail-safe tries to
>> call the rx_burst functions of all STARTED sub-devices. The remove flag
>> is not checked here, so fail-safe may call the rx_burst function of the
>> removed sub-device.
>>
>> The above 2 cases run in different threads, and there is a race between
>> the Rx clean check of the removed sub-device and its rx_burst call,
>> which makes the sub-device Rx unclean.
>>
>> If the application calls rx_burst in a loop, the probability of finding
>> the Rx path clean is low, especially when there are few sub-devices or
>> when the rx_burst function of the removed sub-device takes a long time.
>>
>> Each time the sub-device data path is unclean, the next opportunity to
>> check it again is only in the next hotplug alarm cycle; the default time
>> between cycles is 2 seconds.
>>
>> While fail-safe keeps trying to remove the sub-device, the sub-device
>> may be plugged back in, and fail-safe cannot plug it in again until the
>> removal process is completed. Meanwhile, fail-safe may lose the services
>> of the primary sub-device, which may hurt application performance.
>>
>> This patch adds a remove flag check in the safe rx_burst function.
>> This way, at most one more hotplug alarm cycle is needed to get the
>> sub-device clean for actual removal.
>>
>> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Matan Azrad
>
> Acked-by: Gaetan Rivet

Applied to dpdk-next-net/master, thanks.
(used suggested commit log, thanks.)
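A minimal, self-contained C sketch of the two threads described in this thread may make the race and the fix easier to follow. It is a simplified model under stated assumptions, not the fail-safe PMD code: struct sub_device, safe_rx_burst(), rmv_callback(), hotplug_alarm_cycle() and the per-sub-device inflight counter (standing in for whatever the PMD uses to decide that an Rx path is "clean") are all hypothetical names, and C11 atomics are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of a fail-safe sub-device.
 * Names are illustrative; they are not the DPDK fail-safe PMD's structures.
 */
struct sub_device {
	atomic_bool started; /* sub-device is started and may be polled */
	atomic_bool remove;  /* set by the RMV interrupt callback on plug-out */
	atomic_int inflight; /* threads currently inside this sub-device's Rx burst */
	uint16_t avail;      /* simulated packets waiting (touched by the poller only) */
};

static void
sub_device_init(struct sub_device *sdev, uint16_t avail)
{
	atomic_init(&sdev->started, true);
	atomic_init(&sdev->remove, false);
	atomic_init(&sdev->inflight, 0);
	sdev->avail = avail;
}

/* Stand-in for a sub-device's own rx_burst: returns up to nb_pkts packets. */
static uint16_t
sub_rx_burst(struct sub_device *sdev, uint16_t nb_pkts)
{
	uint16_t n = sdev->avail < nb_pkts ? sdev->avail : nb_pkts;

	sdev->avail -= n;
	return n;
}

/* Polling thread: "safe" Rx burst over all sub-devices. */
static uint16_t
safe_rx_burst(struct sub_device *subs, size_t nb_subs, uint16_t nb_pkts)
{
	uint16_t total = 0;

	for (size_t i = 0; i < nb_subs && total < nb_pkts; i++) {
		struct sub_device *sdev = &subs[i];

		if (!atomic_load(&sdev->started))
			continue;
		/* Announce entry first, then read the remove flag (seq_cst). */
		atomic_fetch_add(&sdev->inflight, 1);
		if (atomic_load(&sdev->remove)) {
			/* The added check: back out of a sub-device marked for removal. */
			atomic_fetch_sub(&sdev->inflight, 1);
			continue;
		}
		total += sub_rx_burst(sdev, (uint16_t)(nb_pkts - total));
		atomic_fetch_sub(&sdev->inflight, 1);
	}
	assert(total <= nb_pkts); /* never overfill the caller's array */
	return total;
}

/* RMV interrupt callback: only flags the sub-device for removal. */
static void
rmv_callback(struct sub_device *sdev)
{
	atomic_store(&sdev->remove, true);
}

/*
 * One hotplug alarm cycle: remove the sub-device only if it is flagged
 * and its Rx path is clean. Returns true if it was removed in this cycle.
 */
static bool
hotplug_alarm_cycle(struct sub_device *sdev)
{
	if (!atomic_load(&sdev->remove))
		return false; /* nothing to do */
	if (atomic_load(&sdev->inflight) != 0)
		return false; /* Rx path unclean: retry on the next cycle */
	atomic_store(&sdev->started, false); /* actual removal would happen here */
	return true;
}
```

In this model the poller increments the in-flight counter before it reads the remove flag. With the default sequentially consistent atomics, an alarm cycle that sees the flag set and the counter at zero cannot overlap with a poller that is about to call the sub-device, and once the flag is set every new poller backs out at once, so the counter drains and a following alarm cycle completes the removal. That ordering is a choice made for this sketch; the patch itself is described only as a remove-flag check that mitigates the race.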