From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f67.google.com (mail-wm0-f67.google.com [74.125.82.67]) by dpdk.org (Postfix) with ESMTP id 3CC741B1C3 for ; Thu, 26 Oct 2017 18:20:43 +0200 (CEST) Received: by mail-wm0-f67.google.com with SMTP id m72so9067177wmc.1 for ; Thu, 26 Oct 2017 09:20:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=ejmuJctQCkl8UInul8tVcgqf+QPc/OQwW2dkdLxCgOI=; b=g4ierjdeb5guwJnIhoesQX/dde+3EUnMAp1DQtcgID2XO/drVD+L7T/wywMWvl6n+x 03yO4w2/x1cJqxPABIOlXsIaYq4hcFhQfvnWsEMEy0qtPN3JQIQlmWzn3eQtJtkZr1+B hv1uCwHRhmB06uOOxO25AogaS35CXZhq7OXEyvukwiJypIImxwaf+LwnV+s+Icc1GXIU XzTFZbqlNrsaAHibrFakMKXriRvOO4iGylHm32ojscKdK6kzAkBICYdn0kphwVilLAMS wulrTmSDH2EKNYk9a2ciu62nCL6jhpM9EXrptuChSv5pYb7wggRMUcnscFozVxXSjV3I OZwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=ejmuJctQCkl8UInul8tVcgqf+QPc/OQwW2dkdLxCgOI=; b=bAxqyfPb0iksQEQ/66g4aQ8IiY7vic1oEVzdqrUO4ybB00owyiFNKftkvhjuft20op xM8u4Jv3QE7iQ22wCRl8KIcxCCjun61LEeCZMga+0HRUpBNvsKs4OUNn2LL/Cyxls9pD PNACcijqzBwqVJW3TxQ3uuPuCtFFttqyYvnO9O/0F5IJKX0+oR9BYaL+upxcuXOcnCKM agFwqUIfgRXpmJoarnJsrN3T1pG1w+KC1nBb2EfsuaM7saj5nYzFBZEGB41frhrFkCjo p+3YVkAi9Q4V19z7FkFh3IPHAEzpvlqoY++txmrUoocADqVrrG/XzXO4dgUtO8TQey7V d/aQ== X-Gm-Message-State: AMCzsaW9sO5x6GDUkWMILK8Ojdb6yC9xGt3xOv54sozhZeXvHWsdYOyx 4uCa2uzCaqn6A1x/VFMs0dyD9w== X-Google-Smtp-Source: ABhQp+RjurLyugTL4EJkq0k8HXzXH0Nz4+XLCuieChdZtLdZ0dpjeoVHBEiYDssBCcRHBX7CH2mL/A== X-Received: by 10.28.132.141 with SMTP id g135mr1929511wmd.37.1509034843338; Thu, 26 Oct 2017 09:20:43 -0700 (PDT) Received: from bidouze.vm.6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id f19sm1271734wmf.4.2017.10.26.09.20.42 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 Oct 2017 09:20:42 -0700 (PDT) Date: Thu, 26 Oct 2017 18:20:29 +0200 From: =?iso-8859-1?Q?Ga=EBtan?= Rivet To: Matan Azrad Cc: dev@dpdk.org, stable@dpdk.org Message-ID: <20171026162029.GC10890@bidouze.vm.6wind.com> References: <1508651468-31866-1-git-send-email-matan@mellanox.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1508651468-31866-1-git-send-email-matan@mellanox.com> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Oct 2017 16:20:44 -0000 Hello Matan, I think the commit log could be shorter. Proposing this, feel free to expand it if you prefer. ---8<--- When removing a device, the fail-safe checks that it is not within its datapath before cleaning it. When checking whether an Rx burst should be performed on a device, the remove flag is not checked. Thus the port could still enter its datapath and miss a removal round. Furthermore, there is a race between the thread removing the device and the polling thread. Check the remove flag before entering a sub-device Rx burst when in safe mode. This check mitigates the aforementioned race condition. --->8--- Otherwise, On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote: > In case of plug out, the RMV interrupt callback sets the remove flag of > the removed sub-device. The next hotplug alarm cycle should read this > flag and if the data path are clean it should remove the sub-device. > > In case of fail-safe RX burst calling from application, fail-afe tries > to call to all STARTED sub-device rx_burst functions. The remove flag > is not checked here and fail-safe may call to the removed sub-device > rx_burst function. > > The above 2 cases run in different threads and there is a race between > the removed sub-device RX clean check to the removed sub-device > rx_burst call makes the sub device RX unclean. > > If the application calls to rx_burst in loop, the probability to get RX > clean is not enough, especially when there are few sub-devices or if the > rx_burst function of the removed sub-device takes a lot of time. > > Each time the sub-device data path is unclean, the second oportunity to > check it again should be only in the hotplug alarm next cycle; the > default time between cycles is 2 seconds. > > In this loop when fail-safe tries to remove the sub-device, the > sub-device may appear back and fail-safe cannot plug it in back until > the removal process is completted. In this time fail-safe may lose the > primary sub-device services and may hurt application performance. > > This patch adds a remove flag check in safe rx_burst function. > By this way, at most one more hotplug alarm cycle is necessary > to get the sub-device clean for actual removal. > > Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions") > Cc: stable@dpdk.org > > Signed-off-by: Matan Azrad Acked-by: Gaetan Rivet > --- > drivers/net/failsafe/failsafe_rxtx.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c > index 7311421..70157c8 100644 > --- a/drivers/net/failsafe/failsafe_rxtx.c > +++ b/drivers/net/failsafe/failsafe_rxtx.c > @@ -43,7 +43,8 @@ > { > return (ETH(sdev) == NULL) || > (ETH(sdev)->rx_pkt_burst == NULL) || > - (sdev->state != DEV_STARTED); > + (sdev->state != DEV_STARTED) || > + (sdev->remove != 0); > } > > static inline int > -- > 1.8.3.1 > -- Gaëtan Rivet 6WIND