DPDK patches and discussions
* [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
@ 2017-10-22  5:51 Matan Azrad
  2017-10-26 16:20 ` Gaëtan Rivet
  0 siblings, 1 reply; 3+ messages in thread
From: Matan Azrad @ 2017-10-22  5:51 UTC
  To: Gaetan Rivet; +Cc: dev, stable

On plug-out, the RMV interrupt callback sets the remove flag of the
removed sub-device. The next hotplug alarm cycle should read this flag
and, if the data path is clean, remove the sub-device.

When the application calls the fail-safe Rx burst, fail-safe tries to
call the rx_burst function of every STARTED sub-device. The remove flag
is not checked here, so fail-safe may call the rx_burst function of a
removed sub-device.

These two cases run in different threads, so the Rx clean check of the
removed sub-device races with calls to its rx_burst function, and each
such call makes the sub-device Rx unclean.

If the application calls rx_burst in a loop, the probability of finding
Rx clean is low, especially when there are few sub-devices or when the
rx_burst function of the removed sub-device takes a long time.

Each time the sub-device data path is found unclean, the next
opportunity to check it comes only in the next hotplug alarm cycle; the
default time between cycles is 2 seconds.

While fail-safe keeps retrying the removal, the sub-device may
reappear, and fail-safe cannot plug it back in until the removal
process is completed. During this time fail-safe may lose the services
of the primary sub-device, which may hurt application performance.

This patch adds a remove flag check to the safe rx_burst function.
This way, at most one more hotplug alarm cycle is needed to get the
sub-device clean for actual removal.

Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
Cc: stable@dpdk.org

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index 7311421..70157c8 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -43,7 +43,8 @@
 {
 	return (ETH(sdev) == NULL) ||
 		(ETH(sdev)->rx_pkt_burst == NULL) ||
-		(sdev->state != DEV_STARTED);
+		(sdev->state != DEV_STARTED) ||
+		(sdev->remove != 0);
 }
 
 static inline int
-- 
1.8.3.1
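
For readers without the tree at hand, the effect of the hunk above can be
sketched as a small standalone C file. This is only an illustration under
stated assumptions: the struct layouts, the enum values other than
DEV_STARTED, and the names rx_unsafe_sketch, dummy_rx_burst and
demo_skip_after_rmv are hypothetical stand-ins, not the driver's real
definitions. Only the four conditions in the return expression come from
the patch, with sdev->eth standing in for ETH(sdev).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the fail-safe types. */
enum dev_state { DEV_UNDEFINED, DEV_ACTIVE, DEV_STARTED };

struct eth_dev {
	uint16_t (*rx_pkt_burst)(void *rxq, void **pkts, uint16_t n);
};

struct sub_device {
	struct eth_dev *eth;    /* stands in for ETH(sdev); NULL if absent */
	enum dev_state state;
	volatile int remove;    /* set by the RMV interrupt callback */
};

/* Patched predicate: nonzero means "do not poll this sub-device". */
static inline int
rx_unsafe_sketch(const struct sub_device *sdev)
{
	return (sdev->eth == NULL) ||
		(sdev->eth->rx_pkt_burst == NULL) ||
		(sdev->state != DEV_STARTED) ||
		(sdev->remove != 0);    /* the condition added by the patch */
}

/* Placeholder burst function that never returns packets. */
static uint16_t
dummy_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	return 0;
}

/*
 * Usage: a started sub-device is pollable until the remove flag is set.
 * Returns the predicate value observed after the flag is raised.
 */
int
demo_skip_after_rmv(void)
{
	struct eth_dev eth = { .rx_pkt_burst = dummy_rx_burst };
	struct sub_device sdev = {
		.eth = &eth, .state = DEV_STARTED, .remove = 0,
	};

	assert(rx_unsafe_sketch(&sdev) == 0);   /* still polled */
	sdev.remove = 1;                        /* what RMV would do */
	return rx_unsafe_sketch(&sdev);         /* now skipped */
}
```

Once the flag is set, the polling thread stops entering the sub-device
rx_burst, so the next hotplug alarm cycle is much more likely to find
the Rx path clean.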

* Re: [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
  2017-10-22  5:51 [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race Matan Azrad
@ 2017-10-26 16:20 ` Gaëtan Rivet
  2017-10-26 19:10   ` Ferruh Yigit
  0 siblings, 1 reply; 3+ messages in thread
From: Gaëtan Rivet @ 2017-10-26 16:20 UTC
  To: Matan Azrad; +Cc: dev, stable

Hello Matan,

I think the commit log could be shorter.
Proposing this, feel free to expand it if you prefer.

---8<---

When removing a device, the fail-safe checks that it is not within its
datapath before cleaning it.

When checking whether an Rx burst should be performed on a device, the
remove flag is not checked. Thus the port could still enter its datapath
and miss a removal round. Furthermore, there is a race between the
thread removing the device and the polling thread.

Check the remove flag before entering a sub-device Rx burst when in safe
mode. This check mitigates the aforementioned race condition.

--->8---

Otherwise,

Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

-- 
Gaëtan Rivet
6WIND

* Re: [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
  2017-10-26 16:20 ` Gaëtan Rivet
@ 2017-10-26 19:10   ` Ferruh Yigit
  0 siblings, 0 replies; 3+ messages in thread
From: Ferruh Yigit @ 2017-10-26 19:10 UTC
  To: Gaëtan Rivet, Matan Azrad; +Cc: dev, stable

On 10/26/2017 9:20 AM, Gaëtan Rivet wrote:
> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>

Applied to dpdk-next-net/master, thanks.

(used suggested commit log, thanks.)
