* [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
@ 2017-10-22 5:51 Matan Azrad
2017-10-26 16:20 ` Gaëtan Rivet
From: Matan Azrad @ 2017-10-22 5:51 UTC
To: Gaetan Rivet; +Cc: dev, stable
On plug-out, the RMV interrupt callback sets the remove flag of the
removed sub-device. The next hotplug alarm cycle should read this flag
and, if the data path is clean, remove the sub-device.

When the application calls the fail-safe RX burst, fail-safe tries to
call the rx_burst function of every STARTED sub-device. The remove flag
is not checked here, so fail-safe may call the rx_burst function of the
removed sub-device.

These two cases run in different threads, so the RX clean check of the
removed sub-device races with the rx_burst call on it, and that call
makes the sub-device RX unclean.

If the application calls rx_burst in a loop, the probability of finding
RX clean is low, especially when there are few sub-devices or when the
rx_burst function of the removed sub-device takes a long time.

Each time the sub-device data path is found unclean, the next
opportunity to check it comes only in the next hotplug alarm cycle; the
default time between cycles is 2 seconds.

While fail-safe keeps retrying the removal in this loop, the sub-device
may reappear, and fail-safe cannot plug it back in until the removal
process is completed. During this time fail-safe may lose the services
of the primary sub-device, which may hurt application performance.

This patch adds a remove flag check to the safe rx_burst function.
This way, at most one more hotplug alarm cycle is needed to get the
sub-device clean for actual removal.

Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/failsafe/failsafe_rxtx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index 7311421..70157c8 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -43,7 +43,8 @@
 {
 	return (ETH(sdev) == NULL) ||
 		(ETH(sdev)->rx_pkt_burst == NULL) ||
-		(sdev->state != DEV_STARTED);
+		(sdev->state != DEV_STARTED) ||
+		(sdev->remove != 0);
 }
 
 static inline int
--
1.8.3.1
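
The effect of the extra condition in the patch above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct sub_device`, `rx_unsafe()`, `dummy_rx()` and `safe_rx_round()` are hypothetical simplifications written for this example, not the fail-safe driver's own definitions, and the loop polls every sub-device once rather than reproducing the driver's actual burst logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the fail-safe types. */
enum dev_state { DEV_UNDEFINED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

struct sub_device;
typedef uint16_t (*rx_burst_t)(struct sub_device *sdev, uint16_t nb_pkts);

struct sub_device {
	enum dev_state state;    /* lifecycle state of the sub-device */
	volatile int remove;     /* set by the (modelled) RMV callback */
	rx_burst_t rx_pkt_burst; /* NULL when no burst function is set */
	unsigned int rx_calls;   /* how often the datapath was entered */
};

/*
 * Modelled on the patched predicate: non-zero means "do not poll".
 * The real code tests ETH(sdev); this model tests sdev itself.
 */
static inline int
rx_unsafe(const struct sub_device *sdev)
{
	return (sdev == NULL) ||
		(sdev->rx_pkt_burst == NULL) ||
		(sdev->state != DEV_STARTED) ||
		(sdev->remove != 0);
}

/* Dummy burst function: records the call and returns no packets. */
static uint16_t
dummy_rx(struct sub_device *sdev, uint16_t nb_pkts)
{
	(void)nb_pkts;
	sdev->rx_calls++;
	return 0;
}

/* Poll each sub-device once, skipping unsafe ones; return how many ran. */
static unsigned int
safe_rx_round(struct sub_device *subs, size_t n_subs, uint16_t nb_pkts)
{
	unsigned int polled = 0;

	assert(subs != NULL && n_subs > 0);
	for (size_t i = 0; i < n_subs; i++) {
		if (rx_unsafe(&subs[i]))
			continue; /* flagged for removal or not started */
		subs[i].rx_pkt_burst(&subs[i], nb_pkts);
		polled++;
	}
	return polled;
}
```

In this model, once `remove` is set on a sub-device, later rounds no longer enter its burst function, so a removal thread that retries periodically is not held off by the polling loop re-entering that datapath.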
* Re: [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
From: Gaëtan Rivet @ 2017-10-26 16:20 UTC
To: Matan Azrad; +Cc: dev, stable
Hello Matan,

I think the commit log could be shorter.
Proposing this, feel free to expand it if you prefer.

---8<---

When removing a device, the fail-safe checks that it is not within its
datapath before cleaning it.

When checking whether an Rx burst should be performed on a device, the
remove flag is not checked. Thus the port could still enter its datapath
and miss a removal round. Furthermore, there is a race between the
thread removing the device and the polling thread.

Check the remove flag before entering a sub-device Rx burst when in safe
mode. This check mitigates the aforementioned race condition.

--->8---

Otherwise,

On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
> On plug-out, the RMV interrupt callback sets the remove flag of the
> removed sub-device. The next hotplug alarm cycle should read this flag
> and, if the data path is clean, remove the sub-device.
>
> When the application calls the fail-safe RX burst, fail-safe tries to
> call the rx_burst function of every STARTED sub-device. The remove flag
> is not checked here, so fail-safe may call the rx_burst function of the
> removed sub-device.
>
> These two cases run in different threads, so the RX clean check of the
> removed sub-device races with the rx_burst call on it, and that call
> makes the sub-device RX unclean.
>
> If the application calls rx_burst in a loop, the probability of finding
> RX clean is low, especially when there are few sub-devices or when the
> rx_burst function of the removed sub-device takes a long time.
>
> Each time the sub-device data path is found unclean, the next
> opportunity to check it comes only in the next hotplug alarm cycle; the
> default time between cycles is 2 seconds.
>
> While fail-safe keeps retrying the removal in this loop, the sub-device
> may reappear, and fail-safe cannot plug it back in until the removal
> process is completed. During this time fail-safe may lose the services
> of the primary sub-device, which may hurt application performance.
>
> This patch adds a remove flag check to the safe rx_burst function.
> This way, at most one more hotplug alarm cycle is needed to get the
> sub-device clean for actual removal.
>
> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
> Cc: stable@dpdk.org
>
> Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
> ---
> drivers/net/failsafe/failsafe_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
> index 7311421..70157c8 100644
> --- a/drivers/net/failsafe/failsafe_rxtx.c
> +++ b/drivers/net/failsafe/failsafe_rxtx.c
> @@ -43,7 +43,8 @@
>  {
>  	return (ETH(sdev) == NULL) ||
>  		(ETH(sdev)->rx_pkt_burst == NULL) ||
> -		(sdev->state != DEV_STARTED);
> +		(sdev->state != DEV_STARTED) ||
> +		(sdev->remove != 0);
>  }
>
>  static inline int
> --
> 1.8.3.1
>
--
Gaëtan Rivet
6WIND
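
The removal side described in the proposed commit log (check that the device is not within its datapath, otherwise wait for the next round) can be walked through with a second small model. This is a sketch under assumptions: `struct sub_device_model`, `poller_enter()`, `poller_leave()` and `alarm_cycle()` are hypothetical names invented for this illustration, and the in-flight counter stands in for whatever bookkeeping the driver really uses. As in the commit log, the flag check only mitigates the race: a poller that reads the flag just before it is set can still slip into the datapath once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one sub-device as seen by both threads. */
struct sub_device_model {
	atomic_int rx_in_flight; /* > 0 while a poller is inside the burst */
	atomic_int remove;       /* set when the unplug has been signalled */
	bool removed;            /* set once the cleanup has actually run */
};

/* Polling side: enter the burst only if the device is not flagged. */
static bool
poller_enter(struct sub_device_model *s)
{
	if (atomic_load(&s->remove) != 0)
		return false; /* the check added by the patch, in model form */
	atomic_fetch_add(&s->rx_in_flight, 1);
	return true;
}

/* Polling side: leave the burst. */
static void
poller_leave(struct sub_device_model *s)
{
	int prev = atomic_fetch_sub(&s->rx_in_flight, 1);

	assert(prev > 0); /* leave must be paired with a successful enter */
	(void)prev;
}

/* Removal side: one alarm cycle; returns true if the device was removed. */
static bool
alarm_cycle(struct sub_device_model *s)
{
	if (atomic_load(&s->remove) == 0 || s->removed)
		return false; /* nothing to do */
	if (atomic_load(&s->rx_in_flight) != 0)
		return false; /* datapath not clean, retry next cycle */
	s->removed = true;
	return true;
}
```

Walking through it in a single thread: if a poller is inside the burst when the flag is set, the first alarm cycle finds the datapath busy and gives up; after the poller leaves, it cannot re-enter, so the following cycle finds the datapath clean and removes the device.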
* Re: [dpdk-dev] [PATCH] net/failsafe: fix Rx clean race
From: Ferruh Yigit @ 2017-10-26 19:10 UTC
To: Gaëtan Rivet, Matan Azrad; +Cc: dev, stable
On 10/26/2017 9:20 AM, Gaëtan Rivet wrote:
> Hello Matan,
>
> I think the commit log could be shorter.
> Proposing this, feel free to expand it if you prefer.
>
> ---8<---
>
> When removing a device, the fail-safe checks that it is not within its
> datapath before cleaning it.
>
> When checking whether an Rx burst should be performed on a device, the
> remove flag is not checked. Thus the port could still enter its datapath
> and miss a removal round. Furthermore, there is a race between the
> thread removing the device and the polling thread.
>
> Check the remove flag before entering a sub-device Rx burst when in safe
> mode. This check mitigates the aforementioned race condition.
>
> --->8---
>
> Otherwise,
>
> On Sun, Oct 22, 2017 at 05:51:08AM +0000, Matan Azrad wrote:
>> On plug-out, the RMV interrupt callback sets the remove flag of the
>> removed sub-device. The next hotplug alarm cycle should read this flag
>> and, if the data path is clean, remove the sub-device.
>>
>> When the application calls the fail-safe RX burst, fail-safe tries to
>> call the rx_burst function of every STARTED sub-device. The remove flag
>> is not checked here, so fail-safe may call the rx_burst function of the
>> removed sub-device.
>>
>> These two cases run in different threads, so the RX clean check of the
>> removed sub-device races with the rx_burst call on it, and that call
>> makes the sub-device RX unclean.
>>
>> If the application calls rx_burst in a loop, the probability of finding
>> RX clean is low, especially when there are few sub-devices or when the
>> rx_burst function of the removed sub-device takes a long time.
>>
>> Each time the sub-device data path is found unclean, the next
>> opportunity to check it comes only in the next hotplug alarm cycle; the
>> default time between cycles is 2 seconds.
>>
>> While fail-safe keeps retrying the removal in this loop, the sub-device
>> may reappear, and fail-safe cannot plug it back in until the removal
>> process is completed. During this time fail-safe may lose the services
>> of the primary sub-device, which may hurt application performance.
>>
>> This patch adds a remove flag check to the safe rx_burst function.
>> This way, at most one more hotplug alarm cycle is needed to get the
>> sub-device clean for actual removal.
>>
>> Fixes: 72a57bfd9a0e ("net/failsafe: add fast burst functions")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Matan Azrad <matan@mellanox.com>
>
> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Applied to dpdk-next-net/master, thanks.
(used suggested commit log, thanks.)