From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id AF6B62C55; Wed, 29 Nov 2017 20:17:52 +0100 (CET) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Nov 2017 11:17:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,338,1508828400"; d="scan'208";a="7337058" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.241.225.149]) ([10.241.225.149]) by FMSMGA003.fm.intel.com with ESMTP; 29 Nov 2017 11:17:50 -0800 To: =?UTF-8?Q?Ga=c3=abtan_Rivet?= , Ophir Munk Cc: "dev@dpdk.org" , Thomas Monjalon , Olga Shern , "stable@dpdk.org" , Matan Azrad References: <1506203877-2090-1-git-send-email-ophirmu@mellanox.com> <1507243328-11287-1-git-send-email-ophirmu@mellanox.com> <20171020103518.GH3596@bidouze.vm.6wind.com> <20171023083612.GK3596@bidouze.vm.6wind.com> From: Ferruh Yigit Message-ID: <32c10eb9-fa51-a409-4720-6a92c3b97398@intel.com> Date: Wed, 29 Nov 2017 11:17:50 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: <20171023083612.GK3596@bidouze.vm.6wind.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v3] net/failsafe: fix calling device during RMV events X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Nov 2017 19:17:53 -0000 On 10/23/2017 1:36 AM, Gaëtan Rivet wrote: > On Mon, Oct 23, 2017 at 07:17:41AM +0000, Ophir Munk wrote: >> Hi Gaetan, >> Thanks for your quick reply. Please see comments inline. >> >> Regards, >> Ophir >> >>> -----Original Message----- >>> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com] >>> Sent: Friday, October 20, 2017 1:35 PM >>> To: Ophir Munk >>> Cc: dev@dpdk.org; Thomas Monjalon ; Olga Shern >>> ; stable@dpdk.org >>> Subject: Re: [PATCH v3] net/failsafe: fix calling device during RMV events >>> >>> Hi Ophir, >>> >>> Sorry about the delay, >>> I have a few remarks, I think this patch could be simpler. >>> >>> First, about the commit logline: >>> "calling device" is not descriptive enough. I'd suggest >>> >>> net/failsafe: fix device configuration during RMV events >>> >>> But I'm not a native speaker either, so use it if you think it is better, or don't, >>> it's only a suggestion :). >>> >>> On Thu, Oct 05, 2017 at 10:42:08PM +0000, Ophir Munk wrote: >>>> This commit prevents control path operations from failing after a sub >>>> device removal. >>>> >>>> Following are the failure steps: >>>> 1. The physical device is removed due to change in one of PF >>>> parameters (e.g. MTU) 2. The interrupt thread flags the device 3. >>>> Within 2 seconds Interrupt thread initializes the actual device >>>> removal, then every 2 seconds it tries to re-sync (plug in) the >>>> device. The trials fail as long as VF parameter mismatches the PF >>> parameter. >>>> 4. A control thread initiates a control operation on failsafe which >>>> initiates this operation on the device. >>>> 5. A race condition occurs between the control thread and interrupt >>>> thread when accessing the device data structures. >>>> >>>> This commit prevents the race condition in step 5. Before this commit >>>> if a device was removed and then a control thread operation was >>>> initiated on failsafe - in some cases failsafe called the sub device >>>> operation instead of avoiding it. Such cases could lead to operations >>> failures. >>>> >>> >>> This is a nitpick, but as said earlier, this is not preventing the race condition. >>> This race is still present and can still wreak havok on unsuspecting users. >>> >>> If an application has a weak threading model, it will be subject to this race >>> condition still. It is possible to prevent it fully with proper care from the >>> application standpoint, but this is not specific to fail-safe and does not >>> concern us here. >>> >>> Anyway, it's really a nitpick, I just wanted to point it out. This is not too >>> important for this patch. >>> >>>> This commit fixes failsafe criteria to determine when the device is >>>> removed such that it will avoid calling the sub device operations >>>> during that time and will only call them otherwise. >>>> >>>> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD") >>>> Cc: stable@dpdk.org >>>> >>>> Signed-off-by: Ophir Munk >>>> --- >>>> v3: >>>> 1. Rebase v2 >>>> >>>> 2. Please ignore checkpatch checks on arguments re-usage - they are >>> confirmed. >>>> CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side- >>> effects? >>>> #217: FILE: drivers/net/failsafe/failsafe_private.h:241: >>>> >>>> 3. Add rationales (copy from an email which accompanied v2): >>>> >>>> On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote: >>>>> >>>>> Hi Ophir, >>>>> >>>>> On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote: >>>>>> This commit prevents control path operations from failing after a >>>>>> sub device has informed failsafe it has been removed. >>>>>> >>>>>> Before this commit if a device was removed and then a control path >>>>> >>>>> Here are the steps if I understood correctly: >>>>> >>>>> 0. The physical device is removed >>>>> 1. The interrupt thread flags the device 2. A control lcore >>>>> initiates a control operation 3. The alarm triggers, waking up the eal-intr- >>> thread, >>>>> initiating the actual device removal. >>>>> 4. Race condition occurs between control lcore and interrupt thread. >>>>> >>>>> "if a device was removed" is ambiguous I think (are we speaking >>>>> about the physical port? Is it only flagged? Is it after the removal of the >>> device itself?). >>>>> From the context I gather that you mean the device is flagged to be >>>>> removed, but it won't be as clear in a few month when we revisit this bug >>> :) . >>>>> >>>>> Could you please rephrase this so that the whole context of the >>>>> issue is available? >>>>> >>>> >>>> Done. Commit message was rephrased based on your comments >>>> >>>>>> operations was initiated on failsafe - in some cases failsafe >>>>>> called the sub device operation instead of avoiding it. Such cases >>>>>> could lead to operations failures. >>>>>> >>>>>> This commit fixes failsafe criteria to determine when the device >>>>>> is removed such that it will avoid calling the sub device >>>>>> operations during that time and will only call them otherwise. >>>>>> >>>>> >>>>> This commit mitigates the race condition, reducing the probability >>>>> for it to have an effect. It does not, however, remove this race >>>>> condition, which is inherent to the DPDK architecture at the moment. >>>>> >>>>> A proper fix, a more detailed workaround and additional >>>>> documentation warning users writing applications to mind their threads >>> could be interesting. >>>>> >>>> >>>> The race condition occurs in the last step and may lead to >>>> segmentation faults (accessing data structures of the same device by 2 >>>> threads) The previous steps ("the physical device is removed", etc) were not >>> recreated and tested but probably cannot lead to segmentation fault. >>>> >>>>> But let's focus on this patch for the time being. >>>>> >>>>>> Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD") >>>>>> Cc: stable@dpdk.org >>>>>> >>>>>> Signed-off-by: Ophir Munk >>>>>> --- >>>>>> drivers/net/failsafe/failsafe_ether.c | 1 + >>>>>> drivers/net/failsafe/failsafe_ops.c | 52 >>>>> +++++++++++++++++++++++++++++------ >>>>>> 2 files changed, 45 insertions(+), 8 deletions(-) >>>>>> >>>>>> diff --git a/drivers/net/failsafe/failsafe_ether.c >>>>>> b/drivers/net/failsafe/failsafe_ether.c >>>>>> index a3a8cce..1def110 100644 >>>>>> --- a/drivers/net/failsafe/failsafe_ether.c >>>>>> +++ b/drivers/net/failsafe/failsafe_ether.c >>>>>> @@ -378,6 +378,7 @@ >>>>> >>>>> Could you please generate your patches with the function name in the >>> diff? >>>> >>>> Done >>>> >>>>> >>>>>> i); >>>>>> goto err_remove; >>>>>> } >>>>>> + sdev->remove = 0; >>>>> >>>>> You are adding this here, within failsafe_eth_dev_state_sync, and >>>>> removing it from the dev_configure ops. >>>>> >>>>> 10 lines above, the call to dev_configure is done, meaning that the >>>>> remove flag was resetted at this point. >>>>> >>>>> Can you explain why you prefer resetting the flag here? >>>>> >>>>> The position of this flag reset will be dependent upon my subsequent >>>>> remarks anyway, so hold that thought :) . >>>>> >>>> >>>> The motivation for resetting the "remove" flag within >>> failsafe_eth_dev_state_sync is as follows: >>>> Previously to this patch the "remove" flag was designed to signal the need >>> to remove the sub device. >>>> Once the sub device was removed and before being reconfigured the >>> "remove" flag was reset. >>>> >>>> After this patch the scope of the "remove" flag was *extended* to >>>> indicate the sub device status as being "plugged out" by resetting this flag >>> only after a successful call to failsafe_eth_dev_state_sync(). >>>> The "plug out" status could last a very long time (seconds, minutes, days, >>> weeks, ...). >>>> >>>> Previously to this patch failsafe based the "plugged out" status on >>>> the sub device state as being below ACTIVE however every 2 seconds >>>> dev_configure() was called where the sub device was assigned sdev- >>>>> state = DEV_ACTIVE; therefore the sub device state became ACTIVE for >>> some time every 2 seconds. >>>> This is where the race condition occurred: failsafe considered the sub >>>> device as "Plugged in" for some time every 2 seconds (based on its ACTIVE >>> state) while it was actually plugged out. >>>> >>>> After this patch the "Plugged out" status is based on the "remove" flag. >>>> >>> >>> Sorry, I do not agree with this semantical change on the "remove" flag. >>> You are essentially adding a new device state, which could be fine per se, but >>> should not be done here. >>> >>> The enum dev_state is there for this purpose. >>> >>> The flag dev->remove, calls for an operation to be done upon the concerned >>> device. It is not meant to become a new device state. >>> >>> A point about the work methodoly here: if you wanted to change this >>> semantic, which could be legitimate and sometimes called for, you should >>> have proposed it either during a discussion in a response to my previous >>> email, or introducing the change as a separate patch. This point is important >>> enough for it to have its own patch, meaning we would have a whole thread >>> dedicated to it instead of having to interleave commentaries between >>> related-but-separate diffs on the code. >>> >>> But anyway, if you think you need to express a PLUGOUT state, I'd suggest >>> adding a state between DEV_UNDEFINED and DEV_PARSED. >>> DEV_UNDEFINED means that the device is in limbo and has no existence per >>> se (its parsing failed for example, it is not clear whether the parameters are >>> correct, etc...). DEV_PLUGOUT could mean then that the device has been >>> successfully probed at least once, meaning that it could possibly have >>> residuals from this probing still there, or specific care to be taken when >>> manipulating it. >>> >>> However, I'm not yet convinced that this new state is necessary. I think you >>> can mitigate this race condition without having to add it. If you insist in >>> introducing this state, please do so in a separate patch, with proper >>> definition about the meaning of this state: >>> >>> + When it should be valid for a device to be in this state. >>> + Which operation corresponds to getting into and out of this state. >>> + Why this state is interesting and what could not be expressed before >>> that is thus being fixed by introducing this state. >>> >>> But please verify twice whether you absolutely need to complexify the >>> current fail-safe internals before going all in and basing your work upon it :) >>> >> >> Indeed what I am currently missing in failsafe is knowing the device is in a PLUGOUT state. >> Your suggestion to add a new state DEV_PLUGOUT cannot be used with the current implementation >> as the device states are modified during an alarm hotplug handling every 2 seconds. >> In fs_hotplug_alarm() we call failsafe_eth_dev_state_sync(dev) which eventually calls ->dev_configure(dev) >> where we assign: sdev->state = DEV_ACTIVE; >> then when sync fails fs_hotplug_alarm() calls failsafe_dev_remove(dev) which will call fs_dev_remove(sdev); where the sub devices >> states are changed from ACTIVE down to DEV_UNDEFINED. >> Having said that it means that during a PLUGOUT event - the states are modified with each invocation of the fs_hotplug_alarm >> every 2 seconds. So even if we added DEV_PLUGOUT state - it will not remain fixed during the hotplug alarm handling. >> I have also verified all of this with printouts. >> When seeing a sub device in state "DEV_ACTIVE" - we cannot tell whether the device is currently in "PLUGOUT situation" or "PLUGIN situation" >> This allows operations such as fs_mtu_set() on sub-devices which are in "PLUGOUT situation" while their state is >> DEV_ACTIVE to be manipulated, which I think should have been avoided. >> >> Please note fs_mtu_set() implementation: >> tatic int fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) >> { >> .. >> FOREACH_SUBDEV_ACTIVE(sdev, i, dev) { >> // ***** We are here while the device can be in a "PLUGOUT situation" *** >> >> To summarize: >> I am missing a way to know in failsafe that a sub-device is currently plugged out > > (sdev->state < DEV_ACTIVE && !sdev->remove) means that the device is > plugged out. > >> 1. I suggested extending the "remove" flag scope for this purpose. It has minimal changes with current failsafe implementation. You prefer not using "remove". > > I prefer using it, but as a flag, not as a device state. > >> 2. You suggested adding a new state DEV_PLUGOUT. I don't think it will work with current implementation (as explained above) or may require a redesign of current implementation. >> > > I do not suggest adding a new state DEV_PLUGOUT. > I suggest using sdev->remove properly. > >> Can you suggest another way? >> > > 0. In a separate commit, move the > sdev->remove = 0; > from fs_dev_configure, into the case DEV_UNDEFINED of the switch > within fs_dev_remove. This is cleaner and more logical anyway. > > 1. Check that sdev->remove is not set in fs_find_next. > If sdev->remove is set, then the device should be skipped. > > 2. In failsafe_dev_remove, do not use the FOREACH_SUBDEV_STATE iterator, > but manually iterate over all sub-devices using the subs_tail and > subs_head values. > As the generic iterator would skip over devices which have > sdev->remove set, this function would not work anymore. > > 3. Find the places I have missed that needs this manual iterator to be > used instead of the FOREACH_SUBDEV{,_STATE} ones. I think there is at > least other place that I cannot recall, there might be more. > > If you think this does not work, please tell me why, because then it > means that I have misunderstood something about the race condition you > are trying to fix. Reminder of this patch remaining from previous release. >