From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 549A241E05; Tue, 7 Mar 2023 13:27:05 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 372A640E03; Tue, 7 Mar 2023 13:27:05 +0100 (CET) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by mails.dpdk.org (Postfix) with ESMTP id 29F734067E for ; Tue, 7 Mar 2023 13:27:02 +0100 (CET) Received: from dggpeml500024.china.huawei.com (unknown [172.30.72.54]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4PWF7K1VfhzKmM2; Tue, 7 Mar 2023 20:26:53 +0800 (CST) Received: from [10.67.100.224] (10.67.100.224) by dggpeml500024.china.huawei.com (7.185.36.10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.21; Tue, 7 Mar 2023 20:26:59 +0800 Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode To: Ferruh Yigit , Konstantin Ananyev , Ajit Khaparde CC: Konstantin Ananyev , Thomas Monjalon , Andrew Rybchenko , "dev@dpdk.org" References: <20230301030610.49468-1-fengchengwen@huawei.com> <20230301030610.49468-2-fengchengwen@huawei.com> <0f387ca1eee34a7f92745de7b59a71a1@huawei.com> <18c5b676-ae72-e646-89af-d6cd636d923f@yandex.ru> <5b40049f-6f22-fc3f-b13e-da1793c2fd1a@amd.com> <9092fdae9c1d4c53a00a8f23eb1129ec@huawei.com> <0effdaa9-5045-0635-775c-6e1eda0d5dba@amd.com> <9dc5d714-07a4-c32e-e557-efe8e7fe2d16@huawei.com> <5b110017-8679-4603-5c25-742ed83e4bda@amd.com> From: fengchengwen Message-ID: <16463f17-6f71-308b-c27b-9134346002c9@huawei.com> Date: Tue, 7 Mar 2023 20:26:59 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: <5b110017-8679-4603-5c25-742ed83e4bda@amd.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.67.100.224] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggpeml500024.china.huawei.com (7.185.36.10) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2023/3/7 20:07, Ferruh Yigit wrote: > On 3/7/2023 8:25 AM, fengchengwen wrote: >> >> >> On 2023/3/6 19:13, Konstantin Ananyev wrote: >>> >>> >>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path >>>>>>>>>>> pointers to dummy functions and then try recovery, in this period the >>>>>>>>>>> application may still invoking data path API. This will introduce a >>>>>>>>>>> race-condition with data path which may lead to crash [1]. >>>>>>>>>>> >>>>>>>>>>> Although the PMD added delay after setting data path pointers to cover >>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't >>>>>>>>>>> solve the problem. >>>>>>>>>>> >>>>>>>>>>> To solve the race-condition problem fundamentally, the following >>>>>>>>>>> requirements are added: >>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after >>>>>>>>>>> report RTE_ETH_EVENT_ERR_RECOVERING event. >>>>>>>>>>> 2. The application should stop data path API invocation when process >>>>>>>>>>> the RTE_ETH_EVENT_ERR_RECOVERING event. >>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before >>>>>>>>>>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>>>>>>> 4. The application should enable data path API invocation when process >>>>>>>>>>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>>>>>>> >>>>>>>>> >>>>>>>>> How this is solving the race-condition, by pushing responsibility to >>>>>>>>> stop data path to application? >>>>>>>> >>>>>>>> Exactly, it becomes application responsibility to make sure data-path is >>>>>>>> stopped/suspended before recovery will continue. >>>>>>>> >>>>>>> >>>>>>> From documentation of the feature: >>>>>>> >>>>>>> `` >>>>>>> Because the PMD recovers automatically, >>>>>>> the application can only sense that the data flow is disconnected for a >>>>>>> while and the control API returns an error in this period. >>>>>>> >>>>>>> In order to sense the error happening/recovering, as well as to restore >>>>>>> some additional configuration, three events are available: >>>>>>> `` >>>>>>> >>>>>>> It looks like initial design is to use events mainly inform application >>>>>>> about what happened and mainly for re-configuration. >>>>>>> >>>>>>> Although I am don't disagree to involve the application, I am not sure >>>>>>> that is part of current design. >>>>>> >>>>>> I thought we all agreed that initial design contain some fallacies that >>>>>> need to fixed, no? >>>>>> Statement that with current rte_ethdev design error recovery can be done >>>>>> without interaction with the app (to stop/suspend data/control path) >>>>>> is the main one I think. >>>>>> It needs some interaction with app layer, one way or another. >>>>>> >>>>>>>>> >>>>>>>>> What if application is not interested in recovery modes at all and not >>>>>>>>> registered any callback for the recovery? >>>>>>>> >>>>>>>> >>>>>>>> Are you saying there is no way for application to disable >>>>>>>> automatic recovery in PMD if it is not interested >>>>>>>> (or can't full-fill per-requesties for it)? >>>>>>>> If so, then yes it is a problem and we need to fix it. >>>>>>>> I assumed that such mechanism to disable unwanted events already exists, >>>>>>>> but I can't find anything. >>>>>>>> Wonder what would be the easiest way here - can PMD make a decision >>>>>>>> based on callback return value, or do we need a new API to >>>>>>>> enable/disable callbacks, or ...? >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> As far as I can see automatic recovery is not configurable by app. >>>>>>> >>>>>>> But that is not all, PMD sends events to application but PMD can't know >>>>>>> if application is handling them or not, so with current design PMD can't >>>>>>> rely on to app. >>>>>> >>>>>> Well, PMD invokes user provided callback. >>>>>> One way to fix that problem - if there is no callback provided, >>>>>> or callback returns an error code - PMD can assume that recovery >>>>>> should not be done. >>>>>> That is probably not the best design choice, but at least it will allow >>>>>> to fix the problem without too many changes and introducing new API. >>>>>> That could be sort of a 'quick fix'. >>>>>> In a meanwhile we can think about new/better approach for that. >>>>>> >>>>> >>>>> -rc2 for 23.03 is a few days away. >>>>> >>>>> What do you think to have 'quick fix' as modifying how driver updates >>>>> burst ops to prevent the race condition, for this release? >> >> The 'quick fix', do you mean only update function pointer (without rxq setting) ? >> Currently the PMDs which announced support "proactive error handling mode" already >> do this. >> > > Yes. > I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()' > updates all fields in 'rte_eth_fp_ops' but only function pointer seems > changed in the driver, resulting only function pointers to be updated. > > The discussion about race condition started with patch [1], which > mentions a crash because of a race condition. Later in discussions, > recovery event given as a sample for where the race can occur, that is > why we are here. > > But after above info, although there is race condition and a bigger > update (that needs application involvement) is required for recovery > mechanism, there is no crash and NO 'quick fix' is required for recovery. > > @Konstantin, @Chengwen, can you please confirm above understanding is > correct? Yes, that's what. > > > > [1] > https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/ > >>>>> >>>>> And plan a design update for the next release? >>>> +1 on the overall approach. >>> >>> Yep, agree. >> >> Hope for better solution. >> And also, I notice only the openvswitch (from all open-source software which based-on DPDK) >> registers RTE_ETH_EVENT_INTR_RESET callback . >> >> Therefore, hope we build a recovery framework at the DPDK SDK level and be compatible >> with RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanism. >> >>> >>>> >>>>> >>>>> >>>>>>> >>>>>>>>> I think driver should not rely on application for this, unless >>>>>>>>> application explicitly says (to driver) that it is handling recovery, >>>>>>>>> right now there is no way for driver to know this. >>>>>>>> >>>>>>>> I think it is visa-versa: >>>>>>>> application should not enable auto-recovery if it can't meet >>>>>>>> per-requeststies for it (provide appropriate callback). >>>>>>>> >>>>>>> >>>>>>> I agree on above, we are saying similar thing in different perspective. >>>>>> >>>>>> Ok, that's good we are on the same page. >>>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>>> Also, this patch introduce a driver internal function >>>>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD. >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/ >>>>>>>>>>> >>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode") >>>>>>>>>>> Cc: stable@dpdk.org >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Chengwen Feng >>>>>>>>>>> --- >>>>>>>>>>> doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++--------- >>>>>>>>>>> lib/ethdev/ethdev_driver.c | 8 +++++++ >>>>>>>>>>> lib/ethdev/ethdev_driver.h | 10 ++++++++ >>>>>>>>>>> lib/ethdev/rte_ethdev.h | 32 >>>>>>>>>>> +++++++++++++++---------- >>>>>>>>>>> lib/ethdev/version.map | 1 + >>>>>>>>>>> 5 files changed, 46 insertions(+), 25 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>>> index c145a9066c..e380ff135a 100644 >>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst >>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery >>>>>>>>>>> in PASSIVE mode, >>>>>>>>>>> the PMD automatically recovers from error in PROACTIVE mode, >>>>>>>>>>> and only a small amount of work is required for the application. >>>>>>>>>>> >>>>>>>>>>> -During error detection and automatic recovery, >>>>>>>>>>> -the PMD sets the data path pointers to dummy functions >>>>>>>>>>> -(which will prevent the crash), >>>>>>>>>>> -and also make sure the control path operations fail with a return >>>>>>>>>>> code ``-EBUSY``. >>>>>>>>>>> - >>>>>>>>>>> -Because the PMD recovers automatically, >>>>>>>>>>> -the application can only sense that the data flow is disconnected >>>>>>>>>>> for a while >>>>>>>>>>> -and the control API returns an error in this period. >>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the >>>>>>>>>>> data path >>>>>>>>>>> +pointers to dummy functions and also make sure the control path >>>>>>>>>>> operations >>>>>>>>>>> +failed with a return code ``-EBUSY``. >>>>>>>>>>> >>>>>>>>>>> In order to sense the error happening/recovering, >>>>>>>>>>> as well as to restore some additional configuration, >>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available: >>>>>>>>>>> >>>>>>>>>>> ``RTE_ETH_EVENT_ERR_RECOVERING`` >>>>>>>>>>> Notify the application that an error is detected >>>>>>>>>>> - and the recovery is being started. >>>>>>>>>>> + and the recovery is about to start. >>>>>>>>>>> Upon receiving the event, the application should not invoke >>>>>>>>>>> - any control path function until receiving >>>>>>>>>>> + any control and data path API until receiving >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event. >>>>>>>>>>> >>>>>>>>>>> .. note:: >>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available: >>>>>>>>>>> >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` >>>>>>>>>>> Notify the application that the recovery from error is successful, >>>>>>>>>>> - the PMD already re-configures the port, >>>>>>>>>>> - and the effect is the same as a restart operation. >>>>>>>>>>> + the PMD already re-configures the port. >>>>>>>>>>> + The application should restore some additional configuration, >>>>>>>>>>> and then >>>>>>>>>>> + enable data path API invocation. >>>>>>>>>>> >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` >>>>>>>>>>> Notify the application that the recovery from error failed, >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c >>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644 >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c >>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev >>>>>>>>>>> *dev, const char *ring_name, >>>>>>>>>>> return rc; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> +void >>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) >>>>>>>>>>> +{ >>>>>>>>>>> + if (dev == NULL) >>>>>>>>>>> + return; >>>>>>>>>>> + eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> const struct rte_memzone * >>>>>>>>>>> rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char >>>>>>>>>>> *ring_name, >>>>>>>>>>> uint16_t queue_id, size_t size, unsigned int align, >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h >>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644 >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h >>>>>>>>>>> @@ -1621,6 +1621,16 @@ int >>>>>>>>>>> rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const >>>>>>>>>>> char *name, >>>>>>>>>>> uint16_t queue_id); >>>>>>>>>>> >>>>>>>>>>> +/** >>>>>>>>>>> + * @internal >>>>>>>>>>> + * Setup eth fast-path API to ethdev values. >>>>>>>>>>> + * >>>>>>>>>>> + * @param dev >>>>>>>>>>> + * Pointer to struct rte_eth_dev. >>>>>>>>>>> + */ >>>>>>>>>>> +__rte_internal >>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev); >>>>>>>>>>> + >>>>>>>>>>> /** >>>>>>>>>>> * @internal >>>>>>>>>>> * Atomically set the link status for the specific device. >>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h >>>>>>>>>>> index 049641d57c..44ee7229c1 100644 >>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h >>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h >>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type { >>>>>>>>>>> */ >>>>>>>>>>> RTE_ETH_EVENT_RX_AVAIL_THRESH, >>>>>>>>>>> /** Port recovering from a hardware or firmware error. >>>>>>>>>>> - * If PMD supports proactive error recovery, >>>>>>>>>>> - * it should trigger this event to notify application >>>>>>>>>>> - * that it detected an error and the recovery is being started. >>>>>>>>>>> - * Upon receiving the event, the application should not invoke >>>>>>>>>>> any control path API >>>>>>>>>>> - * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until >>>>>>>>>>> receiving >>>>>>>>>>> - * RTE_ETH_EVENT_RECOVERY_SUCCESS or >>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event. >>>>>>>>>>> - * The PMD will set the data path pointers to dummy functions, >>>>>>>>>>> - * and re-set the data path pointers to non-dummy functions >>>>>>>>>>> - * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>>>>>>> - * It means that the application cannot send or receive any >>>>>>>>>>> packets >>>>>>>>>>> - * during this period. >>>>>>>>>>> + * >>>>>>>>>>> + * If PMD supports proactive error recovery, it should trigger >>>>>>>>>>> this >>>>>>>>>>> + * event to notify application that it detected an error and the >>>>>>>>>>> + * recovery is about to start. >>>>>>>>>>> + * >>>>>>>>>>> + * Upon receiving the event, the application should not invoke any >>>>>>>>>>> + * control and data path API until receiving >>>>>>>>>>> + * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED >>>>>>>>>>> + * event. >>>>>>>>>>> + * >>>>>>>>>>> + * Once this event is reported, the PMD will set the data path >>>>>>>>>>> pointers >>>>>>>>>>> + * to dummy functions, and re-set the data path pointers to valid >>>>>>>>>>> + * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS >>>>>>>>>>> event. >>>>>>>>>>> + * >>>>>>>>>>> * @note Before the PMD reports the recovery result, >>>>>>>>>>> * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event >>>>>>>>>>> again, >>>>>>>>>>> * because a larger error may occur during the recovery. >>>>>>>>>>> */ >>>>>>>>>>> RTE_ETH_EVENT_ERR_RECOVERING, >>>>>>>>>>> /** Port recovers successfully from the error. >>>>>>>>>>> - * The PMD already re-configured the port, >>>>>>>>>>> - * and the effect is the same as a restart operation. >>>>>>>>>>> + * >>>>>>>>>>> + * The PMD already re-configured the port: >>>>>>>>>>> * a) The following operation will be retained: (alphabetically) >>>>>>>>>>> * - DCB configuration >>>>>>>>>>> * - FEC configuration >>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type { >>>>>>>>>>> * (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP) >>>>>>>>>>> * c) Any other configuration will not be stored >>>>>>>>>>> * and will need to be re-configured. >>>>>>>>>>> + * >>>>>>>>>>> + * The application should restore some additional configuration >>>>>>>>>>> + * (see above case b/c), and then enable data path API invocation. >>>>>>>>>>> */ >>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_SUCCESS, >>>>>>>>>>> /** Port recovery failed. >>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map >>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644 >>>>>>>>>>> --- a/lib/ethdev/version.map >>>>>>>>>>> +++ b/lib/ethdev/version.map >>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL { >>>>>>>>>>> rte_eth_devices; >>>>>>>>>>> rte_eth_dma_zone_free; >>>>>>>>>>> rte_eth_dma_zone_reserve; >>>>>>>>>>> + rte_eth_fp_ops_setup; >>>>>>>>>>> rte_eth_hairpin_queue_peer_bind; >>>>>>>>>>> rte_eth_hairpin_queue_peer_unbind; >>>>>>>>>>> rte_eth_hairpin_queue_peer_update; >>>>>>>>>>> -- >>>>>>>>>> Acked-by: Konstantin Ananyev >>>>>>>>>> >>>>>>>>>>> 2.17.1 >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> > > . >