From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EC30041DFD; Tue, 7 Mar 2023 09:39:06 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DA5D840E03; Tue, 7 Mar 2023 09:39:06 +0100 (CET) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by mails.dpdk.org (Postfix) with ESMTP id 1DFC14067E for ; Tue, 7 Mar 2023 09:39:05 +0100 (CET) Received: from dggpeml500024.china.huawei.com (unknown [172.30.72.55]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4PW83Y54QLzrS6H; Tue, 7 Mar 2023 16:38:17 +0800 (CST) Received: from [10.67.100.224] (10.67.100.224) by dggpeml500024.china.huawei.com (7.185.36.10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.21; Tue, 7 Mar 2023 16:39:02 +0800 Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode To: Honnappa Nagarahalli , Konstantin Ananyev , "dev@dpdk.org" , "thomas@monjalon.net" , Ferruh Yigit , Andrew Rybchenko , Kalesh AP , "Ajit Khaparde (ajit.khaparde@broadcom.com)" CC: nd References: <20230301030610.49468-1-fengchengwen@huawei.com> <20230301030610.49468-2-fengchengwen@huawei.com> From: fengchengwen Message-ID: <95edd6ca-fe1f-fd7c-719f-0a9e6d7c45b5@huawei.com> Date: Tue, 7 Mar 2023 16:39:02 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.67.100.224] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To dggpeml500024.china.huawei.com (7.185.36.10) X-CFilter-Loop: Reflected X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2023/3/7 13:34, Honnappa Nagarahalli wrote: > > >> -----Original Message----- >> From: Konstantin Ananyev >> Sent: Sunday, March 5, 2023 9:24 AM >> To: Honnappa Nagarahalli ; >> dev@dpdk.org; Chengwen Feng ; >> thomas@monjalon.net; Ferruh Yigit ; Andrew >> Rybchenko ; Kalesh AP > anakkur.purayil@broadcom.com>; Ajit Khaparde >> (ajit.khaparde@broadcom.com) >> Cc: nd >> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling >> mode >> >> >>>>>> >>>>>> In the proactive error handling mode, the PMD will set the data >>>>>> path pointers to dummy functions and then try recovery, in this >>>>>> period the application may still invoking data path API. This will >>>>>> introduce a race-condition with data path which may lead to crash [1]. >>>>>> >>>>>> Although the PMD added delay after setting data path pointers to >>>>>> cover the above race-condition, it reduces the probability, but it >>>>>> doesn't solve the problem. >>>>>> >>>>>> To solve the race-condition problem fundamentally, the following >>>>>> requirements are added: >>>>>> 1. The PMD should set the data path pointers to dummy functions after >>>>>> report RTE_ETH_EVENT_ERR_RECOVERING event. >>>>> Do you mean to say, PMD should set the data path pointers after >>>>> calling the >>>> call back function? >>>>> The PMD is running in the context of multiple EAL threads. How do >>>>> these >>>> threads synchronize such that only one thread sets these data pointers? >>>> >>>> As I understand this event callback supposed to be called in the >>>> context of EAL interrupt thread (whoever is more familiar with >>>> original idea, feel free to correct me if I missed something). >>> I could not figure this out. It looks to be called from the data plane thread >> context. >>> I also have a thought on alternate design at the end, appreciate if you can >> take a look. >>> >>>> How it is going to signal data-path threads that they need to >>>> stop/suspend calling data-path API - that's I suppose is left to application >> to decide... >>>> Same as right now it is application responsibility to stop data-path >>>> threads before doing dev_stop()/dev/_config()/etc. >>> Ok, good, this expectation is not new. The application must have a >> mechanism already. >>> >>>> >>>> >>>>> >>>>>> 2. The application should stop data path API invocation when process >>>>>> the RTE_ETH_EVENT_ERR_RECOVERING event. >>>>> Any thoughts on how an application can do this? >>> We can ignore this question as there is already similar expectation set for >> earlier functionalities. >>> >>>>> >>>>>> 3. The PMD should set the data path pointers to valid functions before >>>>>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>> 4. The application should enable data path API invocation when process >>>>>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>> Do you mean to say that the application should not call the datapath >>>>> APIs >>>> while the PMD is running the recovery process? >>>> >>>> Yes, I believe that's the intention. >>> Ok, this is good and makes sense. >>> >>>> >>>>>> >>>>>> Also, this patch introduce a driver internal function >>>>>> rte_eth_fp_ops_setup which used as an help function for PMD. >>>>>> >>>>>> [1] >>>>>> >>>> >> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2 >>>>>> - >>>>>> ashok.k.kaladi@intel.com/ >>>>>> >>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode") >>>>>> Cc: stable@dpdk.org >>>>>> >>>>>> Signed-off-by: Chengwen Feng >>>>>> --- >>>>>> doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++--------- >>>>>> lib/ethdev/ethdev_driver.c | 8 +++++++ >>>>>> lib/ethdev/ethdev_driver.h | 10 ++++++++ >>>>>> lib/ethdev/rte_ethdev.h | 32 +++++++++++++++---------- >>>>>> lib/ethdev/version.map | 1 + >>>>>> 5 files changed, 46 insertions(+), 25 deletions(-) >>>>>> >>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst >>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst >>>>>> index c145a9066c..e380ff135a 100644 >>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst >>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst >>>>>> @@ -638,14 +638,9 @@ different from the application invokes >>>>>> recovery in PASSIVE mode, the PMD automatically recovers from >>>>>> error in PROACTIVE mode, and only a small amount of work is >>>>>> required for the >>>> application. >>>>>> >>>>>> -During error detection and automatic recovery, -the PMD sets the >>>>>> data path pointers to dummy functions -(which will prevent the >>>>>> crash), -and also make sure the control path operations fail with a >>>>>> return >>>> code ``-EBUSY``. >>>>>> - >>>>>> -Because the PMD recovers automatically, -the application can only >>>>>> sense that the data flow is disconnected for a while -and the >>>>>> control API returns an error in this period. >>>>>> +During error detection and automatic recovery, the PMD sets the >>>>>> +data path pointers to dummy functions and also make sure the >>>>>> +control path operations failed with a return code ``-EBUSY``. >>>>>> >>>>>> In order to sense the error happening/recovering, as well as to >>>>>> restore some additional configuration, @@ -653,9 +648,9 @@ three >>>>>> events >>>> are available: >>>>>> >>>>>> ``RTE_ETH_EVENT_ERR_RECOVERING`` >>>>>> Notify the application that an error is detected >>>>>> - and the recovery is being started. >>>>>> + and the recovery is about to start. >>>>>> Upon receiving the event, the application should not invoke >>>>>> - any control path function until receiving >>>>>> + any control and data path API until receiving >>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or >>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event. >>>>>> >>>>>> .. note:: >>>>>> @@ -666,8 +661,9 @@ three events are available: >>>>>> >>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` >>>>>> Notify the application that the recovery from error is successful, >>>>>> - the PMD already re-configures the port, >>>>>> - and the effect is the same as a restart operation. >>>>>> + the PMD already re-configures the port. >>>>>> + The application should restore some additional configuration, >>>>>> + and then >>>>> What is the additional configuration? Is this specific to each NIC/PMD? >>>>> I thought, this is an auto recovery process and the application does >>>>> not require >>>> to reconfigure anything. If the application has to restore the >>>> configuration, how does auto recovery differ from typical recovery >> process? >>>>> >>>>>> + enable data path API invocation. >>>>>> >>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` >>>>>> Notify the application that the recovery from error failed, >>>>>> diff --git a/lib/ethdev/ethdev_driver.c >>>>>> b/lib/ethdev/ethdev_driver.c index >>>>>> 0be1e8ca04..f994653fe9 100644 >>>>>> --- a/lib/ethdev/ethdev_driver.c >>>>>> +++ b/lib/ethdev/ethdev_driver.c >>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct >> rte_eth_dev >>>>>> *dev, const char *ring_name, >>>>>> return rc; >>>>>> } >>>>>> >>>>>> +void >>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) { >>>>>> + if (dev == NULL) >>>>>> + return; >>>>>> + eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); } >>>>>> + >>>>>> const struct rte_memzone * >>>>>> rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const >>>>>> char *ring_name, >>>>>> uint16_t queue_id, size_t size, unsigned int align, diff >> - >>>> -git >>>>>> a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index >>>>>> 2c9d615fb5..0d964d1f67 100644 >>>>>> --- a/lib/ethdev/ethdev_driver.h >>>>>> +++ b/lib/ethdev/ethdev_driver.h >>>>>> @@ -1621,6 +1621,16 @@ int >>>>>> rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const >>>>>> char *name, >>>>>> uint16_t queue_id); >>>>>> >>>>>> +/** >>>>>> + * @internal >>>>>> + * Setup eth fast-path API to ethdev values. >>>>>> + * >>>>>> + * @param dev >>>>>> + * Pointer to struct rte_eth_dev. >>>>>> + */ >>>>>> +__rte_internal >>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev); >>>>>> + >>>>>> /** >>>>>> * @internal >>>>>> * Atomically set the link status for the specific device. >>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h >>>>>> index >>>>>> 049641d57c..44ee7229c1 100644 >>>>>> --- a/lib/ethdev/rte_ethdev.h >>>>>> +++ b/lib/ethdev/rte_ethdev.h >>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type { >>>>>> */ >>>>>> RTE_ETH_EVENT_RX_AVAIL_THRESH, >>>>>> /** Port recovering from a hardware or firmware error. >>>>>> - * If PMD supports proactive error recovery, >>>>>> - * it should trigger this event to notify application >>>>>> - * that it detected an error and the recovery is being started. >>>>>> - * Upon receiving the event, the application should not invoke any >>>>>> control path API >>>>>> - * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving >>>>>> - * RTE_ETH_EVENT_RECOVERY_SUCCESS or >>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event. >>>>>> - * The PMD will set the data path pointers to dummy functions, >>>>>> - * and re-set the data path pointers to non-dummy functions >>>>>> - * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>> - * It means that the application cannot send or receive any packets >>>>>> - * during this period. >>>>>> + * >>>>>> + * If PMD supports proactive error recovery, it should trigger this >>>>>> + * event to notify application that it detected an error and the >>>>>> + * recovery is about to start. >>>>>> + * >>>>>> + * Upon receiving the event, the application should not invoke any >>>>>> + * control and data path API until receiving >>>>>> + * RTE_ETH_EVENT_RECOVERY_SUCCESS or >>>>>> RTE_ETH_EVENT_RECOVERY_FAILED >>>>>> + * event. >>>>>> + * >>>>>> + * Once this event is reported, the PMD will set the data path pointers >>>>>> + * to dummy functions, and re-set the data path pointers to valid >>>>>> + * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS >>>>>> event. >>>>> Why do we need to set the data path pointers to dummy functions if >>>>> the >>>> application is restricted from invoking any control and data path >>>> APIs till the recovery process is completed? >>>> >>>> You are right, in theory it is not mandatory. >>>> Though it helps to flag a problem if user will still try to call them >>>> while recovery is in progress. >>> Ok, may be in debug mode. >>> I mean, we have already set an expectation to the application that it should >> not call and the application has implemented a method to do the same. Why >> do we need to complicate this? >>> If the application calls the APIs, it is a programming error. >> >> >> My preference would be to keep it this way for both debug and non-debug >> mode. >> It doesn't cost anything to us in terms of perfomance, but helps to catch >> problems with wrong behaving app. > > This is also causing a synchronization problem. i.e. if this has to be done correctly, we need to use correct synchronization mechanisms. > We cannot set the function pointers and assume that data will be visible to other threads/cores in the correct order. > A possible mechanism (though I see some problems with this) could be to use a guard variable, which indicates when it is safe to use the function pointers on the data plane threads. This would require a load-acquire in the data plane threads. > >> >>> >>>> Again, same as we doing in dev_stop(). >>> >>>> >>>>> >>>>>> + * >>>>>> * @note Before the PMD reports the recovery result, >>>>>> * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event >>>>>> again, >>>>>> * because a larger error may occur during the recovery. >>>>>> */ >>>>>> RTE_ETH_EVENT_ERR_RECOVERING, >>>>> I understand this is not a change in this patch. But, just >>>>> wondering, what is the >>>> purpose of this? How is the application supposed to use this? >>>>> >>>>>> /** Port recovers successfully from the error. >>>>>> - * The PMD already re-configured the port, >>>>>> - * and the effect is the same as a restart operation. >>>>>> + * >>>>>> + * The PMD already re-configured the port: >>>>>> * a) The following operation will be retained: (alphabetically) >>>>>> * - DCB configuration >>>>>> * - FEC configuration >>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type { >>>>>> * (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP) >>>>>> * c) Any other configuration will not be stored >>>>>> * and will need to be re-configured. >>>>>> + * >>>>>> + * The application should restore some additional configuration >>>>>> + * (see above case b/c), and then enable data path API invocation. >>>>>> */ >>>>>> RTE_ETH_EVENT_RECOVERY_SUCCESS, >>>>>> /** Port recovery failed. >>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index >>>>>> 357d1a88c0..c273e0bdae 100644 >>>>>> --- a/lib/ethdev/version.map >>>>>> +++ b/lib/ethdev/version.map >>>>>> @@ -320,6 +320,7 @@ INTERNAL { >>>>>> rte_eth_devices; >>>>>> rte_eth_dma_zone_free; >>>>>> rte_eth_dma_zone_reserve; >>>>>> + rte_eth_fp_ops_setup; >>>>>> rte_eth_hairpin_queue_peer_bind; >>>>>> rte_eth_hairpin_queue_peer_unbind; >>>>>> rte_eth_hairpin_queue_peer_update; >>>>>> -- >>>>>> 2.17.1 >>>>> >>> >>> Is there any reason not to design this in the same way as >> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself? >> >> I suppose it is a question for the authors of original patch... > Appreciate if the authors could comment on this. The main cause is that the hardware implementation limit, I will try to explain from hns3 PMD's view. For a global reset, all the function need responsed within a centain period of time. otherwise, the reset will fail. and also the reset requirement a few steps (all may take a long time). When with multiple functions in one DPDK, and trigger a global reset, the rte_eth_dev_reset will not cover this scene: 1. each port's will report RTE_ETH_EVENT_INTR_RESET in interrupt thread. 2. then invoke application callback, but due to the same thread, and each port's recover will take a long time, so later port will reset failed. > >> >>> We could have a similar API 'rte_eth_dev_recover' to do the recovery >> functionality. >> >> I suppose such approach is also possible. >> Personally I am fine with both ways: either existing one or what you propose, >> as long as we'll fix existing race-condition. >> What is good with what you suggest - that way we probably don't need to >> worry how to allow user to enable/disable auto-recovery inside PMD. >> >> Konstantin >> >