From: Konstantin Ananyev
To: Fengchengwen, "thomas@monjalon.net", "ferruh.yigit@amd.com", "Andrew Rybchenko", Kalesh AP, Ajit Khaparde
CC: "dev@dpdk.org"
Subject: RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
Date: Thu, 2 Mar 2023 12:08:53 +0000
Message-ID: <0f387ca1eee34a7f92745de7b59a71a1@huawei.com>
References: <20230301030610.49468-1-fengchengwen@huawei.com> <20230301030610.49468-2-fengchengwen@huawei.com>
In-Reply-To: <20230301030610.49468-2-fengchengwen@huawei.com>

> In the proactive error handling mode, the PMD will set the data path
> pointers to dummy functions and then try recovery, in this period the
> application may still invoking data path API. This will introduce a
> race-condition with data path which may lead to crash [1].
>
> Although the PMD added delay after setting data path pointers to cover
> the above race-condition, it reduces the probability, but it doesn't
> solve the problem.
>
> To solve the race-condition problem fundamentally, the following
> requirements are added:
> 1. The PMD should set the data path pointers to dummy functions after
>    report RTE_ETH_EVENT_ERR_RECOVERING event.
> 2. The application should stop data path API invocation when process
>    the RTE_ETH_EVENT_ERR_RECOVERING event.
> 3. The PMD should set the data path pointers to valid functions before
>    report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 4. The application should enable data path API invocation when process
>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>
> Also, this patch introduce a driver internal function
> rte_eth_fp_ops_setup which used as an help function for PMD.
>
> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>
> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> Cc: stable@dpdk.org
>
> Signed-off-by: Chengwen Feng
> ---
>  doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>  lib/ethdev/ethdev_driver.c              |  8 +++++++
>  lib/ethdev/ethdev_driver.h              | 10 ++++++++
>  lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>  lib/ethdev/version.map                  |  1 +
>  5 files changed, 46 insertions(+), 25 deletions(-)
>
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index c145a9066c..e380ff135a 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>  the PMD automatically recovers from error in PROACTIVE mode,
>  and only a small amount of work is required for the application.
>
> -During error detection and automatic recovery,
> -the PMD sets the data path pointers to dummy functions
> -(which will prevent the crash),
> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> -
> -Because the PMD recovers automatically,
> -the application can only sense that the data flow is disconnected for a while
> -and the control API returns an error in this period.
> +During error detection and automatic recovery, the PMD sets the data path
> +pointers to dummy functions and also make sure the control path operations
> +failed with a return code ``-EBUSY``.
>
>  In order to sense the error happening/recovering,
>  as well as to restore some additional configuration,
> @@ -653,9 +648,9 @@ three events are available:
>
>  ``RTE_ETH_EVENT_ERR_RECOVERING``
>     Notify the application that an error is detected
> -   and the recovery is being started.
> +   and the recovery is about to start.
>     Upon receiving the event, the application should not invoke
> -   any control path function until receiving
> +   any control and data path API until receiving
>     ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>
>  .. note::
> @@ -666,8 +661,9 @@ three events are available:
>
>  ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>     Notify the application that the recovery from error is successful,
> -   the PMD already re-configures the port,
> -   and the effect is the same as a restart operation.
> +   the PMD already re-configures the port.
> +   The application should restore some additional configuration, and then
> +   enable data path API invocation.
>
>  ``RTE_ETH_EVENT_RECOVERY_FAILED``
>     Notify the application that the recovery from error failed,
> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> index 0be1e8ca04..f994653fe9 100644
> --- a/lib/ethdev/ethdev_driver.c
> +++ b/lib/ethdev/ethdev_driver.c
> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>  	return rc;
>  }
>
> +void
> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> +{
> +	if (dev == NULL)
> +		return;
> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> +}
> +
>  const struct rte_memzone *
>  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>  		uint16_t queue_id, size_t size, unsigned int align,
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 2c9d615fb5..0d964d1f67 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1621,6 +1621,16 @@ int
>  rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>  		uint16_t queue_id);
>
> +/**
> + * @internal
> + * Setup eth fast-path API to ethdev values.
> + *
> + * @param dev
> + *   Pointer to struct rte_eth_dev.
> + */
> +__rte_internal
> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> +
>  /**
>   * @internal
>   * Atomically set the link status for the specific device.
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 049641d57c..44ee7229c1 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>  	 */
>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>  	/** Port recovering from a hardware or firmware error.
> -	 * If PMD supports proactive error recovery,
> -	 * it should trigger this event to notify application
> -	 * that it detected an error and the recovery is being started.
> -	 * Upon receiving the event, the application should not invoke any control path API
> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> -	 * The PMD will set the data path pointers to dummy functions,
> -	 * and re-set the data path pointers to non-dummy functions
> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> -	 * It means that the application cannot send or receive any packets
> -	 * during this period.
> +	 *
> +	 * If PMD supports proactive error recovery, it should trigger this
> +	 * event to notify application that it detected an error and the
> +	 * recovery is about to start.
> +	 *
> +	 * Upon receiving the event, the application should not invoke any
> +	 * control and data path API until receiving
> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> +	 * event.
> +	 *
> +	 * Once this event is reported, the PMD will set the data path pointers
> +	 * to dummy functions, and re-set the data path pointers to valid
> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> +	 *
>  	 * @note Before the PMD reports the recovery result,
>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>  	 * because a larger error may occur during the recovery.
>  	 */
>  	RTE_ETH_EVENT_ERR_RECOVERING,
>  	/** Port recovers successfully from the error.
> -	 * The PMD already re-configured the port,
> -	 * and the effect is the same as a restart operation.
> +	 *
> +	 * The PMD already re-configured the port:
>  	 * a) The following operation will be retained: (alphabetically)
>  	 *    - DCB configuration
>  	 *    - FEC configuration
> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>  	 *    (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>  	 * c) Any other configuration will not be stored
>  	 *    and will need to be re-configured.
> +	 *
> +	 * The application should restore some additional configuration
> +	 * (see above case b/c), and then enable data path API invocation.
>  	 */
>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>  	/** Port recovery failed.
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 357d1a88c0..c273e0bdae 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -320,6 +320,7 @@ INTERNAL {
>  	rte_eth_devices;
>  	rte_eth_dma_zone_free;
>  	rte_eth_dma_zone_reserve;
> +	rte_eth_fp_ops_setup;
>  	rte_eth_hairpin_queue_peer_bind;
>  	rte_eth_hairpin_queue_peer_unbind;
>  	rte_eth_hairpin_queue_peer_update;
> --

Acked-by: Konstantin Ananyev

> 2.17.1
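
For application writers, requirements 2 and 4 from the commit message could be met roughly as sketched below. This is only an illustration and not part of the patch: the enum, the two atomics and the demo_* functions are hypothetical stand-ins so that the snippet builds without DPDK. A real application would do the same work in the callback it registered with rte_eth_dev_callback_register(), and would place rte_eth_rx_burst()/rte_eth_tx_burst() where the comment indicates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the three ethdev events named in the patch (hypothetical;
 * a real callback receives enum rte_eth_event_type values). */
enum demo_event {
	DEMO_ERR_RECOVERING,
	DEMO_RECOVERY_SUCCESS,
	DEMO_RECOVERY_FAILED,
};

/* Hypothetical per-port state shared by the event callback and the worker. */
static atomic_bool datapath_enabled = true;
static atomic_int workers_in_burst;

/* Worker side: wraps one Rx/Tx burst. Returns false if the burst was skipped. */
static bool
demo_poll_once(void)
{
	/* Announce the burst first, then check the gate. With sequentially
	 * consistent atomics, either the worker sees the closed gate or the
	 * callback sees the non-zero counter. */
	atomic_fetch_add(&workers_in_burst, 1);
	bool ok = atomic_load(&datapath_enabled);
	if (ok) {
		/* rte_eth_rx_burst() / rte_eth_tx_burst() would be called here. */
	}
	atomic_fetch_sub(&workers_in_burst, 1);
	return ok;
}

/* Event side: what the application's event callback would do. */
static void
demo_on_event(enum demo_event ev)
{
	switch (ev) {
	case DEMO_ERR_RECOVERING:
		/* Requirement 2: close the gate and return only once no worker
		 * is inside a burst. Safe if the event is reported again. */
		atomic_store(&datapath_enabled, false);
		while (atomic_load(&workers_in_burst) != 0)
			; /* busy-wait for in-flight bursts to finish */
		break;
	case DEMO_RECOVERY_SUCCESS:
		/* Requirement 4: restore configuration that was not retained
		 * (cases b/c in the header comment), then reopen the gate. */
		atomic_store(&datapath_enabled, true);
		break;
	case DEMO_RECOVERY_FAILED:
		/* The port is unusable; keep the gate closed. */
		break;
	}
}
```

The counter is incremented before the gate is read so that the callback cannot return while a worker is about to enter a burst; a plain flag check alone would leave the same window the patch is trying to close.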