References: <20220128124830.427-1-kalesh-anakkur.purayil@broadcom.com> <20220616094122.1909-1-fengchengwen@huawei.com> <20220616094122.1909-2-fengchengwen@huawei.com>
From: Ray Kinsella
To: Chengwen Feng
Cc: thomas@monjalon.net,
 ferruh.yigit@xilinx.com, dev@dpdk.org, kalesh-anakkur.purayil@broadcom.com,
 somnath.kotur@broadcom.com, ajit.khaparde@broadcom.com,
 Andrew.Rybchenko@oktetlabs.ru
Subject: Re: [PATCH v8 1/4] ethdev: support device error recovery notification
Date: Thu, 23 Jun 2022 16:58:33 +0100
In-reply-to: <20220616094122.1909-2-fengchengwen@huawei.com>
Message-ID: <87edzf4bi9.fsf@mdr78.vserver.site>
List-Id: DPDK patches and discussions

Chengwen Feng writes:

> From: Kalesh AP
>
> Some PMDs (e.g. hns3) could detect hardware or firmware errors, and try
> to recover from the errors. In this process, the PMD sets the data path
> pointers to dummy functions (which will prevent the crash), and also
> make sure the control path operations failed with retcode -EBUSY.
>
> Also in this process, from the perspective of application, services are
> affected. For example, the Rx/Tx bust APIs cannot receive and send
> packets, and the control plane API return failure.
>
> In some service scenarios, application needs to be aware of the event
> to determine whether to migrate services. So three events were
> introduced:
>
> 1. RTE_ETH_EVENT_ERR_RECOVERING: the PMD must trigger this event to
>    notify the application that it detected a hardware or firmware error
>    and tries to recover.
> 2. RTE_ETH_EVENT_RECOVER_SUCCESS: the PMD must trigger this event to
>    notify the application that it has recovered from the error. And PMD
>    already re-configures the port to the state prior to the error.
> 3. RTE_ETH_EVENT_RECOVER_FAILED: the PMD must trigger this event to
>    notify the application that it has failed to recover from the error.
>    The port may not be usable anymore.
>
> Note: the error recovery of these events is mainly performed by the
> PMD.
> Unlike the RTE_ETH_EVENT_INTR_RESET which the error recovery is
> performed by the application. The PMD must ensure that the above two
> error handling methods cannot be used at the same time.
>
> Signed-off-by: Kalesh AP
> Signed-off-by: Somnath Kotur
> Signed-off-by: Chengwen Feng
> Reviewed-by: Ajit Khaparde
> ---
> doc/guides/prog_guide/poll_mode_drv.rst | 32 +++++++++++++++++++++++++
> doc/guides/rel_notes/release_22_07.rst | 11 +++++++++
> lib/ethdev/rte_ethdev.h | 6 +++++
> 3 files changed, 49 insertions(+)
>
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index 9d081b1cba..6398917485 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -627,3 +627,35 @@ by application.
> The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
> the application to handle reset event. It is duty of application to
> handle all synchronization before it calls rte_eth_dev_reset().
> +
> +Error Recovery Notification
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Some PMDs (e.g. hns3) could detect hardware or firmware errors, and try to
> +recover from the errors. In this process, the PMD sets the data path pointers
> +to dummy functions (which will prevent the crash), and also make sure the
> +control path operations failed with retcode -EBUSY.
> +
> +Also in this process, from the perspective of application, services are
> +affected. For example, the Rx/Tx bust APIs cannot receive and send packets,
> +and the control plane API return failure.
> +
> +In some service scenarios, application needs to be aware of the event to
> +determine whether to migrate services. So three events was introduced.
> +
> +The PMD must trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> +application that it detected a hardware or firmware error and tries to recover.
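
For readers following the thread, here is a minimal sketch of how an application might map the three events from the commit message onto its own actions. It is an illustration under assumptions, not code from the patch: the enum is a local stand-in for the proposed enum rte_eth_event_type values (using the names as posted in this v8), and on_recovery_event() and the app_action values are hypothetical names that do not exist in ethdev.

```c
/* Local stand-in for the three events proposed in the patch. A real
 * application would use enum rte_eth_event_type from rte_ethdev.h. */
enum recovery_event {
	EVENT_ERR_RECOVERING,   /* PMD detected an error and is recovering */
	EVENT_RECOVER_SUCCESS,  /* PMD restored the port to its prior state */
	EVENT_RECOVER_FAILED,   /* PMD gave up; port may not be usable */
};

/* Hypothetical application-side reactions (names are assumptions). */
enum app_action {
	APP_PAUSE_TRAFFIC,      /* stop relying on Rx/Tx burst and control APIs */
	APP_RESUME_TRAFFIC,     /* carry on using the port as before */
	APP_MIGRATE_SERVICE,    /* move services away from this port */
	APP_IGNORE,             /* event is not part of error recovery */
};

/* Decide what the application does for each recovery event. */
enum app_action on_recovery_event(enum recovery_event ev)
{
	switch (ev) {
	case EVENT_ERR_RECOVERING:
		return APP_PAUSE_TRAFFIC;
	case EVENT_RECOVER_SUCCESS:
		return APP_RESUME_TRAFFIC;
	case EVENT_RECOVER_FAILED:
		return APP_MIGRATE_SERVICE;
	}
	return APP_IGNORE;
}
```

In a real application this decision would presumably sit inside a callback registered per event type with rte_eth_dev_callback_register(); the sketch leaves that out so it compiles without DPDK headers.
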
> +
> +The PMD must trigger RTE_ETH_EVENT_RECOVER_SUCCESS event to notify the
> +application that it has recovered from the error. And PMD already re-configures
> +the port to the state prior to the error.
> +
> +The PMD must trigger RTE_ETH_EVENT_RECOVER_FAILED event to notify the
> +application that it has failed to recover from the error. The port may not be
> +usable anymore.
> +
> +.. note::
> +   The error recovery of these events is mainly performed by the PMD.
> +   Unlike the RTE_ETH_EVENT_INTR_RESET which the error recovery is
> +   performed by the application. The PMD must ensure that the above two
> +   error handling methods cannot be used at the same time.
> diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
> index 6fc044edaa..b237bd3303 100644
> --- a/doc/guides/rel_notes/release_22_07.rst
> +++ b/doc/guides/rel_notes/release_22_07.rst
> @@ -108,6 +108,17 @@ New Features
>
> Added an API which can get the device type of vDPA device.
>
> +* **Added error recover notification.**
> +
> +  Added error recover notification to application including:
> +
> +  * Added new event: ``RTE_ETH_EVENT_ERR_RECOVERING`` for the PMD to report
> +    that the port is recovering from an error.
> +  * Added new event: ``RTE_ETH_EVENT_RECOVER_SUCCESS`` for the PMD to report
> +    that the port recover successful from an error.

RTE_ETH_EVENT_RECOVERY_SUCCESS

> +  * Added new event: ``RTE_ETH_EVENT_RECOVER_FAILED`` for the PMD to
> report

RTE_ETH_EVENT_RECOVERY_FAILED

> +    that the prot recover failed from an error

to report that port recovery failed

> +
> * **Updated Amazon ena driver.**
>
> The new driver version (v2.7.0) includes:
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 045ee64747..6998f6f0be 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3928,6 +3928,12 @@ enum rte_eth_event_type {
> * @see rte_eth_rx_avail_thresh_set()
> */
> RTE_ETH_EVENT_RX_AVAIL_THRESH,
> + /** Port recovering from a hardware or firmware error */
> + RTE_ETH_EVENT_ERR_RECOVERING,
> + /** Port recovers successful from the error */
> + RTE_ETH_EVENT_RECOVER_SUCCESS,

RTE_ETH_EVENT_RECOVERY_SUCCESS

> + /** Port recovers failed from the error */
> + RTE_ETH_EVENT_RECOVER_FAILED,

RTE_ETH_EVENT_RECOVERY_FAILED

> RTE_ETH_EVENT_MAX /**< max value of this enum */
> };

--
Regards, Ray K