From: Chengwen Feng
Subject: [PATCH v8 1/4] ethdev: support device error recovery notification
Date: Thu, 16 Jun 2022 17:41:19 +0800
Message-ID: <20220616094122.1909-2-fengchengwen@huawei.com>
In-Reply-To: <20220616094122.1909-1-fengchengwen@huawei.com>
References: <20220128124830.427-1-kalesh-anakkur.purayil@broadcom.com>
 <20220616094122.1909-1-fengchengwen@huawei.com>
List-Id: DPDK patches and discussions

From: Kalesh AP

Some PMDs (e.g. hns3) could detect hardware or firmware errors, and try to
recover from the errors.
In this process, the PMD sets the data path pointers to dummy functions
(which prevents a crash) and makes sure that control path operations fail
with return code -EBUSY.

From the application's perspective, services are affected during this
process. For example, the Rx/Tx burst APIs cannot receive or send packets,
and the control plane APIs return failure.

In some service scenarios, the application needs to be aware of the event
to determine whether to migrate services. So three events are introduced:

1. RTE_ETH_EVENT_ERR_RECOVERING: the PMD must trigger this event to notify
   the application that it detected a hardware or firmware error and is
   recovering.

2. RTE_ETH_EVENT_RECOVER_SUCCESS: the PMD must trigger this event to notify
   the application that it has recovered from the error and has
   re-configured the port to the state prior to the error.

3. RTE_ETH_EVENT_RECOVER_FAILED: the PMD must trigger this event to notify
   the application that it has failed to recover from the error. The port
   may not be usable anymore.

Note: the error recovery for these events is performed mainly by the PMD,
unlike RTE_ETH_EVENT_INTR_RESET, for which the error recovery is performed
by the application. The PMD must ensure that these two error handling
methods cannot be used at the same time.

Signed-off-by: Kalesh AP
Signed-off-by: Somnath Kotur
Signed-off-by: Chengwen Feng
Reviewed-by: Ajit Khaparde
---
 doc/guides/prog_guide/poll_mode_drv.rst | 32 +++++++++++++++++++++++++
 doc/guides/rel_notes/release_22_07.rst  | 11 +++++++++
 lib/ethdev/rte_ethdev.h                 |  6 +++++
 3 files changed, 49 insertions(+)

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index 9d081b1cba..6398917485 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -627,3 +627,35 @@ by application.
 The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
 the application to handle reset event.
 It is duty of application to handle all synchronization before it calls rte_eth_dev_reset().
+
+Error Recovery Notification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some PMDs (e.g. hns3) can detect hardware or firmware errors and try to
+recover from them. In this process, the PMD sets the data path pointers
+to dummy functions (which prevents a crash) and makes sure that control
+path operations fail with return code -EBUSY.
+
+From the application's perspective, services are affected during this
+process. For example, the Rx/Tx burst APIs cannot receive or send packets,
+and the control plane APIs return failure.
+
+In some service scenarios, the application needs to be aware of the event to
+determine whether to migrate services. So three events were introduced.
+
+The PMD must trigger the RTE_ETH_EVENT_ERR_RECOVERING event to notify the
+application that it detected a hardware or firmware error and is recovering.
+
+The PMD must trigger the RTE_ETH_EVENT_RECOVER_SUCCESS event to notify the
+application that it has recovered from the error and has re-configured
+the port to the state prior to the error.
+
+The PMD must trigger the RTE_ETH_EVENT_RECOVER_FAILED event to notify the
+application that it has failed to recover from the error. The port may not be
+usable anymore.
+
+.. note::
+   The error recovery for these events is performed mainly by the PMD,
+   unlike RTE_ETH_EVENT_INTR_RESET, for which the error recovery is
+   performed by the application. The PMD must ensure that these two
+   error handling methods cannot be used at the same time.
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 6fc044edaa..b237bd3303 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -108,6 +108,17 @@ New Features
 
   Added an API which can get the device type of vDPA device.
 
+* **Added error recovery notification.**
+
+  Added error recovery notification to the application, including:
+
+  * Added new event: ``RTE_ETH_EVENT_ERR_RECOVERING`` for the PMD to report
+    that the port is recovering from an error.
+  * Added new event: ``RTE_ETH_EVENT_RECOVER_SUCCESS`` for the PMD to report
+    that the port has recovered successfully from an error.
+  * Added new event: ``RTE_ETH_EVENT_RECOVER_FAILED`` for the PMD to report
+    that the port has failed to recover from an error.
+
 * **Updated Amazon ena driver.**
 
   The new driver version (v2.7.0) includes:
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 045ee64747..6998f6f0be 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -3928,6 +3928,12 @@ enum rte_eth_event_type {
 	 * @see rte_eth_rx_avail_thresh_set()
 	 */
 	RTE_ETH_EVENT_RX_AVAIL_THRESH,
+	/** Port recovering from a hardware or firmware error */
+	RTE_ETH_EVENT_ERR_RECOVERING,
+	/** Port recovered successfully from the error */
+	RTE_ETH_EVENT_RECOVER_SUCCESS,
+	/** Port failed to recover from the error */
+	RTE_ETH_EVENT_RECOVER_FAILED,
 	RTE_ETH_EVENT_MAX /**< max value of this enum */
 };
 
-- 
2.33.0
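
For illustration, here is a rough sketch of how an application might react to
the three events described in this patch. It is not part of the patch. The
enum below is a local stand-in so that the sketch builds without DPDK headers,
and the app_port structure, its states, and app_recovery_event_cb() are
hypothetical names chosen for this example. The callback only mirrors the
shape of an ethdev event callback (rte_eth_dev_cb_fn).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Local stand-in so the sketch builds without DPDK. In a real application
 * this enum comes from <rte_ethdev.h>; the values here are arbitrary.
 */
enum rte_eth_event_type {
	RTE_ETH_EVENT_ERR_RECOVERING,
	RTE_ETH_EVENT_RECOVER_SUCCESS,
	RTE_ETH_EVENT_RECOVER_FAILED,
};

/* Hypothetical per-port state tracked by the application. */
enum app_port_state {
	APP_PORT_ACTIVE,     /* normal operation */
	APP_PORT_RECOVERING, /* PMD is recovering; control path returns -EBUSY */
	APP_PORT_LOST,       /* recovery failed; services should be migrated */
};

struct app_port {
	enum app_port_state state;
};

/* Hypothetical handler with the same parameter list as rte_eth_dev_cb_fn. */
int
app_recovery_event_cb(uint16_t port_id, enum rte_eth_event_type event,
		      void *cb_arg, void *ret_param)
{
	struct app_port *port = cb_arg;

	(void)ret_param; /* unused in this sketch */

	switch (event) {
	case RTE_ETH_EVENT_ERR_RECOVERING:
		/* Stop issuing control path calls until recovery finishes. */
		port->state = APP_PORT_RECOVERING;
		printf("port %u: error detected, PMD is recovering\n",
		       (unsigned int)port_id);
		break;
	case RTE_ETH_EVENT_RECOVER_SUCCESS:
		/* The PMD has restored the port to its previous state. */
		port->state = APP_PORT_ACTIVE;
		printf("port %u: recovered\n", (unsigned int)port_id);
		break;
	case RTE_ETH_EVENT_RECOVER_FAILED:
		/* The port may not be usable anymore. */
		port->state = APP_PORT_LOST;
		printf("port %u: recovery failed\n", (unsigned int)port_id);
		break;
	default:
		break;
	}

	return 0;
}
```

In a real application, such a handler would presumably be registered once per
event with rte_eth_dev_callback_register(), passing the per-port state as the
callback argument, and the data path would keep polling as usual since the
PMD installs dummy Rx/Tx functions while it recovers.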