From: Chengwen Feng <fengchengwen@huawei.com>
To: <thomas@monjalon.net>, <ferruh.yigit@xilinx.com>, <ferruh.yigit@amd.com>
Cc: <dev@dpdk.org>, <kalesh-anakkur.purayil@broadcom.com>,
<somnath.kotur@broadcom.com>, <ajit.khaparde@broadcom.com>,
<mdr@ashroe.eu>, <Andrew.Rybchenko@oktetlabs.ru>
Subject: [PATCH v9 2/5] ethdev: support proactive error handling mode
Date: Thu, 22 Sep 2022 07:41:48 +0000
Message-ID: <20220922074151.39450-3-fengchengwen@huawei.com>
In-Reply-To: <20220922074151.39450-1-fengchengwen@huawei.com>
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Some PMDs (e.g. hns3) can detect hardware or firmware errors and try
to recover from them. In this process, the PMD sets the data path
pointers to dummy functions (which prevents a crash), and also
makes sure the control path operations fail with return code -EBUSY.
The above error handling mode is known as
RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE (proactive error handling mode).
In some service scenarios, the application needs to be aware of the
recovery to determine whether to migrate services. So three events were
introduced:
1) RTE_ETH_EVENT_ERR_RECOVERING: used to notify the application that the
PMD detected an error and recovery is starting. Upon receiving the
event, the application should not invoke any control path APIs until it
receives the RTE_ETH_EVENT_RECOVERY_SUCCESS or
RTE_ETH_EVENT_RECOVERY_FAILED event.
2) RTE_ETH_EVENT_RECOVERY_SUCCESS: used to notify the application that
the PMD recovered successfully from the error and has already
re-configured the port to the state prior to the error.
3) RTE_ETH_EVENT_RECOVERY_FAILED: used to notify the application that
the PMD failed to recover from the error and the port should not be
used anymore. The application should close the port.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---
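Note (illustrative only, not part of the patch): the data path and
control path behaviour described in the commit message can be sketched
as below. This is a simplified model under stated assumptions: the
demo_* types and functions are hypothetical and do not correspond to
the real ethdev or PMD structures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device model; not the real ethdev structures. */
typedef uint16_t (*demo_rx_burst_t)(void *rxq, void **pkts, uint16_t n);

struct demo_dev {
	demo_rx_burst_t rx_burst; /* data path entry point */
	bool recovering;          /* true while the driver is recovering */
};

/* Normal receive path: pretend every requested packet arrived. */
static uint16_t
demo_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	return n;
}

/* Dummy receive path: touches no hardware and reports zero packets. */
static uint16_t
demo_rx_burst_dummy(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	return 0;
}

/* Recovery begins: divert the data path to the dummy function. */
static void
demo_recovery_start(struct demo_dev *dev)
{
	dev->recovering = true;
	dev->rx_burst = demo_rx_burst_dummy;
}

/* Recovery succeeded: restore the real data path function. */
static void
demo_recovery_done(struct demo_dev *dev)
{
	dev->rx_burst = demo_rx_burst;
	dev->recovering = false;
}

/* Control path operation: refuse to run while recovery is in progress. */
static int
demo_dev_configure(struct demo_dev *dev)
{
	if (dev->recovering)
		return -EBUSY;
	return 0;
}
```

While recovering, a burst call simply returns 0 packets instead of
touching hardware that is being reset, and a control path call fails
fast with -EBUSY instead of racing with the recovery.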
app/test-pmd/config.c | 2 ++
doc/guides/prog_guide/poll_mode_drv.rst | 39 +++++++++++++++++++++++++
doc/guides/rel_notes/release_22_11.rst | 12 ++++++++
lib/ethdev/rte_ethdev.h | 33 +++++++++++++++++++++
4 files changed, 86 insertions(+)
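Note (illustrative only, not part of the patch): one way an application
might track a port across the three events is a small state machine
like the one below. The app_* names are hypothetical stand-ins so the
sketch compiles without DPDK. A real application would receive the
RTE_ETH_EVENT_* values in a callback registered with
rte_eth_dev_callback_register() and map them onto something similar.
Treating a failed port as permanently dead is an assumption of this
sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three RTE_ETH_EVENT_* values added by
 * this patch. */
enum app_port_event {
	APP_EVT_ERR_RECOVERING,
	APP_EVT_RECOVERY_SUCCESS,
	APP_EVT_RECOVERY_FAILED,
};

/* Application-side view of one port (illustrative, not part of ethdev). */
enum app_port_state {
	APP_PORT_RUNNING,    /* control path APIs may be called */
	APP_PORT_RECOVERING, /* PMD is recovering: hold off control path calls */
	APP_PORT_DEAD,       /* recovery failed: only closing the port is left */
};

/* Advance the per-port state for one received event. */
static enum app_port_state
app_port_on_event(enum app_port_state cur, enum app_port_event ev)
{
	/* Assumption: once recovery has failed, the port stays dead. */
	if (cur == APP_PORT_DEAD)
		return APP_PORT_DEAD;

	switch (ev) {
	case APP_EVT_ERR_RECOVERING:
		/* May be reported again before the result arrives. */
		return APP_PORT_RECOVERING;
	case APP_EVT_RECOVERY_SUCCESS:
		return APP_PORT_RUNNING;
	case APP_EVT_RECOVERY_FAILED:
		return APP_PORT_DEAD;
	}
	return cur;
}

/* Control path calls are only safe while the port is running. */
static bool
app_port_ctrl_allowed(enum app_port_state st)
{
	return st == APP_PORT_RUNNING;
}
```

The application would check app_port_ctrl_allowed() before any control
path call, and close the port once the state becomes APP_PORT_DEAD.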
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 0c10c663e9..b716d2a15f 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -924,6 +924,8 @@ port_infos_display(portid_t port_id)
}
if (dev_info.err_handle_mode == RTE_ETH_ERROR_HANDLE_MODE_PASSIVE)
printf("Device error handling mode: passive\n");
+ else if (dev_info.err_handle_mode == RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE)
+ printf("Device error handling mode: proactive\n");
}
void
diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index 9d081b1cba..232dc459b0 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -627,3 +627,42 @@ by application.
The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
the application to handle reset event. It is duty of application to
handle all synchronization before it calls rte_eth_dev_reset().
+
+The above error handling mode is known as ``RTE_ETH_ERROR_HANDLE_MODE_PASSIVE``.
+
+Proactive Error Handling Mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the PMD supports ``RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE``, then once it
+detects hardware or firmware errors, it will try to recover from them. In
+this process, the PMD sets the data path pointers to dummy functions (which
+prevents a crash), and also makes sure the control path operations fail
+with return code ``-EBUSY``.
+
+During this process, from the perspective of the application, services are
+affected. For example, the Rx/Tx burst APIs cannot receive or send packets,
+and the control path APIs return failure.
+
+In some service scenarios, the application needs to be aware of the recovery
+to determine whether to migrate services. So three events were introduced:
+
+* ``RTE_ETH_EVENT_ERR_RECOVERING``: notifies the application that the PMD
+ detected an error and recovery is starting. Upon receiving the event, the
+ application should not invoke any control path APIs until it receives the
+ ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
+
+
+* ``RTE_ETH_EVENT_RECOVERY_SUCCESS``: notifies the application that the PMD
+ recovered successfully from the error and has already re-configured the
+ port to the state prior to the error.
+
+* ``RTE_ETH_EVENT_RECOVERY_FAILED``: notifies the application that the PMD
+ failed to recover from the error and the port should not be used anymore.
+ The application should close the port.
+
+.. note::
+ * Before the PMD reports the recovery result, the PMD may report the
+ ``RTE_ETH_EVENT_ERR_RECOVERING`` event again, because a larger error
+ may occur during the recovery.
+ * The error handling mode supported by the PMD can be reported through
+ the ``rte_eth_dev_info_get`` API.
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf050..fc85e5fa87 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,18 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added proactive error handling mode for ethdev.**
+
+ Added proactive error handling mode for ethdev, and three events were
+ introduced:
+
+ * Added new event: ``RTE_ETH_EVENT_ERR_RECOVERING`` for the PMD to report
+ that the port is recovering from an error.
+ * Added new event: ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` for the PMD to report
+ that the port has recovered successfully from an error.
+ * Added new event: ``RTE_ETH_EVENT_RECOVERY_FAILED`` for the PMD to report
+ that the port has failed to recover from an error.
+
Removed Items
-------------
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 930b0a2fff..d3e81b98a7 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1859,6 +1859,12 @@ enum rte_eth_err_handle_mode {
* application invoke @see rte_eth_dev_reset to recover the port.
*/
RTE_ETH_ERROR_HANDLE_MODE_PASSIVE,
+ /** Proactive error handling: after the PMD detects that a reset is
+ * required, it reports the @see RTE_ETH_EVENT_ERR_RECOVERING event,
+ * performs recovery internally, and finally reports the recovery
+ * result event (@see RTE_ETH_EVENT_RECOVERY_*).
+ */
+ RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE,
};
/**
@@ -3944,6 +3950,33 @@ enum rte_eth_event_type {
* @see rte_eth_rx_avail_thresh_set()
*/
RTE_ETH_EVENT_RX_AVAIL_THRESH,
+ /** Port recovering from a hardware or firmware error.
+ * If the PMD supports proactive error recovery, it should trigger this
+ * event to notify the application that it detected an error and that
+ * recovery is starting. Upon receiving the event, the application
+ * should not invoke any control path APIs (such as
+ * rte_eth_dev_configure/rte_eth_dev_stop...) until it receives the
+ * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
+ * event.
+ * The PMD will set the data path pointers to dummy functions, and
+ * restore the data path pointers to non-dummy functions before it
+ * reports the RTE_ETH_EVENT_RECOVERY_SUCCESS event. This means that the
+ * application cannot send or receive any packets during this period.
+ * @note Before the PMD reports the recovery result, the PMD may report
+ * the RTE_ETH_EVENT_ERR_RECOVERING event again, because a larger error
+ * may occur during the recovery.
+ */
+ RTE_ETH_EVENT_ERR_RECOVERING,
+ /** Port recovered successfully from the error.
+ * The PMD has already re-configured the port to the state prior to
+ * the error.
+ */
+ RTE_ETH_EVENT_RECOVERY_SUCCESS,
+ /** Port failed to recover from the error.
+ * It means that the port should not be used anymore. The application
+ * should close the port.
+ */
+ RTE_ETH_EVENT_RECOVERY_FAILED,
RTE_ETH_EVENT_MAX /**< max value of this enum */
};
--
2.17.1