From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Cc: dpdk-dev <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	Konstantin Ananyev <konstantin.ananyev@intel.com>,
	Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>,
	<asafp@nvidia.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	"jerinj@marvell.com" <jerinj@marvell.com>
Subject: Re: [dpdk-dev] [PATCH v7 1/4] ethdev: support device reset and recovery events
Date: Tue, 1 Feb 2022 13:19:51 +0000	[thread overview]
Message-ID: <ff901ac4-afac-0153-a3c6-945ed2f62611@intel.com> (raw)
In-Reply-To: <CAH-L+nPyAwScpDBM2-jorpT7-8_+jKRmCPCsA5E650_6hY4+AQ@mail.gmail.com>
On 2/1/2022 1:09 PM, Kalesh Anakkur Purayil wrote:
> Thank you Ferruh for the review. Please see inline.
> 
> On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <ferruh.yigit@intel.com <mailto:ferruh.yigit@intel.com>> wrote:
> 
>     On 1/28/2022 12:48 PM, Kalesh A P wrote:
>      > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
>      >
>      > Adding support for the device reset and recovery events in the
>      > rte_eth_event framework. FW error and FW reset conditions would be
>      > managed internally by the PMD without needing application intervention.
>      > In such cases, PMD would need reset/recovery events to notify application
>      > that PMD is undergoing a reset.
>      >
>      > While most of the recovery process is transparent to the application since
>      > most of the driver ensures recovery from FW reset or FW error conditions,
>      > the application will have to reprogram any flows which were offloaded to
>      > the underlying hardware.
>      >
>      > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
>      > Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com <mailto:somnath.kotur@broadcom.com>>
>      > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com <mailto:ajit.khaparde@broadcom.com>>
> 
>     More developer cc'ed.
> 
>      > ---
>      >   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>      >   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>      >   2 files changed, 42 insertions(+)
>      >
>      > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>      > index 6831289..9ecc0e4 100644
>      > --- a/doc/guides/prog_guide/poll_mode_drv.rst
>      > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>      > @@ -623,3 +623,27 @@ by application.
>      >   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>      >   the application to handle reset event. It is duty of application to
>      >   handle all synchronization before it calls rte_eth_dev_reset().
>      > +
>      > +Error recovery support
>      > +~~~~~~~~~~~~~~~~~~~~~~
>      > +
>      > +When the PMD detects a FW reset or error condition, it may try to recover
>      > +from the error without needing the application intervention. In such cases,
>      > +PMD would need events to notify the application that it is undergoing
>      > +an error recovery.
>      > +
>      > +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>      > +application that PMD detected a FW reset or FW error condition. PMD may
>      > +try to recover from the error by itself. Data path may be quiesced and
>      > +control path operations may fail during the recovery period. The application
>      > +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
>      > +
> 
>     Between the time FW error occurred and the application receive the RECOVERING event,
>     datapath will continue to poll and application may call control APIs, so the event
>     really is not solving the issue and driver somehow should be sure this won't crash
>     the application, in that case not sure about the benefit of this event.
> 
> [Kalesh]: As soon as the driver detects a FW dead or reset condition, it sets the fastpath pointers to dummy functions. This will prevent the crash. All control path operations would fail with -EBUSY. This change is already there in bnxt PMD. This event is a notification to the application that the PMD is recovering from a FW error condition so that it can stop polling and issue control path operations.
> 
This was my point indeed. PMD can't rely on this event and should take actions (for both
datapath and control path) to prevent possible crash/error. If that is the case event
doesn't have that much benefit since PMD already has to protect itself.
> 
>      > +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
>      > +that the it has recovered from the error condition. PMD re-configures the port
>      > +to the state prior to the error condition. Control path and data path are up now.
>      > +Since the device has undergone a reset, flow rules offloaded prior to reset
>      > +may be lost and the application should recreate the rules again.
>      > +
> 
>     I think the most difficult part here is clarify what application should do
>     when this event received consistent for all devices, "flow rules may be lost"
>     looks very vague to me.
>     Unless it is not clear for application what to do when this event is received,
>     it is not that useful or it will be specific to some PMDs. And I can see it is
>     hard to clarify this but perhaps we can define a set of common behavior.
>     Not sure what others are thinking.
> 
> [Kalesh]: Sure, let's wait for others' opinions as well.
> 
> 
>      > +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
>      > +that it has failed to recover from the error condition. The device may not be
>      > +usable anymore.
>      > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>      > index 147cc1c..a46819f 100644
>      > --- a/lib/ethdev/rte_ethdev.h
>      > +++ b/lib/ethdev/rte_ethdev.h
>      > @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>      >       RTE_ETH_EVENT_DESTROY,  /**< port is released */
>      >       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>      >       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>      > +     RTE_ETH_EVENT_ERR_RECOVERING,
>      > +                     /**< port recovering from an error
>      > +                      *
>      > +                      * PMD detected a FW reset or error condition.
>      > +                      * PMD will try to recover from the error.
>      > +                      * Data path may be quiesced and Control path operations
>      > +                      * may fail at this time.
>      > +                      */
> 
>     Please put multi line comments before enum, Andrew did a set of cleanups for these.
> 
> [Kalesh]: Sure, will do.
> 
> 
>      > +     RTE_ETH_EVENT_RECOVERED,
>      > +                     /**< port recovered from an error
>      > +                      *
>      > +                      * PMD has recovered from the error condition.
>      > +                      * Control path and Data path are up now.
>      > +                      * PMD re-configures the port to the state prior to the error.
>      > +                      * Since the device has undergone a reset, flow rules
>      > +                      * offloaded prior to reset may be lost and
>      > +                      * the application should recreate the rules again.
>      > +                      */
>      >       RTE_ETH_EVENT_MAX       /**< max value of this enum */
>      >   };
>      >
> 
> 
> 
> -- 
> Regards,
> Kalesh A P
next prev parent reply	other threads:[~2022-02-01 13:20 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-22 10:16 [dpdk-dev] [RFC PATCH 0/3] librte_ethdev: error recovery support Kalesh A P
2020-01-22 10:16 ` [dpdk-dev] [RFC PATCH 1/3] librte_ethdev: support device recovery event Kalesh A P
2020-03-11 13:20   ` Thomas Monjalon
2020-03-12  3:31     ` Kalesh Anakkur Purayil
2020-03-12  7:29       ` Thomas Monjalon
2020-01-22 10:16 ` [dpdk-dev] [RFC PATCH 2/3] net/bnxt: notify applications about device reset Kalesh A P
2020-01-22 10:16 ` [dpdk-dev] [RFC PATCH 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-03-11 13:19 ` [dpdk-dev] [RFC PATCH 0/3] librte_ethdev: error recovery support Thomas Monjalon
2020-03-12  3:25   ` Kalesh Anakkur Purayil
2020-03-12  7:34     ` Thomas Monjalon
2020-07-03 16:12       ` Ferruh Yigit
2020-09-30  7:03 ` [dpdk-dev] [RFC V2 " Kalesh A P
2020-09-30  7:03   ` [dpdk-dev] [RFC V2 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30  7:03   ` [dpdk-dev] [RFC V2 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30  7:03   ` [dpdk-dev] [RFC V2 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-09-30  7:07 ` [dpdk-dev] [RFC PATCH v2 0/3] librte_ethdev: error recovery support Kalesh A P
2020-09-30  7:07   ` [dpdk-dev] [RFC PATCH v2 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30  7:07   ` [dpdk-dev] [RFC PATCH v2 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30  7:07   ` [dpdk-dev] [RFC PATCH v2 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-09-30  7:12 ` [dpdk-dev] [RFC PATCH v3 0/3] librte_ethdev: error recovery support Kalesh A P
2020-09-30  7:12   ` [dpdk-dev] [RFC PATCH v3 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30  7:50     ` Thomas Monjalon
2020-09-30  8:35       ` Kalesh Anakkur Purayil
2020-09-30  9:31         ` Thomas Monjalon
2020-09-30  7:12   ` [dpdk-dev] [RFC PATCH v3 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30  7:12   ` [dpdk-dev] [RFC PATCH v3 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-10-08 10:53     ` Asaf Penso
2020-09-30 12:33 ` [dpdk-dev] [RFC PATCH v4 0/3] librte_ethdev: error recovery support Kalesh A P
2020-09-30 12:33   ` [dpdk-dev] [RFC PATCH v4 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30 12:33   ` [dpdk-dev] [RFC PATCH v4 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30 12:33   ` [dpdk-dev] [RFC PATCH v4 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-10-06 17:25     ` Ophir Munk
2020-10-07  4:46       ` Kalesh Anakkur Purayil
2020-10-07  8:36         ` Ophir Munk
2020-10-07  9:37         ` Ferruh Yigit
2020-10-07 18:42           ` Ajit Khaparde
2020-10-07 16:49 ` [dpdk-dev] [PATCH v5 0/3] librte_ethdev: error recovery support Kalesh A P
2020-10-07 16:49   ` [dpdk-dev] [PATCH v5 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-10-08 10:49     ` Asaf Penso
2020-10-07 16:49   ` [dpdk-dev] [PATCH v5 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-10-07 16:49   ` [dpdk-dev] [PATCH v5 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-10-09  3:48 ` [dpdk-dev] [PATCH v6 0/3] librte_ethdev: error recovery support Kalesh A P
2020-10-09  3:48   ` [dpdk-dev] [PATCH v6 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-10-11 21:29     ` Thomas Monjalon
2020-10-12  8:09       ` Andrew Rybchenko
2021-02-18 15:32         ` Ferruh Yigit
2020-10-09  3:48   ` [dpdk-dev] [PATCH v6 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-10-09  3:48   ` [dpdk-dev] [PATCH v6 3/3] app/testpmd: handle device recovery event Kalesh A P
2022-01-28 12:48   ` [dpdk-dev] [PATCH v7 0/4] ethdev: error recovery support Kalesh A P
2022-01-28 12:48     ` [dpdk-dev] [PATCH v7 1/4] ethdev: support device reset and recovery events Kalesh A P
2022-02-01 12:11       ` Ferruh Yigit
2022-02-01 13:09         ` Kalesh Anakkur Purayil
2022-02-01 13:19           ` Ferruh Yigit [this message]
2022-02-03 20:28             ` Ajit Khaparde
2022-02-10 22:42               ` Thomas Monjalon
2022-02-01 12:52       ` Ferruh Yigit
2022-02-02 11:44         ` Ray Kinsella
2022-02-10 22:16           ` Thomas Monjalon
2022-02-11 10:09             ` Ray Kinsella
2022-02-14 10:16               ` Ray Kinsella
2022-02-14 11:15                 ` Thomas Monjalon
2022-02-14 16:06                   ` Ray Kinsella
2022-02-14 16:25                     ` Thomas Monjalon
2022-02-14 18:27                       ` Ray Kinsella
2022-02-15 13:55                         ` Ray Kinsella
2022-02-15 15:12                           ` Thomas Monjalon
2022-02-15 16:12                             ` Ray Kinsella
2022-05-21 10:33                     ` fengchengwen
2022-05-24 15:11                       ` Ray Kinsella
2022-06-10  0:16                         ` fengchengwen
2022-01-28 12:48     ` [dpdk-dev] [PATCH v7 2/4] app/testpmd: handle device recovery event Kalesh A P
2022-01-28 12:48     ` [dpdk-dev] [PATCH v7 3/4] net/bnxt: notify applications about device reset/recovery Kalesh A P
2022-01-28 12:48     ` [dpdk-dev] [PATCH v7 4/4] doc: update release notes Kalesh A P
2022-02-01 12:12       ` Ferruh Yigit
2022-06-16  9:41     ` [PATCH v8 0/4] ethdev: support error recovery notification Chengwen Feng
2022-06-16  9:41       ` [PATCH v8 1/4] ethdev: support device " Chengwen Feng
2022-06-20 17:42         ` Thomas Monjalon
2022-06-21  1:38           ` fengchengwen
2022-06-21  7:04             ` Thomas Monjalon
2022-09-22  7:53               ` fengchengwen
2022-06-23 15:58         ` Ray Kinsella
2022-06-16  9:41       ` [PATCH v8 2/4] app/testpmd: handle error recovery notification event Chengwen Feng
2022-06-16  9:41       ` [PATCH v8 3/4] net/hns3: support " Chengwen Feng
2022-06-16  9:41       ` [PATCH v8 4/4] net/bnxt: notify applications about device reset/recovery Chengwen Feng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=ff901ac4-afac-0153-a3c6-945ed2f62611@intel.com \
    --to=ferruh.yigit@intel.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=asafp@nvidia.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=kalesh-anakkur.purayil@broadcom.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).