DPDK patches and discussions
 help / color / mirror / Atom feed
From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Andrew Rybchenko <Andrew.Rybchenko@oktetlabs.ru>,
	Thomas Monjalon <thomas@monjalon.net>,
	Kalesh A P <kalesh-anakkur.purayil@broadcom.com>
Cc: dev@dpdk.org, Ajit Khaparde <ajit.khaparde@broadcom.com>,
	Ori Kam <orika@nvidia.com>, Asaf Penso <asafp@nvidia.com>
Subject: Re: [dpdk-dev] [PATCH v6 1/3] ethdev: support device reset and recovery events
Date: Thu, 18 Feb 2021 15:32:45 +0000
Message-ID: <089bbd4f-b1a6-2631-f601-982e266708c3@intel.com> (raw)
In-Reply-To: <d10795cc-17ea-bc4f-4664-0209cc526a70@oktetlabs.ru>

On 10/12/2020 9:09 AM, Andrew Rybchenko wrote:
> On 10/12/20 12:29 AM, Thomas Monjalon wrote:
>> 09/10/2020 05:48, Kalesh A P:
>>> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>>>
>>> Adding support for device reset and recovery events in the
>>> rte_eth_event framework. FW error and FW reset conditions would be
>>> managed internally by PMD without needing application intervention.
>>> In such cases, PMD would need reset/recovery events to notify application
>>> that PMD is undergoing a reset.
>>>
>>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
>>> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>>> Reviewed-by: Asaf Penso <asafp@nvidia.com>
>>
>> The ethdev maintainers are not Cc'ed.
>> Please use the option --cc-cmd devtools/get-maintainer.sh
>>
>>
>>> +Error recovery support
>>> +~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +When the PMD detects a FW reset or error condition, it will try to recover
>>> +from the error without needing the application intervention. In such cases,
>>> +PMD would need events to notify the application that it is undergoing
>>> +an error recovery.
>>> +
>>> +The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>>> +application that PMD detected a FW reset or FW error condition. PMD will
>>> +try to recover from the error by itself. Data path will be halted and
>>> +control path operations would fail during the recovery period.
>>> +
>>> +The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application
>>> +that the it has recovered from the error condition. Control path and data path
>>> +are up now. Since the device undergone a reset, flow rules offloaded prior to
>>> +the reset will be lost and the application has to recreate the rules again.
> 
> What should be done if the state is not recoverable?
> 
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>> index 9759f13..9b4b015 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -3207,6 +3207,23 @@ enum rte_eth_event_type {
>>>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>> +			/**< port recovering from an error
>>> +			 *
>>> +			 * PMD detected a FW reset or error condition.
>>> +			 * PMD will try to recover from the error.
>>> +			 * Data path will be halted and Control path operations
>>> +			 * would fail at this time.
>>> +			 */
>>
>> Does it mean the application has nothing to do when receiving this event?
>> I think the app should stop polling at least.
>>
>>> +	RTE_ETH_EVENT_RECOVERED,
>>> +			/**< port recovered from an error
>>> +			 *
>>> +			 * PMD has recovered from the error condition.
>>> +			 * Control path and Data path are up now.
>>> +			 * Since the device undergone a reset, flow rules
>>> +			 * offloaded prior to the reset will be lost and
>>> +			 * the application has to recreate the rules again.
>>> +			 */
>>
>> Please be more precise.
>> Should the app re-configure the port, setup the queues, start the port?
>>
>>
> 

Hi Kalesh Anakkur,

The mechanics of notifying the application looks good, but the concerns seems 
more about what application should do with this information.

PMD notifies the application on the FW/HW reset and pushes some 
tasks/responsibilities to the application, but for this to be useful, these 
tasks should be clear to application.

Think yourself in a situation that you are developing an application and you 
received these events from a device that you don't know its internals, what will 
you do?

Both Thomas and Andrew put cases that needs more clarification for application. 
Can you please send a new version with those clarifications?


Thanks,
ferruh

  reply	other threads:[~2021-02-18 15:32 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-22 10:16 [dpdk-dev] [RFC PATCH 0/3] librte_ethdev: error recovery support Kalesh A P
2020-01-22 10:16 ` [dpdk-dev] [RFC PATCH 1/3] librte_ethdev: support device recovery event Kalesh A P
2020-03-11 13:20   ` Thomas Monjalon
2020-03-12  3:31     ` Kalesh Anakkur Purayil
2020-03-12  7:29       ` Thomas Monjalon
2020-01-22 10:16 ` [dpdk-dev] [RFC PATCH 2/3] net/bnxt: notify applications about device reset Kalesh A P
2020-01-22 10:16 ` [dpdk-dev] [RFC PATCH 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-03-11 13:19 ` [dpdk-dev] [RFC PATCH 0/3] librte_ethdev: error recovery support Thomas Monjalon
2020-03-12  3:25   ` Kalesh Anakkur Purayil
2020-03-12  7:34     ` Thomas Monjalon
2020-07-03 16:12       ` Ferruh Yigit
2020-09-30  7:03 ` [dpdk-dev] [RFC V2 " Kalesh A P
2020-09-30  7:03   ` [dpdk-dev] [RFC V2 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30  7:03   ` [dpdk-dev] [RFC V2 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30  7:03   ` [dpdk-dev] [RFC V2 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-09-30  7:07 ` [dpdk-dev] [RFC PATCH v2 0/3] librte_ethdev: error recovery support Kalesh A P
2020-09-30  7:07   ` [dpdk-dev] [RFC PATCH v2 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30  7:07   ` [dpdk-dev] [RFC PATCH v2 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30  7:07   ` [dpdk-dev] [RFC PATCH v2 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-09-30  7:12 ` [dpdk-dev] [RFC PATCH v3 0/3] librte_ethdev: error recovery support Kalesh A P
2020-09-30  7:12   ` [dpdk-dev] [RFC PATCH v3 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30  7:50     ` Thomas Monjalon
2020-09-30  8:35       ` Kalesh Anakkur Purayil
2020-09-30  9:31         ` Thomas Monjalon
2020-09-30  7:12   ` [dpdk-dev] [RFC PATCH v3 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30  7:12   ` [dpdk-dev] [RFC PATCH v3 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-10-08 10:53     ` Asaf Penso
2020-09-30 12:33 ` [dpdk-dev] [RFC PATCH v4 0/3] librte_ethdev: error recovery support Kalesh A P
2020-09-30 12:33   ` [dpdk-dev] [RFC PATCH v4 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-09-30 12:33   ` [dpdk-dev] [RFC PATCH v4 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-09-30 12:33   ` [dpdk-dev] [RFC PATCH v4 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-10-06 17:25     ` Ophir Munk
2020-10-07  4:46       ` Kalesh Anakkur Purayil
2020-10-07  8:36         ` Ophir Munk
2020-10-07  9:37         ` Ferruh Yigit
2020-10-07 18:42           ` Ajit Khaparde
2020-10-07 16:49 ` [dpdk-dev] [PATCH v5 0/3] librte_ethdev: error recovery support Kalesh A P
2020-10-07 16:49   ` [dpdk-dev] [PATCH v5 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-10-08 10:49     ` Asaf Penso
2020-10-07 16:49   ` [dpdk-dev] [PATCH v5 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-10-07 16:49   ` [dpdk-dev] [PATCH v5 3/3] app/testpmd: handle device recovery event Kalesh A P
2020-10-09  3:48 ` [dpdk-dev] [PATCH v6 0/3] librte_ethdev: error recovery support Kalesh A P
2020-10-09  3:48   ` [dpdk-dev] [PATCH v6 1/3] ethdev: support device reset and recovery events Kalesh A P
2020-10-11 21:29     ` Thomas Monjalon
2020-10-12  8:09       ` Andrew Rybchenko
2021-02-18 15:32         ` Ferruh Yigit [this message]
2020-10-09  3:48   ` [dpdk-dev] [PATCH v6 2/3] net/bnxt: notify applications about device reset/recovery Kalesh A P
2020-10-09  3:48   ` [dpdk-dev] [PATCH v6 3/3] app/testpmd: handle device recovery event Kalesh A P

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=089bbd4f-b1a6-2631-f601-982e266708c3@intel.com \
    --to=ferruh.yigit@intel.com \
    --cc=Andrew.Rybchenko@oktetlabs.ru \
    --cc=ajit.khaparde@broadcom.com \
    --cc=asafp@nvidia.com \
    --cc=dev@dpdk.org \
    --cc=kalesh-anakkur.purayil@broadcom.com \
    --cc=orika@nvidia.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git