DPDK patches and discussions
 help / color / mirror / Atom feed
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Chengwen Feng <fengchengwen@huawei.com>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	Ferruh Yigit <ferruh.yigit@amd.com>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	Kalesh AP <kalesh-anakkur.purayil@broadcom.com>,
	"Ajit Khaparde (ajit.khaparde@broadcom.com)"
	<ajit.khaparde@broadcom.com>
Cc: nd <nd@arm.com>, nd <nd@arm.com>
Subject: RE: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
Date: Tue, 7 Mar 2023 05:34:10 +0000	[thread overview]
Message-ID: <DBAPR08MB581421B8E3850015FCABC85A98B79@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <e3c370c0-dd60-8ee4-9378-9d8e816e8d6f@yandex.ru>



> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Sunday, March 5, 2023 9:24 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> dev@dpdk.org; Chengwen Feng <fengchengwen@huawei.com>;
> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-
> anakkur.purayil@broadcom.com>; Ajit Khaparde
> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> Cc: nd <nd@arm.com>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
> mode
> 
> 
> >>>>
> >>>> In the proactive error handling mode, the PMD will set the data
> >>>> path pointers to dummy functions and then try recovery, in this
> >>>> period the application may still invoking data path API. This will
> >>>> introduce a race-condition with data path which may lead to crash [1].
> >>>>
> >>>> Although the PMD added delay after setting data path pointers to
> >>>> cover the above race-condition, it reduces the probability, but it
> >>>> doesn't solve the problem.
> >>>>
> >>>> To solve the race-condition problem fundamentally, the following
> >>>> requirements are added:
> >>>> 1. The PMD should set the data path pointers to dummy functions after
> >>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
> >>> Do you mean to say, PMD should set the data path pointers after
> >>> calling the
> >> call back function?
> >>> The PMD is running in the context of multiple EAL threads. How do
> >>> these
> >> threads synchronize such that only one thread sets these data pointers?
> >>
> >> As I understand this event callback supposed to be called in the
> >> context of EAL interrupt thread (whoever is more familiar with
> >> original idea, feel free to correct me if I missed something).
> > I could not figure this out. It looks to be called from the data plane thread
> context.
> > I also have a thought on alternate design at the end, appreciate if you can
> take a look.
> >
> >> How it is going to signal data-path threads that they need to
> >> stop/suspend calling data-path API - that's I suppose is left to application
> to decide...
> >> Same as right now it is application responsibility to stop data-path
> >> threads before doing dev_stop()/dev/_config()/etc.
> > Ok, good, this expectation is not new. The application must have a
> mechanism already.
> >
> >>
> >>
> >>>
> >>>> 2. The application should stop data path API invocation when process
> >>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>> Any thoughts on how an application can do this?
> > We can ignore this question as there is already similar expectation set for
> earlier functionalities.
> >
> >>>
> >>>> 3. The PMD should set the data path pointers to valid functions before
> >>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>> 4. The application should enable data path API invocation when process
> >>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>> Do you mean to say that the application should not call the datapath
> >>> APIs
> >> while the PMD is running the recovery process?
> >>
> >> Yes, I believe that's the intention.
> > Ok, this is good and makes sense.
> >
> >>
> >>>>
> >>>> Also, this patch introduce a driver internal function
> >>>> rte_eth_fp_ops_setup which used as an help function for PMD.
> >>>>
> >>>> [1]
> >>>>
> >>
> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2
> >>>> -
> >>>> ashok.k.kaladi@intel.com/
> >>>>
> >>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >>>> ---
> >>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> >>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
> >>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
> >>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> >>>>    lib/ethdev/version.map                  |  1 +
> >>>>    5 files changed, 46 insertions(+), 25 deletions(-)
> >>>>
> >>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> index c145a9066c..e380ff135a 100644
> >>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> @@ -638,14 +638,9 @@ different from the application invokes
> >>>> recovery in PASSIVE mode,  the PMD automatically recovers from
> >>>> error in PROACTIVE mode,  and only a small amount of work is
> >>>> required for the
> >> application.
> >>>>
> >>>> -During error detection and automatic recovery, -the PMD sets the
> >>>> data path pointers to dummy functions -(which will prevent the
> >>>> crash), -and also make sure the control path operations fail with a
> >>>> return
> >> code ``-EBUSY``.
> >>>> -
> >>>> -Because the PMD recovers automatically, -the application can only
> >>>> sense that the data flow is disconnected for a while -and the
> >>>> control API returns an error in this period.
> >>>> +During error detection and automatic recovery, the PMD sets the
> >>>> +data path pointers to dummy functions and also make sure the
> >>>> +control path operations failed with a return code ``-EBUSY``.
> >>>>
> >>>>    In order to sense the error happening/recovering,  as well as to
> >>>> restore some additional configuration, @@ -653,9 +648,9 @@ three
> >>>> events
> >> are available:
> >>>>
> >>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
> >>>>       Notify the application that an error is detected
> >>>> -   and the recovery is being started.
> >>>> +   and the recovery is about to start.
> >>>>       Upon receiving the event, the application should not invoke
> >>>> -   any control path function until receiving
> >>>> +   any control and data path API until receiving
> >>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or
> >>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> >>>>
> >>>>    .. note::
> >>>> @@ -666,8 +661,9 @@ three events are available:
> >>>>
> >>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> >>>>       Notify the application that the recovery from error is successful,
> >>>> -   the PMD already re-configures the port,
> >>>> -   and the effect is the same as a restart operation.
> >>>> +   the PMD already re-configures the port.
> >>>> +   The application should restore some additional configuration,
> >>>> + and then
> >>> What is the additional configuration? Is this specific to each NIC/PMD?
> >>> I thought, this is an auto recovery process and the application does
> >>> not require
> >> to reconfigure anything. If the application has to restore the
> >> configuration, how does auto recovery differ from typical recovery
> process?
> >>>
> >>>> +   enable data path API invocation.
> >>>>
> >>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
> >>>>       Notify the application that the recovery from error failed,
> >>>> diff --git a/lib/ethdev/ethdev_driver.c
> >>>> b/lib/ethdev/ethdev_driver.c index
> >>>> 0be1e8ca04..f994653fe9 100644
> >>>> --- a/lib/ethdev/ethdev_driver.c
> >>>> +++ b/lib/ethdev/ethdev_driver.c
> >>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct
> rte_eth_dev
> >>>> *dev, const char *ring_name,
> >>>>    	return rc;
> >>>>    }
> >>>>
> >>>> +void
> >>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) {
> >>>> +	if (dev == NULL)
> >>>> +		return;
> >>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); }
> >>>> +
> >>>>    const struct rte_memzone *
> >>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const
> >>>> char *ring_name,
> >>>>    			 uint16_t queue_id, size_t size, unsigned int align, diff
> -
> >> -git
> >>>> a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
> >>>> 2c9d615fb5..0d964d1f67 100644
> >>>> --- a/lib/ethdev/ethdev_driver.h
> >>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>> @@ -1621,6 +1621,16 @@ int
> >>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const
> >>>> char *name,
> >>>>    		 uint16_t queue_id);
> >>>>
> >>>> +/**
> >>>> + * @internal
> >>>> + * Setup eth fast-path API to ethdev values.
> >>>> + *
> >>>> + * @param dev
> >>>> + *  Pointer to struct rte_eth_dev.
> >>>> + */
> >>>> +__rte_internal
> >>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> >>>> +
> >>>>    /**
> >>>>     * @internal
> >>>>     * Atomically set the link status for the specific device.
> >>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >>>> index
> >>>> 049641d57c..44ee7229c1 100644
> >>>> --- a/lib/ethdev/rte_ethdev.h
> >>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> >>>>    	 */
> >>>>    	RTE_ETH_EVENT_RX_AVAIL_THRESH,
> >>>>    	/** Port recovering from a hardware or firmware error.
> >>>> -	 * If PMD supports proactive error recovery,
> >>>> -	 * it should trigger this event to notify application
> >>>> -	 * that it detected an error and the recovery is being started.
> >>>> -	 * Upon receiving the event, the application should not invoke any
> >>>> control path API
> >>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> >>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or
> >>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
> >>>> -	 * The PMD will set the data path pointers to dummy functions,
> >>>> -	 * and re-set the data path pointers to non-dummy functions
> >>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>> -	 * It means that the application cannot send or receive any packets
> >>>> -	 * during this period.
> >>>> +	 *
> >>>> +	 * If PMD supports proactive error recovery, it should trigger this
> >>>> +	 * event to notify application that it detected an error and the
> >>>> +	 * recovery is about to start.
> >>>> +	 *
> >>>> +	 * Upon receiving the event, the application should not invoke any
> >>>> +	 * control and data path API until receiving
> >>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or
> >>>> RTE_ETH_EVENT_RECOVERY_FAILED
> >>>> +	 * event.
> >>>> +	 *
> >>>> +	 * Once this event is reported, the PMD will set the data path pointers
> >>>> +	 * to dummy functions, and re-set the data path pointers to valid
> >>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
> >>>> event.
> >>> Why do we need to set the data path pointers to dummy functions if
> >>> the
> >> application is restricted from invoking any control and data path
> >> APIs till the recovery process is completed?
> >>
> >> You are right, in theory it is not mandatory.
> >> Though it helps to flag a problem if user will still try to call them
> >> while recovery is in progress.
> > Ok, may be in debug mode.
> > I mean, we have already set an expectation to the application that it should
> not call and the application has implemented a method to do the same. Why
> do we need to complicate this?
> > If the application calls the APIs, it is a programming error.
> 
> 
> My preference would be to keep it this way for both debug and non-debug
> mode.
> It doesn't cost anything to us in terms of perfomance, but helps to catch
> problems with wrong behaving app.

This is also causing a synchronization problem. i.e. if this has to be done correctly, we need to use correct synchronization mechanisms.
We cannot set the function pointers and assume that data will be visible to other threads/cores in the correct order.
A possible mechanism (though I see some problems with this) could be to use a guard variable, which indicates when it is safe to use the function pointers on the data plane threads. This would require a load-acquire in the data plane threads.

> 
> >
> >> Again, same as we doing in dev_stop().
> >
> >>
> >>>
> >>>> +	 *
> >>>>    	 * @note Before the PMD reports the recovery result,
> >>>>    	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
> >>>> again,
> >>>>    	 * because a larger error may occur during the recovery.
> >>>>    	 */
> >>>>    	RTE_ETH_EVENT_ERR_RECOVERING,
> >>> I understand this is not a change in this patch. But, just
> >>> wondering, what is the
> >> purpose of this? How is the application supposed to use this?
> >>>
> >>>>    	/** Port recovers successfully from the error.
> >>>> -	 * The PMD already re-configured the port,
> >>>> -	 * and the effect is the same as a restart operation.
> >>>> +	 *
> >>>> +	 * The PMD already re-configured the port:
> >>>>    	 * a) The following operation will be retained: (alphabetically)
> >>>>    	 *    - DCB configuration
> >>>>    	 *    - FEC configuration
> >>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> >>>>    	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> >>>>    	 * c) Any other configuration will not be stored
> >>>>    	 *    and will need to be re-configured.
> >>>> +	 *
> >>>> +	 * The application should restore some additional configuration
> >>>> +	 * (see above case b/c), and then enable data path API invocation.
> >>>>    	 */
> >>>>    	RTE_ETH_EVENT_RECOVERY_SUCCESS,
> >>>>    	/** Port recovery failed.
> >>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
> >>>> 357d1a88c0..c273e0bdae 100644
> >>>> --- a/lib/ethdev/version.map
> >>>> +++ b/lib/ethdev/version.map
> >>>> @@ -320,6 +320,7 @@ INTERNAL {
> >>>>    	rte_eth_devices;
> >>>>    	rte_eth_dma_zone_free;
> >>>>    	rte_eth_dma_zone_reserve;
> >>>> +	rte_eth_fp_ops_setup;
> >>>>    	rte_eth_hairpin_queue_peer_bind;
> >>>>    	rte_eth_hairpin_queue_peer_unbind;
> >>>>    	rte_eth_hairpin_queue_peer_update;
> >>>> --
> >>>> 2.17.1
> >>>
> >
> > Is there any reason not to design this in the same way as
> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> 
> I suppose it is a question for the authors of original patch...
Appreciate if the authors could comment on this.

> 
> > We could have a similar API 'rte_eth_dev_recover' to do the recovery
> functionality.
> 
> I suppose such approach is also possible.
> Personally I am fine with both ways: either existing one or what you propose,
> as long as we'll fix existing race-condition.
> What is good with what you suggest - that way we probably don't need to
> worry how to allow user to enable/disable auto-recovery inside PMD.
> 
> Konstantin
> 


  reply	other threads:[~2023-03-07  5:34 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-01  3:06 [PATCH 0/5] " Chengwen Feng
2023-03-01  3:06 ` [PATCH 1/5] ethdev: " Chengwen Feng
2023-03-02 12:08   ` Konstantin Ananyev
2023-03-03 16:51     ` Ferruh Yigit
2023-03-05 14:53       ` Konstantin Ananyev
2023-03-06  8:55         ` Ferruh Yigit
2023-03-06 10:22           ` Konstantin Ananyev
2023-03-06 11:00             ` Ferruh Yigit
2023-03-06 11:05               ` Ajit Khaparde
2023-03-06 11:13                 ` Konstantin Ananyev
2023-03-07  8:25                   ` fengchengwen
2023-03-07  9:52                     ` Konstantin Ananyev
2023-03-07 10:11                       ` Konstantin Ananyev
2023-03-07 12:07                     ` Ferruh Yigit
2023-03-07 12:26                       ` fengchengwen
2023-03-07 12:39                         ` Konstantin Ananyev
2023-03-09  2:05                           ` Ajit Khaparde
2023-03-06  1:41       ` fengchengwen
2023-03-06  8:57         ` Ferruh Yigit
2023-03-06  9:10         ` Ferruh Yigit
2023-03-02 23:30   ` Honnappa Nagarahalli
2023-03-03  0:21     ` Konstantin Ananyev
2023-03-04  5:08       ` Honnappa Nagarahalli
2023-03-05 15:23         ` Konstantin Ananyev
2023-03-07  5:34           ` Honnappa Nagarahalli [this message]
2023-03-07  8:39             ` fengchengwen
2023-03-08  1:09               ` Honnappa Nagarahalli
2023-03-09  0:59                 ` fengchengwen
2023-03-09  3:03                   ` Honnappa Nagarahalli
2023-03-09 11:30                     ` fengchengwen
2023-03-10  3:25                       ` Honnappa Nagarahalli
2023-03-07  9:56             ` Konstantin Ananyev
2023-03-01  3:06 ` [PATCH 2/5] net/hns3: replace fp ops config function Chengwen Feng
2023-03-02  6:50   ` Dongdong Liu
2023-03-01  3:06 ` [PATCH 3/5] net/bnxt: fix race-condition when report error recovery Chengwen Feng
2023-03-02 12:23   ` Konstantin Ananyev
2023-03-01  3:06 ` [PATCH 4/5] net/bnxt: use fp ops setup function Chengwen Feng
2023-03-02 12:30   ` Konstantin Ananyev
2023-03-03  0:01     ` Konstantin Ananyev
2023-03-03  1:17       ` Ajit Khaparde
2023-03-03  2:02       ` fengchengwen
2023-03-03  1:38     ` fengchengwen
2023-03-05 15:57       ` Konstantin Ananyev
2023-03-06  2:47         ` Ajit Khaparde
2023-03-01  3:06 ` [PATCH 5/5] app/testpmd: add error recovery usage demo Chengwen Feng
2023-03-02 13:01   ` Konstantin Ananyev
2023-03-03  1:49     ` fengchengwen
2023-03-03 16:59       ` Ferruh Yigit
2023-09-21 11:12 ` [PATCH 0/5] fix race-condition of proactive error handling mode Ferruh Yigit
2023-10-07  2:32   ` fengchengwen
2023-10-20 10:07 ` [PATCH v2 0/7] " Chengwen Feng
2023-10-20 10:07   ` [PATCH v2 1/7] ethdev: " Chengwen Feng
2023-11-01  3:39     ` lihuisong (C)
2023-10-20 10:07   ` [PATCH v2 2/7] net/hns3: replace fp ops config function Chengwen Feng
2023-11-01  3:40     ` lihuisong (C)
2023-11-02 10:34     ` Konstantin Ananyev
2023-10-20 10:07   ` [PATCH v2 3/7] net/bnxt: fix race-condition when report error recovery Chengwen Feng
2023-11-02 16:28     ` Ajit Khaparde
2023-10-20 10:07   ` [PATCH v2 4/7] net/bnxt: use fp ops setup function Chengwen Feng
2023-11-01  3:48     ` lihuisong (C)
2023-11-02 10:34     ` Konstantin Ananyev
2023-11-02 16:29       ` Ajit Khaparde
2023-10-20 10:07   ` [PATCH v2 5/7] app/testpmd: add error recovery usage demo Chengwen Feng
2023-11-01  4:08     ` lihuisong (C)
2023-11-06 13:01       ` fengchengwen
2023-10-20 10:07   ` [PATCH v2 6/7] app/testpmd: extract event handling to event.c Chengwen Feng
2023-11-01  4:09     ` lihuisong (C)
2023-10-20 10:07   ` [PATCH v2 7/7] doc: testpmd support event handling section Chengwen Feng
2023-11-06  9:28     ` lihuisong (C)
2023-11-06 12:39       ` fengchengwen
2023-11-08  3:02         ` lihuisong (C)
2023-11-06  1:35   ` [PATCH v2 0/7] fix race-condition of proactive error handling mode fengchengwen
2023-11-06 13:11 ` [PATCH v3 " Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 1/7] ethdev: " Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 2/7] net/hns3: replace fp ops config function Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 3/7] net/bnxt: fix race-condition when report error recovery Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 4/7] net/bnxt: use fp ops setup function Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 5/7] app/testpmd: add error recovery usage demo Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 6/7] app/testpmd: extract event handling to event.c Chengwen Feng
2023-11-06 13:11   ` [PATCH v3 7/7] doc: testpmd support event handling section Chengwen Feng
2023-11-08  3:03     ` lihuisong (C)
2023-12-05  2:30   ` [PATCH v3 0/7] fix race-condition of proactive error handling mode fengchengwen
2024-01-15  1:44     ` fengchengwen
2024-01-29  1:16       ` fengchengwen
2024-02-18  3:41         ` fengchengwen
2024-05-08  9:22           ` fengchengwen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DBAPR08MB581421B8E3850015FCABC85A98B79@DBAPR08MB5814.eurprd08.prod.outlook.com \
    --to=honnappa.nagarahalli@arm.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=ferruh.yigit@amd.com \
    --cc=kalesh-anakkur.purayil@broadcom.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=nd@arm.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).