On Tue, Mar 7, 2023 at 4:40 AM Konstantin Ananyev wrote: > > > > > >>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path > > >>>>>>>>>>> pointers to dummy functions and then try recovery, in this period the > > >>>>>>>>>>> application may still invoking data path API. This will introduce a > > >>>>>>>>>>> race-condition with data path which may lead to crash [1]. > > >>>>>>>>>>> > > >>>>>>>>>>> Although the PMD added delay after setting data path pointers to cover > > >>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't > > >>>>>>>>>>> solve the problem. > > >>>>>>>>>>> > > >>>>>>>>>>> To solve the race-condition problem fundamentally, the following > > >>>>>>>>>>> requirements are added: > > >>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after > > >>>>>>>>>>> report RTE_ETH_EVENT_ERR_RECOVERING event. > > >>>>>>>>>>> 2. The application should stop data path API invocation when process > > >>>>>>>>>>> the RTE_ETH_EVENT_ERR_RECOVERING event. > > >>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before > > >>>>>>>>>>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event. > > >>>>>>>>>>> 4. The application should enable data path API invocation when process > > >>>>>>>>>>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event. > > >>>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> How this is solving the race-condition, by pushing responsibility to > > >>>>>>>>> stop data path to application? > > >>>>>>>> > > >>>>>>>> Exactly, it becomes application responsibility to make sure data-path is > > >>>>>>>> stopped/suspended before recovery will continue. > > >>>>>>>> > > >>>>>>> > > >>>>>>> From documentation of the feature: > > >>>>>>> > > >>>>>>> `` > > >>>>>>> Because the PMD recovers automatically, > > >>>>>>> the application can only sense that the data flow is disconnected for a > > >>>>>>> while and the control API returns an error in this period. > > >>>>>>> > > >>>>>>> In order to sense the error happening/recovering, as well as to restore > > >>>>>>> some additional configuration, three events are available: > > >>>>>>> `` > > >>>>>>> > > >>>>>>> It looks like initial design is to use events mainly inform application > > >>>>>>> about what happened and mainly for re-configuration. > > >>>>>>> > > >>>>>>> Although I am don't disagree to involve the application, I am not sure > > >>>>>>> that is part of current design. > > >>>>>> > > >>>>>> I thought we all agreed that initial design contain some fallacies that > > >>>>>> need to fixed, no? > > >>>>>> Statement that with current rte_ethdev design error recovery can be done > > >>>>>> without interaction with the app (to stop/suspend data/control path) > > >>>>>> is the main one I think. > > >>>>>> It needs some interaction with app layer, one way or another. > > >>>>>> > > >>>>>>>>> > > >>>>>>>>> What if application is not interested in recovery modes at all and not > > >>>>>>>>> registered any callback for the recovery? > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> Are you saying there is no way for application to disable > > >>>>>>>> automatic recovery in PMD if it is not interested > > >>>>>>>> (or can't full-fill per-requesties for it)? > > >>>>>>>> If so, then yes it is a problem and we need to fix it. > > >>>>>>>> I assumed that such mechanism to disable unwanted events already exists, > > >>>>>>>> but I can't find anything. > > >>>>>>>> Wonder what would be the easiest way here - can PMD make a decision > > >>>>>>>> based on callback return value, or do we need a new API to > > >>>>>>>> enable/disable callbacks, or ...? > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> As far as I can see automatic recovery is not configurable by app. > > >>>>>>> > > >>>>>>> But that is not all, PMD sends events to application but PMD can't know > > >>>>>>> if application is handling them or not, so with current design PMD can't > > >>>>>>> rely on to app. > > >>>>>> > > >>>>>> Well, PMD invokes user provided callback. > > >>>>>> One way to fix that problem - if there is no callback provided, > > >>>>>> or callback returns an error code - PMD can assume that recovery > > >>>>>> should not be done. > > >>>>>> That is probably not the best design choice, but at least it will allow > > >>>>>> to fix the problem without too many changes and introducing new API. > > >>>>>> That could be sort of a 'quick fix'. > > >>>>>> In a meanwhile we can think about new/better approach for that. > > >>>>>> > > >>>>> > > >>>>> -rc2 for 23.03 is a few days away. > > >>>>> > > >>>>> What do you think to have 'quick fix' as modifying how driver updates > > >>>>> burst ops to prevent the race condition, for this release? > > >> > > >> The 'quick fix', do you mean only update function pointer (without rxq setting) ? > > >> Currently the PMDs which announced support "proactive error handling mode" already > > >> do this. > > >> > > > > > > Yes. > > > I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()' > > > updates all fields in 'rte_eth_fp_ops' but only function pointer seems > > > changed in the driver, resulting only function pointers to be updated. > > > > > > The discussion about race condition started with patch [1], which > > > mentions a crash because of a race condition. Later in discussions, > > > recovery event given as a sample for where the race can occur, that is > > > why we are here. > > > > > > But after above info, although there is race condition and a bigger > > > update (that needs application involvement) is required for recovery > > > mechanism, there is no crash and NO 'quick fix' is required for recovery. > > > > > > @Konstantin, @Chengwen, can you please confirm above understanding is > > > correct? > > > > Yes, that's what. > > Yes, I think with Chengwen patch the race condition problem should be fixed. > Though for that user has to provide a properly implemented callback. > What is not currently addressed - user can not disable this auto-recovery procedure on his will. > So if user will not provide a proper call-back the recovery can still proceed and race can happen. Ideally the user or the application should participate in the recovery to prevent more catastrophic results which may need a system reboot. Not all scenarios are recoverable, but based on implementation that could be a very small percentage. But the application awareness and participation as an end goal is a good idea nevertheless. > > > > > > > > > > > > > > > [1] > > > https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/ > > > > > >>>>> > > >>>>> And plan a design update for the next release? > > >>>> +1 on the overall approach. > > >>> > > >>> Yep, agree. > > >> > > >> Hope for better solution. > > >> And also, I notice only the openvswitch (from all open-source software which based-on DPDK) > > >> registers RTE_ETH_EVENT_INTR_RESET callback . > > >> > > >> Therefore, hope we build a recovery framework at the DPDK SDK level and be compatible > > >> with RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanism. > > >> > > >>> > > >>>> > > >>>>> > > >>>>> > > >>>>>>> > > >>>>>>>>> I think driver should not rely on application for this, unless > > >>>>>>>>> application explicitly says (to driver) that it is handling recovery, > > >>>>>>>>> right now there is no way for driver to know this. > > >>>>>>>> > > >>>>>>>> I think it is visa-versa: > > >>>>>>>> application should not enable auto-recovery if it can't meet > > >>>>>>>> per-requeststies for it (provide appropriate callback). > > >>>>>>>> > > >>>>>>> > > >>>>>>> I agree on above, we are saying similar thing in different perspective. > > >>>>>> > > >>>>>> Ok, that's good we are on the same page. > > >>>>>> > > >>>>>> > > >>>>>>> > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>>>> Also, this patch introduce a driver internal function > > >>>>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD. > > >>>>>>>>>>> > > >>>>>>>>>>> [1] > > >>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/ > > >>>>>>>>>>> > > >>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode") > > >>>>>>>>>>> Cc: stable@dpdk.org > > >>>>>>>>>>> > > >>>>>>>>>>> Signed-off-by: Chengwen Feng > > >>>>>>>>>>> --- > > >>>>>>>>>>> doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++--------- > > >>>>>>>>>>> lib/ethdev/ethdev_driver.c | 8 +++++++ > > >>>>>>>>>>> lib/ethdev/ethdev_driver.h | 10 ++++++++ > > >>>>>>>>>>> lib/ethdev/rte_ethdev.h | 32 > > >>>>>>>>>>> +++++++++++++++---------- > > >>>>>>>>>>> lib/ethdev/version.map | 1 + > > >>>>>>>>>>> 5 files changed, 46 insertions(+), 25 deletions(-) > > >>>>>>>>>>> > > >>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst > > >>>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst > > >>>>>>>>>>> index c145a9066c..e380ff135a 100644 > > >>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst > > >>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst > > >>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery > > >>>>>>>>>>> in PASSIVE mode, > > >>>>>>>>>>> the PMD automatically recovers from error in PROACTIVE mode, > > >>>>>>>>>>> and only a small amount of work is required for the application. > > >>>>>>>>>>> > > >>>>>>>>>>> -During error detection and automatic recovery, > > >>>>>>>>>>> -the PMD sets the data path pointers to dummy functions > > >>>>>>>>>>> -(which will prevent the crash), > > >>>>>>>>>>> -and also make sure the control path operations fail with a return > > >>>>>>>>>>> code ``-EBUSY``. > > >>>>>>>>>>> - > > >>>>>>>>>>> -Because the PMD recovers automatically, > > >>>>>>>>>>> -the application can only sense that the data flow is disconnected > > >>>>>>>>>>> for a while > > >>>>>>>>>>> -and the control API returns an error in this period. > > >>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the > > >>>>>>>>>>> data path > > >>>>>>>>>>> +pointers to dummy functions and also make sure the control path > > >>>>>>>>>>> operations > > >>>>>>>>>>> +failed with a return code ``-EBUSY``. > > >>>>>>>>>>> > > >>>>>>>>>>> In order to sense the error happening/recovering, > > >>>>>>>>>>> as well as to restore some additional configuration, > > >>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available: > > >>>>>>>>>>> > > >>>>>>>>>>> ``RTE_ETH_EVENT_ERR_RECOVERING`` > > >>>>>>>>>>> Notify the application that an error is detected > > >>>>>>>>>>> - and the recovery is being started. > > >>>>>>>>>>> + and the recovery is about to start. > > >>>>>>>>>>> Upon receiving the event, the application should not invoke > > >>>>>>>>>>> - any control path function until receiving > > >>>>>>>>>>> + any control and data path API until receiving > > >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or > > >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event. > > >>>>>>>>>>> > > >>>>>>>>>>> .. note:: > > >>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available: > > >>>>>>>>>>> > > >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` > > >>>>>>>>>>> Notify the application that the recovery from error is successful, > > >>>>>>>>>>> - the PMD already re-configures the port, > > >>>>>>>>>>> - and the effect is the same as a restart operation. > > >>>>>>>>>>> + the PMD already re-configures the port. > > >>>>>>>>>>> + The application should restore some additional configuration, > > >>>>>>>>>>> and then > > >>>>>>>>>>> + enable data path API invocation. > > >>>>>>>>>>> > > >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` > > >>>>>>>>>>> Notify the application that the recovery from error failed, > > >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c > > >>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644 > > >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c > > >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c > > >>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev > > >>>>>>>>>>> *dev, const char *ring_name, > > >>>>>>>>>>> return rc; > > >>>>>>>>>>> } > > >>>>>>>>>>> > > >>>>>>>>>>> +void > > >>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) > > >>>>>>>>>>> +{ > > >>>>>>>>>>> + if (dev == NULL) > > >>>>>>>>>>> + return; > > >>>>>>>>>>> + eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); > > >>>>>>>>>>> +} > > >>>>>>>>>>> + > > >>>>>>>>>>> const struct rte_memzone * > > >>>>>>>>>>> rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char > > >>>>>>>>>>> *ring_name, > > >>>>>>>>>>> uint16_t queue_id, size_t size, unsigned int align, > > >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h > > >>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644 > > >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h > > >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h > > >>>>>>>>>>> @@ -1621,6 +1621,16 @@ int > > >>>>>>>>>>> rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const > > >>>>>>>>>>> char *name, > > >>>>>>>>>>> uint16_t queue_id); > > >>>>>>>>>>> > > >>>>>>>>>>> +/** > > >>>>>>>>>>> + * @internal > > >>>>>>>>>>> + * Setup eth fast-path API to ethdev values. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * @param dev > > >>>>>>>>>>> + * Pointer to struct rte_eth_dev. > > >>>>>>>>>>> + */ > > >>>>>>>>>>> +__rte_internal > > >>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev); > > >>>>>>>>>>> + > > >>>>>>>>>>> /** > > >>>>>>>>>>> * @internal > > >>>>>>>>>>> * Atomically set the link status for the specific device. > > >>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h > > >>>>>>>>>>> index 049641d57c..44ee7229c1 100644 > > >>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h > > >>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h > > >>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type { > > >>>>>>>>>>> */ > > >>>>>>>>>>> RTE_ETH_EVENT_RX_AVAIL_THRESH, > > >>>>>>>>>>> /** Port recovering from a hardware or firmware error. > > >>>>>>>>>>> - * If PMD supports proactive error recovery, > > >>>>>>>>>>> - * it should trigger this event to notify application > > >>>>>>>>>>> - * that it detected an error and the recovery is being started. > > >>>>>>>>>>> - * Upon receiving the event, the application should not invoke > > >>>>>>>>>>> any control path API > > >>>>>>>>>>> - * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until > > >>>>>>>>>>> receiving > > >>>>>>>>>>> - * RTE_ETH_EVENT_RECOVERY_SUCCESS or > > >>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event. > > >>>>>>>>>>> - * The PMD will set the data path pointers to dummy functions, > > >>>>>>>>>>> - * and re-set the data path pointers to non-dummy functions > > >>>>>>>>>>> - * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event. > > >>>>>>>>>>> - * It means that the application cannot send or receive any > > >>>>>>>>>>> packets > > >>>>>>>>>>> - * during this period. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * If PMD supports proactive error recovery, it should trigger > > >>>>>>>>>>> this > > >>>>>>>>>>> + * event to notify application that it detected an error and the > > >>>>>>>>>>> + * recovery is about to start. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * Upon receiving the event, the application should not invoke any > > >>>>>>>>>>> + * control and data path API until receiving > > >>>>>>>>>>> + * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED > > >>>>>>>>>>> + * event. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * Once this event is reported, the PMD will set the data path > > >>>>>>>>>>> pointers > > >>>>>>>>>>> + * to dummy functions, and re-set the data path pointers to valid > > >>>>>>>>>>> + * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS > > >>>>>>>>>>> event. > > >>>>>>>>>>> + * > > >>>>>>>>>>> * @note Before the PMD reports the recovery result, > > >>>>>>>>>>> * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event > > >>>>>>>>>>> again, > > >>>>>>>>>>> * because a larger error may occur during the recovery. > > >>>>>>>>>>> */ > > >>>>>>>>>>> RTE_ETH_EVENT_ERR_RECOVERING, > > >>>>>>>>>>> /** Port recovers successfully from the error. > > >>>>>>>>>>> - * The PMD already re-configured the port, > > >>>>>>>>>>> - * and the effect is the same as a restart operation. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * The PMD already re-configured the port: > > >>>>>>>>>>> * a) The following operation will be retained: (alphabetically) > > >>>>>>>>>>> * - DCB configuration > > >>>>>>>>>>> * - FEC configuration > > >>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type { > > >>>>>>>>>>> * (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP) > > >>>>>>>>>>> * c) Any other configuration will not be stored > > >>>>>>>>>>> * and will need to be re-configured. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * The application should restore some additional configuration > > >>>>>>>>>>> + * (see above case b/c), and then enable data path API invocation. > > >>>>>>>>>>> */ > > >>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_SUCCESS, > > >>>>>>>>>>> /** Port recovery failed. > > >>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map > > >>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644 > > >>>>>>>>>>> --- a/lib/ethdev/version.map > > >>>>>>>>>>> +++ b/lib/ethdev/version.map > > >>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL { > > >>>>>>>>>>> rte_eth_devices; > > >>>>>>>>>>> rte_eth_dma_zone_free; > > >>>>>>>>>>> rte_eth_dma_zone_reserve; > > >>>>>>>>>>> + rte_eth_fp_ops_setup; > > >>>>>>>>>>> rte_eth_hairpin_queue_peer_bind; > > >>>>>>>>>>> rte_eth_hairpin_queue_peer_unbind; > > >>>>>>>>>>> rte_eth_hairpin_queue_peer_update; > > >>>>>>>>>>> -- > > >>>>>>>>>> Acked-by: Konstantin Ananyev > > >>>>>>>>>> > > >>>>>>>>>>> 2.17.1 > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>> > > >>>>> > > > > > > . > > >