From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id 55286A0093;
 Sat, 21 May 2022 12:33:54 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id F045B40156;
 Sat, 21 May 2022 12:33:53 +0200 (CEST)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187])
 by mails.dpdk.org (Postfix) with ESMTP id F320840040
 for ; Sat, 21 May 2022 12:33:51 +0200 (CEST)
Received: from dggpeml500024.china.huawei.com (unknown [172.30.72.55])
 by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4L50Ky4DwBzhYVY;
 Sat, 21 May 2022 18:33:10 +0800 (CST)
Received: from [127.0.0.1] (10.67.100.224)
 by dggpeml500024.china.huawei.com (7.185.36.10)
 with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
 id 15.1.2375.24; Sat, 21 May 2022 18:33:49 +0800
Subject: Re: [dpdk-dev] [PATCH v7 1/4] ethdev: support device reset and
 recovery events
To: Ray Kinsella , Thomas Monjalon
CC: Ferruh Yigit , Kalesh A P , , , , David Marchand ,
 Andrew Rybchenko , , shuliubin 00419723
References: <20201009034832.10302-1-kalesh-anakkur.purayil@broadcom.com>
 <87sfspiuj1.fsf@mdr78.vserver.site>
 <878rudiwhb.fsf@mdr78.vserver.site>
 <45691978.XUcTiDjVJD@thomas>
 <875yphigb6.fsf@mdr78.vserver.site>
From: fengchengwen
Message-ID:
Date: Sat, 21 May 2022 18:33:48 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.11.0
MIME-Version: 1.0
In-Reply-To: <875yphigb6.fsf@mdr78.vserver.site>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.67.100.224]
X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To
 dggpeml500024.china.huawei.com (7.185.36.10)
X-CFilter-Loop: Reflected
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org

Hi all,

This patch has been pending for a long time. Are we waiting for 22.11 to
deal with it?

We have the same requirements for the reset or recovery mechanism, but
there are differences:

        APP                                  PMD
         |                                    |
         |                               detect error
         | <---report error event---          |
         |                                    |
  do error stats                              |
  and report                                  |
         | ---start recover-->                |
         |                               do recover
         | <---report recover result          |
         |                                    |
  if succ just log
  else may migrate service

Can we generalize these processes (meaning the implementation would live
at the framework layer), or should this be handled only at the PMD API
level?

On 2022/2/15 0:06, Ray Kinsella wrote:
>
> Thomas Monjalon writes:
>
>> 14/02/2022 11:16, Ray Kinsella:
>>> Ray Kinsella writes:
>>>> Thomas Monjalon writes:
>>>>> 02/02/2022 12:44, Ray Kinsella:
>>>>>> Ferruh Yigit writes:
>>>>>>> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>>>>>>>  	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>>>>>>  	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>>>>>>  	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>>>>>>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>> +		/**< port recovering from an error
>>>>>>>> +		 *
>>>>>>>> +		 * PMD detected a FW reset or error condition.
>>>>>>>> +		 * PMD will try to recover from the error.
>>>>>>>> +		 * Data path may be quiesced and Control path operations
>>>>>>>> +		 * may fail at this time.
>>>>>>>> +		 */
>>>>>>>> +	RTE_ETH_EVENT_RECOVERED,
>>>>>>>> +		/**< port recovered from an error
>>>>>>>> +		 *
>>>>>>>> +		 * PMD has recovered from the error condition.
>>>>>>>> +		 * Control path and Data path are up now.
>>>>>>>> +		 * PMD re-configures the port to the state prior to the error.
>>>>>>>> +		 * Since the device has undergone a reset, flow rules
>>>>>>>> +		 * offloaded prior to reset may be lost and
>>>>>>>> +		 * the application should recreate the rules again.
>>>>>>>> +		 */
>>>>>>>>  	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>>>>>>
>>>>>>>
>>>>>>> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>>>>>>> to evaluate if it is a false positive:
>>>>>>>
>>>>>>>
>>>>>>> 1 function with some indirect sub-type change:
>>>>>>>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>>>>>>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>>>>>>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>>>>>>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>>>>>>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>>>>>>             type size hasn't changed
>>>>>>>             2 enumerator insertions:
>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>>>>>>             1 enumerator change:
>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>>>>>
>>>>>> I don't immediately see the problem that this would cause.
>>>>>> There are no array sizes etc dependent on the value of MAX for instance.
>>>>>>
>>>>>> Looks safe?
>>>>>
>>>>> We never know how this enum will be used by the application.
>>>>> The max value may be used for the size of an event array.
>>>>> It looks a real ABI issue unfortunately.
>>>>
>>>> Right - but we only really care about it when an array size based on MAX
>>>> is likely to be passed to DPDK, which doesn't apply in this case.
>>
>> I don't completely agree.
>> A developer may assume an event will never exceed MAX value.
>> However, after an upgrade of DPDK without app rebuild,
>> a higher event value may be received in the app,
>> breaking the assumption.
>> Should we consider this case as an ABI breakage?
>
> Nope - I think we should explicitly exclude MAX values from any
> ABI guarantee, as being able to change them is key to our be able to
> evolve DPDK while maintaining ABI stability.
>
> Consider what it means applying the ABI policy to a MAX value, you are
> in effect saying that that no value can be added to this enumeration
> until the next ABI version, for me this is very restrictive without a
> solid reason.
>
>>
>>>> I noted that some Linux folks explicitly mark similar MAX values as not
>>>> part of the ABI.
>>>>
>>>> /usr/include/linux/perf_event.h
>>>> 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>>> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>>> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>>> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>>> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>>> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>>> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>>> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /*
>>>> non-ABI; internal use */
>>>> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>>> 267:    PERF_TXN_MAX = (1 << 8),                /* non-ABI */
>>>> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>>> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>>> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>>> 1087:   PERF_BPF_EVENT_MAX,                     /* non-ABI */
>>>
>>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>> We could also add a section in the ABI Policy to make it explicit _MAX
>>> enum values are not part of the ABI - what do folks think?
>>
>> Interesting. I am not sure it is always ABI-safe though.
>
>
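
To make the concern in the quoted discussion concrete, here is a minimal
sketch of the application-side case Thomas describes: an app that sized a
per-event array with the MAX value it saw at build time (11, per the ABI
check output above) and then receives the new values 11 and 12 from an
upgraded library without being rebuilt. The names BUILD_TIME_EVENT_MAX,
event_count, unknown_events and count_event are hypothetical and only for
illustration; nothing here is DPDK API, and plain ints stand in for
enum rte_eth_event_type.

```c
#include <stddef.h>

/* Assumption: value of RTE_ETH_EVENT_MAX baked into the app at build time. */
#define BUILD_TIME_EVENT_MAX 11

/* Hypothetical per-event counters, sized by the build-time MAX. */
unsigned long event_count[BUILD_TIME_EVENT_MAX];

/* Events that arrived with a value the app did not know at build time. */
unsigned long unknown_events;

/*
 * Count one event. Returns 0 if the event was counted, -1 if its value
 * falls outside the range known at build time. Without the range check,
 * event values 11 and 12 would index past the end of event_count[].
 */
int count_event(int event)
{
	if (event < 0 || event >= BUILD_TIME_EVENT_MAX) {
		unknown_events++;
		return -1;
	}
	event_count[event]++;
	return 0;
}
```

With this check, calling count_event(11) or count_event(12) from an event
callback only bumps unknown_events instead of writing out of bounds. Whether
applications can be expected to code this defensively is exactly the open
question in the thread.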