Message-ID: <5b110017-8679-4603-5c25-742ed83e4bda@amd.com>
Date: Tue, 7 Mar 2023 12:07:24 +0000
From: Ferruh Yigit
To: fengchengwen, Konstantin Ananyev, Ajit Khaparde
Cc: Konstantin Ananyev, Thomas Monjalon, Andrew Rybchenko, "dev@dpdk.org"
Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
In-Reply-To: <9dc5d714-07a4-c32e-e557-efe8e7fe2d16@huawei.com>
List-Id: DPDK patches and discussions

On 3/7/2023 8:25 AM, fengchengwen wrote:
>
>
> On 2023/3/6 19:13, Konstantin Ananyev wrote:
>>
>>
>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
>>>>>>>>>> pointers to dummy functions and then try recovery, in this period the
>>>>>>>>>> application may still invoking data path API. This will introduce a
>>>>>>>>>> race-condition with data path which may lead to crash [1].
>>>>>>>>>>
>>>>>>>>>> Although the PMD added delay after setting data path pointers to cover
>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
>>>>>>>>>> solve the problem.
>>>>>>>>>>
>>>>>>>>>> To solve the race-condition problem fundamentally, the following
>>>>>>>>>> requirements are added:
>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>>>>>>> report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>>> 2. The application should stop data path API invocation when process
>>>>>>>>>> the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>>>>>>> report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>> 4. The application should enable data path API invocation when process
>>>>>>>>>> the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> How does this solve the race condition? By pushing the responsibility
>>>>>>>> to stop the data path to the application?
>>>>>>>
>>>>>>> Exactly, it becomes the application's responsibility to make sure the
>>>>>>> data path is stopped/suspended before recovery continues.
>>>>>>>
>>>>>>
>>>>>> From documentation of the feature:
>>>>>>
>>>>>> ``
>>>>>> Because the PMD recovers automatically,
>>>>>> the application can only sense that the data flow is disconnected for a
>>>>>> while and the control API returns an error in this period.
>>>>>>
>>>>>> In order to sense the error happening/recovering, as well as to restore
>>>>>> some additional configuration, three events are available:
>>>>>> ``
>>>>>>
>>>>>> It looks like the initial design uses events mainly to inform the
>>>>>> application about what happened, and mainly for re-configuration.
>>>>>>
>>>>>> Although I don't disagree with involving the application, I am not sure
>>>>>> that is part of the current design.
>>>>>
>>>>> I thought we all agreed that the initial design contains some fallacies
>>>>> that need to be fixed, no?
>>>>> The statement that with the current rte_ethdev design error recovery can
>>>>> be done without interaction with the app (to stop/suspend data/control
>>>>> path) is the main one, I think.
>>>>> It needs some interaction with the app layer, one way or another.
>>>>>
>>>>>>>>
>>>>>>>> What if the application is not interested in recovery modes at all and
>>>>>>>> has not registered any callback for the recovery?
>>>>>>>
>>>>>>>
>>>>>>> Are you saying there is no way for the application to disable
>>>>>>> automatic recovery in the PMD if it is not interested
>>>>>>> (or can't fulfill the prerequisites for it)?
>>>>>>> If so, then yes, it is a problem and we need to fix it.
>>>>>>> I assumed that such a mechanism to disable unwanted events already exists,
>>>>>>> but I can't find anything.
>>>>>>> Wonder what would be the easiest way here - can the PMD make a decision
>>>>>>> based on the callback return value, or do we need a new API to
>>>>>>> enable/disable callbacks, or ...?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> As far as I can see, automatic recovery is not configurable by the app.
>>>>>>
>>>>>> But that is not all: the PMD sends events to the application but can't
>>>>>> know whether the application is handling them, so with the current design
>>>>>> the PMD can't rely on the app.
>>>>>
>>>>> Well, the PMD invokes a user-provided callback.
>>>>> One way to fix that problem - if there is no callback provided,
>>>>> or the callback returns an error code - the PMD can assume that recovery
>>>>> should not be done.
>>>>> That is probably not the best design choice, but at least it will allow us
>>>>> to fix the problem without too many changes and without introducing a new API.
>>>>> That could be sort of a 'quick fix'.
>>>>> In the meantime we can think about a new/better approach for that.
>>>>>
>>>>
>>>> -rc2 for 23.03 is a few days away.
>>>>
>>>> What do you think about having, as the 'quick fix' for this release, a
>>>> change in how the driver updates burst ops to prevent the race condition?
>
> By the 'quick fix', do you mean updating only the function pointers (without the rxq setting)?
> Currently the PMDs which announce support for "proactive error handling mode" already
> do this.
>

Yes.
I checked hns3 and it does as you said: 'hns3_eth_dev_fp_ops_config()'
writes all fields in 'rte_eth_fp_ops', but only the function pointers
actually change in the driver, so in effect only the function pointers
are updated.

The discussion about the race condition started with patch [1], which
mentions a crash caused by a race condition. Later in the discussion,
the recovery event was given as an example of where the race can occur,
which is why we are here.

But given the above, although there is a race condition and a bigger
update (one that needs application involvement) is required for the
recovery mechanism, there is no crash, and NO 'quick fix' is required
for recovery.

@Konstantin, @Chengwen, can you please confirm that the above
understanding is correct?

[1]
https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/

>>>>
>>>> And plan a design update for the next release?
>>> +1 on the overall approach.
>>
>> Yep, agree.
>
> Hope for a better solution.
> Also, I notice that only openvswitch (among all open-source software based on DPDK)
> registers an RTE_ETH_EVENT_INTR_RESET callback.
>
> Therefore, I hope we build a recovery framework at the DPDK SDK level that is compatible
> with the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
>
>>
>>>
>>>>
>>>>
>>>>>>
>>>>>>>> I think the driver should not rely on the application for this, unless
>>>>>>>> the application explicitly says (to the driver) that it is handling
>>>>>>>> recovery; right now there is no way for the driver to know this.
>>>>>>>
>>>>>>> I think it is vice versa:
>>>>>>> the application should not enable auto-recovery if it can't meet the
>>>>>>> prerequisites for it (provide an appropriate callback).
>>>>>>>
>>>>>>
>>>>>> I agree with the above, we are saying similar things from different perspectives.
>>>>>
>>>>> Ok, it's good that we are on the same page.
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>> Also, this patch introduce a driver internal function
>>>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>>>>>>>
>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>>>>>> Cc: stable@dpdk.org
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chengwen Feng
>>>>>>>>>> ---
>>>>>>>>>> doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>>>>>>> lib/ethdev/ethdev_driver.c | 8 +++++++
>>>>>>>>>> lib/ethdev/ethdev_driver.h | 10 ++++++++
>>>>>>>>>> lib/ethdev/rte_ethdev.h | 32
>>>>>>>>>> +++++++++++++++----------
>>>>>>>>>> lib/ethdev/version.map | 1 +
>>>>>>>>>> 5 files changed, 46 insertions(+), 25 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> index c145a9066c..e380ff135a 100644
>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery
>>>>>>>>>> in PASSIVE mode,
>>>>>>>>>> the PMD automatically recovers from error in PROACTIVE mode,
>>>>>>>>>> and only a small amount of work is required for the application.
>>>>>>>>>>
>>>>>>>>>> -During error detection and automatic recovery,
>>>>>>>>>> -the PMD sets the data path pointers to dummy functions
>>>>>>>>>> -(which will prevent the crash),
>>>>>>>>>> -and also make sure the control path operations fail with a return
>>>>>>>>>> code ``-EBUSY``.
>>>>>>>>>> -
>>>>>>>>>> -Because the PMD recovers automatically,
>>>>>>>>>> -the application can only sense that the data flow is disconnected
>>>>>>>>>> for a while
>>>>>>>>>> -and the control API returns an error in this period.
>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the
>>>>>>>>>> data path
>>>>>>>>>> +pointers to dummy functions and also make sure the control path
>>>>>>>>>> operations
>>>>>>>>>> +failed with a return code ``-EBUSY``.
>>>>>>>>>>
>>>>>>>>>> In order to sense the error happening/recovering,
>>>>>>>>>> as well as to restore some additional configuration,
>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>>>>>>>
>>>>>>>>>> ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>>>>>>> Notify the application that an error is detected
>>>>>>>>>> - and the recovery is being started.
>>>>>>>>>> + and the recovery is about to start.
>>>>>>>>>> Upon receiving the event, the application should not invoke
>>>>>>>>>> - any control path function until receiving
>>>>>>>>>> + any control and data path API until receiving
>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or
>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>>>>>>>
>>>>>>>>>> .. note::
>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>>>>>>>
>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>>>>>>> Notify the application that the recovery from error is successful,
>>>>>>>>>> - the PMD already re-configures the port,
>>>>>>>>>> - and the effect is the same as a restart operation.
>>>>>>>>>> + the PMD already re-configures the port.
>>>>>>>>>> + The application should restore some additional configuration,
>>>>>>>>>> and then
>>>>>>>>>> + enable data path API invocation.
>>>>>>>>>>
>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>>>>>>> Notify the application that the recovery from error failed,
>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev
>>>>>>>>>> *dev, const char *ring_name,
>>>>>>>>>> return rc;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> +void
>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>>>>>>>> +{
>>>>>>>>>> + if (dev == NULL)
>>>>>>>>>> + return;
>>>>>>>>>> + eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> const struct rte_memzone *
>>>>>>>>>> rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char
>>>>>>>>>> *ring_name,
>>>>>>>>>> uint16_t queue_id, size_t size, unsigned int align,
>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>>>>> @@ -1621,6 +1621,16 @@ int
>>>>>>>>>> rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const
>>>>>>>>>> char *name,
>>>>>>>>>> uint16_t queue_id);
>>>>>>>>>>
>>>>>>>>>> +/**
>>>>>>>>>> + * @internal
>>>>>>>>>> + * Setup eth fast-path API to ethdev values.
>>>>>>>>>> + *
>>>>>>>>>> + * @param dev
>>>>>>>>>> + * Pointer to struct rte_eth_dev.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_internal
>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>>>>>>>> +
>>>>>>>>>> /**
>>>>>>>>>> * @internal
>>>>>>>>>> * Atomically set the link status for the specific device.
>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>>>>>>>> index 049641d57c..44ee7229c1 100644
>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>>>>>>> */
>>>>>>>>>> RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>>>>>>> /** Port recovering from a hardware or firmware error.
>>>>>>>>>> - * If PMD supports proactive error recovery,
>>>>>>>>>> - * it should trigger this event to notify application
>>>>>>>>>> - * that it detected an error and the recovery is being started.
>>>>>>>>>> - * Upon receiving the event, the application should not invoke
>>>>>>>>>> any control path API
>>>>>>>>>> - * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until
>>>>>>>>>> receiving
>>>>>>>>>> - * RTE_ETH_EVENT_RECOVERY_SUCCESS or
>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>>>>>>>> - * The PMD will set the data path pointers to dummy functions,
>>>>>>>>>> - * and re-set the data path pointers to non-dummy functions
>>>>>>>>>> - * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>> - * It means that the application cannot send or receive any
>>>>>>>>>> packets
>>>>>>>>>> - * during this period.
>>>>>>>>>> + *
>>>>>>>>>> + * If PMD supports proactive error recovery, it should trigger
>>>>>>>>>> this
>>>>>>>>>> + * event to notify application that it detected an error and the
>>>>>>>>>> + * recovery is about to start.
>>>>>>>>>> + *
>>>>>>>>>> + * Upon receiving the event, the application should not invoke any
>>>>>>>>>> + * control and data path API until receiving
>>>>>>>>>> + * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>>>>>>>>>> + * event.
>>>>>>>>>> + *
>>>>>>>>>> + * Once this event is reported, the PMD will set the data path
>>>>>>>>>> pointers
>>>>>>>>>> + * to dummy functions, and re-set the data path pointers to valid
>>>>>>>>>> + * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
>>>>>>>>>> event.
>>>>>>>>>> + *
>>>>>>>>>> * @note Before the PMD reports the recovery result,
>>>>>>>>>> * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
>>>>>>>>>> again,
>>>>>>>>>> * because a larger error may occur during the recovery.
>>>>>>>>>> */
>>>>>>>>>> RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>>> /** Port recovers successfully from the error.
>>>>>>>>>> - * The PMD already re-configured the port,
>>>>>>>>>> - * and the effect is the same as a restart operation.
>>>>>>>>>> + *
>>>>>>>>>> + * The PMD already re-configured the port:
>>>>>>>>>> * a) The following operation will be retained: (alphabetically)
>>>>>>>>>> * - DCB configuration
>>>>>>>>>> * - FEC configuration
>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>>>>>>> * (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>>>>>>> * c) Any other configuration will not be stored
>>>>>>>>>> * and will need to be re-configured.
>>>>>>>>>> + *
>>>>>>>>>> + * The application should restore some additional configuration
>>>>>>>>>> + * (see above case b/c), and then enable data path API invocation.
>>>>>>>>>> */
>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>>>>>>> /** Port recovery failed.
>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
>>>>>>>>>> --- a/lib/ethdev/version.map
>>>>>>>>>> +++ b/lib/ethdev/version.map
>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>>>>>>> rte_eth_devices;
>>>>>>>>>> rte_eth_dma_zone_free;
>>>>>>>>>> rte_eth_dma_zone_reserve;
>>>>>>>>>> + rte_eth_fp_ops_setup;
>>>>>>>>>> rte_eth_hairpin_queue_peer_bind;
>>>>>>>>>> rte_eth_hairpin_queue_peer_unbind;
>>>>>>>>>> rte_eth_hairpin_queue_peer_update;
>>>>>>>>>> --
>>>>>>>>> Acked-by: Konstantin Ananyev
>>>>>>>>>
>>>>>>>>>> 2.17.1
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
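
For illustration only: requirements 2 and 4 in the quoted commit message
(the application stops data path calls on RTE_ETH_EVENT_ERR_RECOVERING and
resumes them on RTE_ETH_EVENT_RECOVERY_SUCCESS) could be sketched on the
application side roughly as below. This is not taken from the patch or from
any existing application. 'struct port_gate', the 'port_gate_*' helpers and
'enum sketch_event' are hypothetical names, and the enum is only a stand-in
for the real event types so that the sketch builds without DPDK headers. The
callback mirrors the argument order of an ethdev event callback (port_id,
event, cb_arg, ret_param).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for RTE_ETH_EVENT_ERR_RECOVERING, RTE_ETH_EVENT_RECOVERY_SUCCESS
 * and RTE_ETH_EVENT_RECOVERY_FAILED (assumption: used only so the sketch
 * compiles without DPDK). */
enum sketch_event {
	SKETCH_EVENT_ERR_RECOVERING,
	SKETCH_EVENT_RECOVERY_SUCCESS,
	SKETCH_EVENT_RECOVERY_FAILED,
};

/* Hypothetical per-port gate owned by the application. */
struct port_gate {
	atomic_bool datapath_enabled; /* false while recovery is in progress */
	atomic_int workers_in_burst;  /* workers currently inside a burst call */
};

static void
port_gate_init(struct port_gate *g)
{
	atomic_init(&g->datapath_enabled, true);
	atomic_init(&g->workers_in_burst, 0);
}

/* Worker side: announce entry first, then check the gate. Returns true if
 * the caller may invoke the Rx/Tx burst functions for this port. */
static bool
port_gate_enter(struct port_gate *g)
{
	atomic_fetch_add(&g->workers_in_burst, 1);
	if (!atomic_load(&g->datapath_enabled)) {
		atomic_fetch_sub(&g->workers_in_burst, 1);
		return false;
	}
	return true;
}

/* Worker side: call after the burst function has returned. */
static void
port_gate_leave(struct port_gate *g)
{
	atomic_fetch_sub(&g->workers_in_burst, 1);
}

/* Event callback: close the gate before recovery, reopen it on success. */
static int
recovery_event_cb(uint16_t port_id, enum sketch_event event,
		  void *cb_arg, void *ret_param)
{
	struct port_gate *g = cb_arg;

	(void)port_id;
	(void)ret_param;

	switch (event) {
	case SKETCH_EVENT_ERR_RECOVERING:
		/* Close the gate, then wait until no worker is in a burst call. */
		atomic_store(&g->datapath_enabled, false);
		while (atomic_load(&g->workers_in_burst) != 0)
			;
		break;
	case SKETCH_EVENT_RECOVERY_SUCCESS:
		/* Application-specific re-configuration would go here. */
		atomic_store(&g->datapath_enabled, true);
		break;
	case SKETCH_EVENT_RECOVERY_FAILED:
		/* Keep the gate closed; the port is not usable. */
		break;
	}
	return 0;
}
```

In a real application the callback would be registered per event with
rte_eth_dev_callback_register(), and each worker would wrap its
rte_eth_rx_burst()/rte_eth_tx_burst() calls between port_gate_enter() and
port_gate_leave(). Whether busy-waiting inside the callback is acceptable
depends on the context the PMD invokes it from, which this thread does not
settle; the extra atomic operations per burst are also a cost that the
design discussion would need to weigh.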