From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 586F141DE6; Mon, 6 Mar 2023 12:57:45 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id DC11340E50; Mon, 6 Mar 2023 12:57:44 +0100 (CET) Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2070.outbound.protection.outlook.com [40.107.237.70]) by mails.dpdk.org (Postfix) with ESMTP id 0FDC040A8A for ; Mon, 6 Mar 2023 12:57:43 +0100 (CET) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=REx5lSz9INID06C3fFrtYc4s4UM8WxWuwWEKywjlZXTDzBrIvDUSb3kJ0Ka0yZzNyijYowkgZB0dxV4t+cJAAHxU2BIMy8rkphvw+w+tGB8clugfN8g7Fzaf9ngP/d9l9DWz45ybjEho2fbp313h6OafKlkuLuKMGb2+vqMxoQK9oZcqj1nRiyYRr33OEZv7AR0fS19+sshKw5dNN6IMy/lqn+TnokGw5KO/sdwnRisTllRYwz29bsXsrJZnc8zAA/i4M/SipOV5sQvsaWKdT0BcmsMklWioBIF6Y/z/muPAwdJUtF2Duu12fWec881ZlXEWEAzbur9YSTX4eUVjhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=oqBvnLaNV6VRwae3C0VBnvA9HsO4Qhn8bZNaafWyN9k=; b=hXpT76LLzzjI5UXvV9pYiTObiHn9qJCKVeJ5W2ae9tcXgKV10zBfvUAgOH3ULJD6tqqUJi/bXrhkhaJVc9jj0jUV9MhjLrA1J5bKH2dZ8nQd/5ZTDo6i+TUvn6Hut4fgsRmebB2q02ak/yx2R6okDORQvinCluetL3VeFFXHs9EedrlUrZl2vb+2y5Kc3fCPRJiru+LhMotNhrl6Nmp2BQotmhiAOPYkjeQ5DX3FrLod7mB2zbgYwZwXnDrsHUzLmq7nHQKHqHwc8i+yl2my7SEh4I6Im0c2vbiF/h8Fv1/e+UiyvEuSKq+N+vdk2HUc7Gz1/3+y8nBNK3nrJnlxzg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=oqBvnLaNV6VRwae3C0VBnvA9HsO4Qhn8bZNaafWyN9k=; b=WOz417Y6bZPwDI1IEk7G08kaSep4oDSgT/nVLxYqKWyl8/vVEr/39l9HOLRtQRmXZpnE8uvlmcTY7Bnb2nQNCDYPiHvJiDi6q/Z9gioKe4qs3oGgJjQY8vJeoUxdpwbUXPbQS4OcgUB4nd1noMLjMnXluWRZF0Zri6e47AsA8P0= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com; Received: from CH2PR12MB4294.namprd12.prod.outlook.com (2603:10b6:610:a9::11) by SA1PR12MB7319.namprd12.prod.outlook.com (2603:10b6:806:2b5::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6156.23; Mon, 6 Mar 2023 11:57:39 +0000 Received: from CH2PR12MB4294.namprd12.prod.outlook.com ([fe80::dd5a:8a5c:f493:9640]) by CH2PR12MB4294.namprd12.prod.outlook.com ([fe80::dd5a:8a5c:f493:9640%5]) with mapi id 15.20.6156.028; Mon, 6 Mar 2023 11:57:38 +0000 Message-ID: <273e4b96-260f-0096-9570-3268cf25fc78@amd.com> Date: Mon, 6 Mar 2023 11:57:32 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Content-Language: en-US To: Konstantin Ananyev , Ruifeng Wang , Konstantin Ananyev , "dev@dpdk.org" , Fengchengwen , Honnappa Nagarahalli , Stephen Hemminger , "Ajit Khaparde (ajit.khaparde@broadcom.com)" Cc: nd References: <20230220060839.1267349-1-ashok.k.kaladi@intel.com> <20230220060839.1267349-2-ashok.k.kaladi@intel.com> <4786db4b-63dc-5329-522d-77eb58d4cff4@huawei.com> <20230221090053.14d653bf@hermes.local> <3cd97a71-b32f-b33b-dce1-46fabad182f6@huawei.com> <54fbf4e55cd44477b1e956f98a7a3c50@huawei.com> <3dc3b3d3-c80f-a361-8780-b1b3e48d843e@yandex.ru> <52296fe2-d9f6-1a24-e577-e5271a69a053@amd.com> <5cbf53cb272d42b994eb12b337466986@huawei.com> From: Ferruh Yigit Subject: Re: [PATCH 2/2] ethdev: fix race condition in fast-path ops setup In-Reply-To: <5cbf53cb272d42b994eb12b337466986@huawei.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-ClientProxiedBy: LO2P265CA0061.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:60::25) To CH2PR12MB4294.namprd12.prod.outlook.com (2603:10b6:610:a9::11) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CH2PR12MB4294:EE_|SA1PR12MB7319:EE_ X-MS-Office365-Filtering-Correlation-Id: d82d3245-ec96-487b-e924-08db1e39fb89 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 5nO3zNhupOSGFb9l03CDKLXCH/rEMztAQj4rrXC7t5AJVCDy2ir0GXj2bQJpTr4LWsuElxerNqSMnAaPgAhkN5Qjk9sRVofYb6jx2t5GZTNmj5fEUMb1PcHK/ySpZ1ZjTmH+4MLqpSOrNejbgu0KQVziDCny9oG2/fbx4MwNFVzHryzWc316UwfKKRUwImRDAJqLcMPOBfz9FpHwRaSAwZ1xZB1j+7AX3KJUJVtdRT2CSWPpif12wKgUwdfjWMreQ1Si96bf8H3fRlyPxp/ByUFNu/s7E6/eEbyMzF2YxcaoWK6JaO7j1wkFcQj7fp95RZxFVPsyrWRFBEzNNUv3J2Vq3HMRBNoTWDVEuyAYpPjZTJo8jgmYsaZxzku/X4OU8mCNN0emHoLqK0T4wNxIukvKkKnhccynIZoVMP4wGLW89kMCgzJve/Ue2C7/jm/y7ff1/3qiWlm6WeRPJct6ZIa5eN797WPiinOvle/yqkKf+Ezz7ImdVuv7vlurN2abv3MzdZggHyHhapRgrKQorl5eeHxxh34PddYRuDDvZensSE1ydOMflok1awfkj4QZN7macz2cLPovpRQGMPkqey3Fdjz6bj9IxuKCB8q3E0Zuc79lviVUgAOsJWxxOwmLXJwuhqZT91Oi9gRaLOge+eqHWkcUQjOlntOn3hofsv40u8Iz90JLM39l7Fv9ttJbayudByn3mF+IjOLu1G/EMyepMGtK6ZiJ4c7LItp96Ay/Nsax2W8PwuMJRX4M+8mgN60tFMj0Fx1Oprc3+WQIww== X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:CH2PR12MB4294.namprd12.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230025)(4636009)(376002)(396003)(136003)(366004)(39860400002)(346002)(451199018)(31686004)(6666004)(83380400001)(36756003)(478600001)(110136005)(316002)(38100700002)(2616005)(6506007)(66556008)(6486002)(6512007)(186003)(26005)(41300700001)(53546011)(8936002)(5660300002)(30864003)(44832011)(66476007)(31696002)(66946007)(8676002)(4326008)(2906002)(86362001)(23180200003)(45980500001)(43740500002); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?bDR1eTRzSW9OQkFBbVhkR2pkZVNSa25ObnU5ZllUWHN6QTg2OFBtVjZtYnhl?= =?utf-8?B?V0xZNnpuMmFaVlZsSmdobnRiVzlZekpObzduYnZNY2lmMkE5MmFScEk1MTEy?= =?utf-8?B?WE5UNEFvWXlUbUxSVi9sUGF4UlF5OGduRENVVkdMWnNQcFV5ME1oSG5BUzFE?= =?utf-8?B?MTl1emozT2NZdWo3QWZXZHdvQjA5SHZ4UGZaRU1pTlZiM1RtSEwvU1RMYkx3?= =?utf-8?B?SnFsMUMxRWc1dDA5TDFSSlZhTE45OE1wYkIvVVV4SVFtZnJvZStrdm9scmtn?= =?utf-8?B?K2l3TlZwL0pVNCtSQ0RQSjdxZERxcGhxTWpDZUpNV3FGUzVBTVZIdWorQmEv?= =?utf-8?B?ZjhlYW9EeVh3SUdld1ZpdHl5UE9zaXNrSVdwR1dHSXpraDNEK1pKODQyQTZh?= =?utf-8?B?Q1BpOVpsMnowaVVEdllvMVIxMFhaZ2NidWw5dEd1c3dYMEdzWUdKNWgvKy9t?= =?utf-8?B?V29MQ3V3TU81TWU0VHRqZ2s5L0dheWJmNmsvZTBqdTczbU1JeWNhRm9EdHZi?= =?utf-8?B?VlYxSmpJOXRwNkpVd0JaeEhkd0s1NG1mQ0RYTGpPNkFvTnZ5M3ZoU1hJMllq?= =?utf-8?B?d2s3NzVqVXRxbm1BcXg1MXovUFNrR1lFRWZINmYrY0RUWnJ4YWprQXBxRDFv?= =?utf-8?B?QWo1SXh0MGw3NGQzbHJyZEVoWXU4cWk1alEreWFsbnNWOVZmcW5OWjUzb1hF?= =?utf-8?B?RzdqM2pWU2ZkeXAvU211aXhjWUFPZENNUU05U1BKRERxNkRXOGEvWEp2cU53?= =?utf-8?B?MmF0S1VYV1FCYnYyWThrajJZaEFuWVJ1bldnYWE3bG9jdUxTNXNFMUpzdElE?= =?utf-8?B?d1plL0VhenFFa0lIYVVDVHhma1J4MnFidmRuNXV1VjljR0lYMEJmMlFDbDZm?= =?utf-8?B?MXYyZHhya1Fic2x2MlNSc1RGcitkaHovcEozaDI3Q1lTZG9MUEorallINE9q?= =?utf-8?B?bFprV28xaUhSQzVkQlN2RGs0WnhHMHo2ZmRSc3ZDOUkwY1FWaGVRT1pvSk5m?= =?utf-8?B?SDdMaUo3OS9zc1NlclQzRElReldLZS9tK0h4ZDN1eFpXdlVXbHNDbjVsZito?= =?utf-8?B?a2xUR1dIdXRzUDdiaEhJbE5zSTlZZ2Zma0dWdE5DeFpQWlhMSU0zOEFMY0ts?= =?utf-8?B?bTZlYWVnN1JWbUQwQWxYc1lQN3RheTJMbGsyVzVTQWpYWVZ0Nm1qT1hkQzlF?= =?utf-8?B?V3NLSGF5NHBYNnFYUzEwUGE2SlZ2dTROV1k4ZGFzcjBDRzZrMVZpdEFRUUh6?= =?utf-8?B?Q3dGSENiRWFIUnFDU2hEbS8zVUNnRFVLRFlhR1VMcXE1M0YzMUc1Z1ZDTHFt?= =?utf-8?B?SE9mL25vbURzS1lST3kxVGN3RXZQQXZmMzU2TlNnaEJWQ01SN3I5NXY3Rnlx?= =?utf-8?B?THBabjlPRlB1WGg3bFJIbVduZGgxWEtZL29zcHlmRHVMNzFpakE0QlRVdG9m?= =?utf-8?B?QzVQd1dqWnJtcWYxVmhkdHQ4NVpadS95WjdIVGhhcWNvS2VDR3dRUG1kdnVN?= =?utf-8?B?d2tSQ0tMbFByazRGTGdqVFA1RDlTTEdPUm1WSWdOQjZmMVhmMzVpcXZkS2NL?= =?utf-8?B?MHI4V0dDUHkxUTBuTHJwN0g0d3RQQTQwSkJzcGp1N3VEcFhXR2lIK1drSEYz?= =?utf-8?B?bjRlNFJrRkVpakcwdVZDZzhpSFVkSGN2OVpJSkc3b2FEU1F5cGNFOVZ0L1NL?= =?utf-8?B?L1YwNi9NY2h4TG05L3UrRHd6S1labXpFZFpIbm92dTIybUZDNjZvYkdPU2k4?= =?utf-8?B?V1JJYkdWQUphRmhaa3IvdEdFYUQ0Y0l3M1RSNVBOd0tyS3pNNkJnMVFYWHNM?= =?utf-8?B?V25EcnZmY2NJdHk1NGkvMHcrT1ljdjNXdlloTWgzYitocDFwd0RFTXZwaE1r?= =?utf-8?B?VXRqa3VRdy9QOGlDM3BaWW9uR0hKOTg4b2duWW5nZFRRSGlPMTlhSFdmMXcw?= =?utf-8?B?WUJUWXpRdHFDalVTWDduL3VOb2g0d1ZPQzZ6NzZ3MDRyVXlld1pxYUtRZEJn?= =?utf-8?B?OWNSWVRiWWczSzNEKzA3SEZqT1JVNUJFSmNSaWZBeEtzb2h3Und6WDBMazhZ?= =?utf-8?B?ODZ5a1JuZzFVTUZoaXI1YkpZZkkveHR0QXRQZzh3VVQxWTNIY3UxWGdlb095?= =?utf-8?Q?sirYFqNv0rtPqWkSIMyh4Zdzu?= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-Network-Message-Id: d82d3245-ec96-487b-e924-08db1e39fb89 X-MS-Exchange-CrossTenant-AuthSource: CH2PR12MB4294.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Mar 2023 11:57:38.7176 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: +59Z41JSlX6OqRKH/j8SmVLWWEubPcdcccZOn4GrBfYEN6/2Oy0LhswEPaKnd5Ez X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR12MB7319 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 3/6/2023 10:32 AM, Konstantin Ananyev wrote: > > >>> -----Original Message----- >>> From: Ferruh Yigit >>> Sent: Saturday, March 4, 2023 1:19 AM >>> To: Konstantin Ananyev ; dev@dpdk.org; fengchengwen >>> ; Konstantin Ananyev ; Honnappa >>> Nagarahalli ; Stephen Hemminger ; >>> Ruifeng Wang ; Ajit Khaparde (ajit.khaparde@broadcom.com) >>> >>> Subject: Re: [PATCH 2/2] ethdev: fix race condition in fast-path ops setup >>> >>> On 2/26/2023 5:22 PM, Konstantin Ananyev wrote: >>>> >>>>>>>>>>>>>> If ethdev enqueue or dequeue function is called during >>>>>>>>>>>>>> eth_dev_fp_ops_setup(), it may get pre-empted after setting >>>>>>>>>>>>>> the function pointers, but before setting the pointer to port data. >>>>>>>>>>>>>> In this case the newly registered enqueue/dequeue function >>>>>>>>>>>>>> will use dummy port data and end up in seg fault. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This patch moves the updation of each data pointers before >>>>>>>>>>>>>> updating corresponding function pointers. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Fixes: c87d435a4d79 ("ethdev: copy fast-path API into >>>>>>>>>>>>>> separate >>>>>>>>>>>>>> structure") >>>>>>>>>>>>>> Cc: stable@dpdk.org >>>>>>>>>>> >>>>>>>>>>> Why is something calling enqueue/dequeue when device is not >>>>>>>>>>> fully >>>>>>>>> started. >>>>>>>>>>> A correctly written application would not call rx/tx burst >>>>>>>>>>> until after ethdev start had finished. >>>>>>>>>> >>>>>>>>>> Please refer the eb0d471a894 (ethdev: add proactive error >>>>>>>>>> handling mode), when driver recover itself, the application may >>>>>>>>>> still invoke >>>>>>>>> enqueue/dequeue API. >>>>>>>>> >>>>>>>>> Right now DPDK ethdev layer *does not* provide synchronization >>>>>>>>> mechanisms between data-path and control-path functions. >>>>>>>>> That was a deliberate deisgn choice. If we want to change that >>>>>>>>> rule, then I suppose we need a community consensus for it. >>>>>>>>> I think that if the driver wants to provide some sort of error >>>>>>>>> recovery procedure, then it has to provide some synchronization >>>>>>>>> mechanism inside it between data-path and control-path functions. >>>>>>>>> Actually looking at eb0d471a894 (ethdev: add proactive error >>>>>>>>> handling mode), and following patches I wonder how it creeped in? >>>>>>>>> It seems we just introduced a loophole for race condition with >>>>>>>>> this approach... >>>>>>> >>>>>>> Could you try to describe the specific scenario of loophole ? >>>>>> >>>>>> Ok, as I understand the existing mechanism: >>>>>> >>>>>> When PMD wants to start a recovery it has to: >>>>>>   - invoke >>>>>> rte_eth_dev_callback_process(RTE_ETH_EVENT_ERR_RECOVERING); >>>>>>     That supposed to call user provided callback. After callback is >>>>>> finished PMD assumes >>>>>>     that user is aware that recovery is about to start and should >>>>>> make some precautions. >>>>>> - when recovery is finished it invokes another callback: >>>>>>    RTE_ETH_EVENT_RECOVERY_(SUCCESS/FAILED). After that user either >>>>>> can continue to >>>>>>    use port or have to treat is as faulty. >>>>>> >>>>>> The idea is ok in principle, but there is a problem. >>>>>> >>>>>> lib/ethdev/rte_ethdev.h: >>>>>>             /** Port recovering from a hardware or firmware error. >>>>>>           * If PMD supports proactive error recovery, >>>>>>           * it should trigger this event to notify application >>>>>>           * that it detected an error and the recovery is being started. >>>>>> >>>>>> <<< !!!!! >>>>>>           * Upon receiving the event, the application should not >>>>>> invoke any control path API >>>>>>           * (such as rte_eth_dev_configure/rte_eth_dev_stop...) >>>>>> until receiving >>>>>>           * RTE_ETH_EVENT_RECOVERY_SUCCESS or >>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event. >>>>>>           * The PMD will set the data path pointers to dummy >>>>>> functions, >>>>>>           * and re-set the data path pointers to non-dummy functions >>>>>>           * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event. >>>>>> <<< !!!!! >>>>>> >>>>>> That part is just wrong I believe. >>>>>> It should be: >>>>>> Upon receiving the event, the application should not invoke any >>>>>> *both control and data-path* API until receiving >>>>>> RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED >>>>>> event. >>>>>> Resetting data path pointers to dummy functions by PMD *before* >>>>>> invoking rte_eth_dev_callback_process(RTE_ETH_EVENT_ERR_RECOVERING); >>>>>> introduces a race-condition with data-path threads, as such thread >>>>>> could already be inside RX/TX function or can already read RX/TX >>>>>> function/data pointers and be about to use them. >>>>> >>>>> Current practices: the PMDs already add some delay after set Rx/Tx >>>>> callback to dummy, and plus the DPDK worker thread is busypolling, >>>>> the probability of occurence in reality is zero. But in theoretically >>>>> exist the above race-condition. >>>> >>>> >>>> Adding delay might make a problem a bit less reproducible, but it >>>> doesn't fix it. >>>> The bug is still there. >>>> >>>> >>>>> >>>>>> And right now rte_ethdev layer doesn't provide any mechanism to >>>>>> check it or wait when they'll finish, etc. >>>>> >>>>> Yes >>>>> >>>>>> >>>>>> So, probably the simplest way to fix it with existing DPDK design: >>>>>> - user level callback  RTE_ETH_EVENT_ERR_RECOVERING should return >>>>>> only after it ensures that *all* >>>>>>    application threads (and processes) stopped using either control >>>>>> or data-path functions for that port >>>>> >>>>> Agree >>>>> >>>>>>    (yes it means that application that wants to use this feature has >>>>>> to provide its own synchronization mechanism >>>>>>    around data-path functions (RX/TX) that it is going to use). >>>>>> - after that PMD is safe to reset rte_eth_fp_ops[] values to dummy ones. >>>>>> >>>>>> And message to all PMD developers: >>>>>> *please stop updating rte_eth_fp_ops[] on your own*. >>>>>> That's a bad practice and it is not supposed to do things that way. >>>>>> There is a special API provided for these purposes: >>>>>> eth_dev_fp_ops_reset(), eth_dev_fp_ops_setup(), so use it. >>>>> >>>>> This two function is in private.h, so it should be expose to public >>>>> header file. >>>> >>>> You mean we need to move these functions declarations into ethdev_driver.h? >>>> If so, then yes, I think we probably do. >>>> >>>> >>> >>> >>> What about making slightly different version available to drivers, which only updates >>> function pointers, but not 'fpo->rxq' / 'fpo->txq'. >>> >>> This way driver can switch to between dummy and real burst function without worrying Rx/Tx >>> queue validity. >>> >>> @Chengwen, @Ruifeng, can this solve the issue for relaxed memory ordering systems? >> >> Yes, updating only function pointers removes the synchronization requirement between function >> pointer and qdata. > > Lads, that wouldn't work anyway. > The race between recovery procedure and data-path persists: > Recovery still has no idea is at given moment any thread doing RX/TX or not, and there is no > way for it to know when such thread will finish. Yes race condition persists, but as long as data (rxq/txq) stays valid, does it cause a trouble? At lest this fixes the potential crash I think. > We do need some synchronization mechanism between control(recovery) and data-path threads. > I believe it is unavoidable. > >>> >>> >>> >>>>>> >>>>>> BTW,  I don't see any implementation for >>>>>> RTE_ETH_EVENT_ERR_RECOVERING within either testpmd or any other >>>>>> example apps. >>>>>> Am I missing something? >>>>> >>>>> Currently it just promote the event. >>>> >>>> >>>> Ok, can I suggest then to add a proper usage for into in testpmd? >>>> It looks really strange that we add new feature into ethdev (and 2 >>>> PMDs), but didn't provide any way for users to test it. >>>> >>>>> >>>>>> If not, then probably it could be a good starting point - let's >>>>>> incorporate it inside testpmd (new forwarding engine probably) so >>>>>> everyone can test/try it. >>>>>> >>>>>>           * It means that the application cannot send or receive any >>>>>> packets >>>>>>           * during this period. >>>>>>           * @note Before the PMD reports the recovery result, >>>>>>           * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING >>>>>> event again, >>>>>>           * because a larger error may occur during the recovery. >>>>>>           */ >>>>>>          RTE_ETH_EVENT_ERR_RECOVERING, >>>>>> >>>>>>>>> It probably needs to be either deprecated or reworked. >>>>>>>> Looking at the commit, it does not say anything about the data >>>>>>>> plane functions which probably means, the error recovery is >>>>>>> happening within the data plane thread. What happens to other data >>>>>>> plane threads that are polling the same port on which the error >>>>>>> recovery is happening? >>>>>>> >>>>>>> The commit log says: "the PMD sets the data path pointers to dummy >>>>>>> functions". >>>>>>> >>>>>>> So the data plane threads will receive non-packet and send zero >>>>>>> with port which in error recovery. >>>>>>> >>>>>>>> >>>>>>>> Also, the commit log says that while the error recovery is under >>>>>>>> progress, the application should not call any control plane APIs. >>>>>>>> Does >>>>>>> that mean, the application has to check for error condition every >>>>>>> time it calls a control plane API? >>>>>>> >>>>>>> If application has not register event >>>>>>> (RTE_ETH_EVENT_ERR_RECOVERING) callback, it could calls control >>>>>>> plane API, but it will return failed. >>>>>>> If application has register above callback, it can wait for >>>>>>> recovery result, or direct call without wait but this will return failed. >>>>>>> >>>>>>>> >>>>>>>> The commit message also says that "PMD makes sure the control path >>>>>>>> operations failed with retcode -EBUSY". It does not say how it >>>>>>> does this. But, any communication from the PMD thread to control >>>>>>> plane thread may introduce race conditions if not done correctly. >>>>>>> >>>>>>> First there are no PMD thread, do you mean eal-intr-thread ? >>>>>>> >>>>>>> As for this question, you can see PMDs which already implement it, >>>>>>> they both provides mutual exclusion protection. >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Would something like this work better? >>>>>>>>>>> >>>>>>>>>>> Note: there is another bug in current code. The check for link >>>>>>>>>>> state interrupt and link_ops could return -ENOTSUP and leave >>>>>>>>>>> device in >>>>>>>>> indeterminate state. >>>>>>>>>>> The check should be done before calling PMD. >>>>>>>>>>> >>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c >>>>>>>>>>> index >>>>>>>>>>> 0266cc82acb6..d6c163ed85e7 100644 >>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.c >>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.c >>>>>>>>>>> @@ -1582,6 +1582,14 @@ rte_eth_dev_start(uint16_t port_id) >>>>>>>>>>>           return 0; >>>>>>>>>>>       } >>>>>>>>>>> >>>>>>>>>>> +    if (dev->data->dev_conf.intr_conf.lsc == 0 && >>>>>>>>>>> +        dev->dev_ops->link_update == NULL) { >>>>>>>>>>> +        RTE_ETHDEV_LOG(INFO, >>>>>>>>>>> +                   "Device with port_id=%"PRIu16" link update >>>>>>>>>>> +not >>>>>>>>> supported\n", >>>>>>>>>>> +                   port_id); >>>>>>>>>>> +            return -ENOTSUP; >>>>>>>>>>> +    } >>>>>>>>>>> + >>>>>>>>>>>       ret = rte_eth_dev_info_get(port_id, &dev_info); >>>>>>>>>>>       if (ret != 0) >>>>>>>>>>>           return ret; >>>>>>>>>>> @@ -1591,9 +1599,7 @@ rte_eth_dev_start(uint16_t port_id) >>>>>>>>>>>           eth_dev_mac_restore(dev, &dev_info); >>>>>>>>>>> >>>>>>>>>>>       diag = (*dev->dev_ops->dev_start)(dev); >>>>>>>>>>> -    if (diag == 0) >>>>>>>>>>> -        dev->data->dev_started = 1; >>>>>>>>>>> -    else >>>>>>>>>>> +    if (diag != 0) >>>>>>>>>>>           return eth_err(port_id, diag); >>>>>>>>>>> >>>>>>>>>>>       ret = eth_dev_config_restore(dev, &dev_info, port_id); @@ >>>>>>>>>>> -1611,16 >>>>>>>>>>> +1617,18 @@ rte_eth_dev_start(uint16_t port_id) >>>>>>>>>>>           return ret; >>>>>>>>>>>       } >>>>>>>>>>> >>>>>>>>>>> -    if (dev->data->dev_conf.intr_conf.lsc == 0) { >>>>>>>>>>> -        if (*dev->dev_ops->link_update == NULL) >>>>>>>>>>> -            return -ENOTSUP; >>>>>>>>>>> -        (*dev->dev_ops->link_update)(dev, 0); >>>>>>>>>>> -    } >>>>>>>>>>> - >>>>>>>>>>>       /* expose selection of PMD fast-path functions */ >>>>>>>>>>>       eth_dev_fp_ops_setup(rte_eth_fp_ops + port_id, dev); >>>>>>>>>>> >>>>>>>>>>> +    /* ensure state is set before marking device ready */ >>>>>>>>>>> +    rte_smp_wmb(); >>>>>>>>>>> + >>>>>>>>>>>       rte_ethdev_trace_start(port_id); >>>>>>>>>>> + >>>>>>>>>>> +    /* Update current link state */ >>>>>>>>>>> +    if (dev->data->dev_conf.intr_conf.lsc == 0) >>>>>>>>>>> +        (*dev->dev_ops->link_update)(dev, 0); >>>>>>>>>>> + >>>>>>>>>>>       return 0; >>>>>>>>>>>   } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> . >>>>>>>>>>> >>>>>>>> >>>> >