From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2CB0BA0C41; Thu, 30 Sep 2021 13:01:38 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id AA54140DDA; Thu, 30 Sep 2021 13:01:37 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by mails.dpdk.org (Postfix) with ESMTP id 5246A4067E for ; Thu, 30 Sep 2021 13:01:36 +0200 (CEST) X-IronPort-AV: E=McAfee;i="6200,9189,10122"; a="225200015" X-IronPort-AV: E=Sophos;i="5.85,336,1624345200"; d="scan'208";a="225200015" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Sep 2021 04:01:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,336,1624345200"; d="scan'208";a="457399539" Received: from fmsmsx603.amr.corp.intel.com ([10.18.126.83]) by orsmga002.jf.intel.com with ESMTP; 30 Sep 2021 04:01:32 -0700 Received: from fmsmsx611.amr.corp.intel.com (10.18.126.91) by fmsmsx603.amr.corp.intel.com (10.18.126.83) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12; Thu, 30 Sep 2021 04:01:32 -0700 Received: from fmsmsx609.amr.corp.intel.com (10.18.126.89) by fmsmsx611.amr.corp.intel.com (10.18.126.91) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12; Thu, 30 Sep 2021 04:01:31 -0700 Received: from fmsedg601.ED.cps.intel.com (10.1.192.135) by fmsmsx609.amr.corp.intel.com (10.18.126.89) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12 via Frontend Transport; Thu, 30 Sep 2021 04:01:31 -0700 Received: from NAM10-BN7-obe.outbound.protection.outlook.com (104.47.70.105) by edgegateway.intel.com (192.55.55.70) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2242.12; Thu, 30 Sep 2021 04:01:31 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=QMlOioq7ufyqJEqpYznklNaN0sYT/xpzwgAn0HhQotNNdXCcHnq6mx8t6TLVTkcBd16jHgwI1uLvBDdMwLNazd+qGxJXHltjv3NRt7QIZ8S/ciuQakCg7dZpkuKlIgbnDrUr3Y369qszCwTcbqh0fVTIlxj3kXUwz05tJCEhy1hhFXGKwB8PVw0b93bAjwxkZTk2mVfWJofHxYe2rVVH3sEdocANlBwH9op4PBjxTTgFPmPgI9qe6K2XU9i/FqrVwLnxkO71/vdHV6SkZfN+5L4KW0oqLjYf2/aM9dgRYuFsKNuAeJpyJS4eYE+hJVb10/55vqnaaoMAKH5s0SWEmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=5p5ZAxd7DupMfbVMIGrzmtgOTBRfaGt1j/ZEbqWittk=; b=jApJzN/ViksQ5jYODOUVND9ZSbDIHg4l028RpAPBBSYzZ36Px9vpG8ov+L5EKnAVulB1V0GuaSe5FXKRu/C2HQlNK5EYb1gFMNXIGebLzwoMRgARGhFrue81fHUnMSLdV+h+/J9riP803ZejrN7JgyAzp4L4PNjGfAg++/QIt8Gd11WXWV0dMwkbrVr37Dy99j8T2lQKsdgRxG3JM8q+lXlzKVhYDrfuPJaGWT/vhBRQcUnkyzjC5vfhxBk0YI+fXbiOXni6iTsu1fAV8uHA2y8IY1ys3iSPyX3DSNb2Q6KNCjSSGGScaB7kd5K8ToV7XhN0/P6HI5pSfj/miRblxQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; s=selector2-intel-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=5p5ZAxd7DupMfbVMIGrzmtgOTBRfaGt1j/ZEbqWittk=; b=AClplPNtpeIaSuvaNHL9fuw3ih6GjjxuIJ5Rl1VwVnB28/V2shi+Vw6MF0m9KhBB9xanWVPUW+GbWrbUhEZeMFBFOZzX9IA/1b3cIyra0fqGWPtoflkvHfyjQPfa7lqm+HIrjRh3Hq8MFiDPeipadoEwQ8iuYZRzxR4Q2g0xvjY= Authentication-Results: redhat.com; dkim=none (message not signed) header.d=none;redhat.com; dmarc=none action=none header.from=intel.com; Received: from PH0PR11MB5000.namprd11.prod.outlook.com (2603:10b6:510:41::19) by PH0PR11MB4821.namprd11.prod.outlook.com (2603:10b6:510:34::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4566.14; Thu, 30 Sep 2021 11:01:29 +0000 Received: from PH0PR11MB5000.namprd11.prod.outlook.com ([fe80::747b:3a08:d1ec:31fc]) by PH0PR11MB5000.namprd11.prod.outlook.com ([fe80::747b:3a08:d1ec:31fc%5]) with mapi id 15.20.4544.021; Thu, 30 Sep 2021 11:01:29 +0000 To: Huisong Li , "Singh, Aman Deep" , Thomas Monjalon CC: Andrew Rybchenko , "dev@dpdk.org" , Anatoly Burakov , David Marchand References: <1627908397-51565-1-git-send-email-lihuisong@huawei.com> <1627957839-38279-1-git-send-email-lihuisong@huawei.com> <9670389d-ebbc-9d9c-0cac-c7e8826ecb6f@huawei.com> <21383486.34YfpWhNxb@thomas> <858dc20f-5711-f770-2c08-0a432b6ea733@huawei.com> <4e2b5720-6da0-155b-845d-76c42b800abc@intel.com> From: Ferruh Yigit X-User: ferruhy Message-ID: Date: Thu, 30 Sep 2021 12:01:23 +0100 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-ClientProxiedBy: DB6PR0601CA0006.eurprd06.prod.outlook.com (2603:10a6:4:7b::16) To PH0PR11MB5000.namprd11.prod.outlook.com (2603:10b6:510:41::19) MIME-Version: 1.0 Received: from [192.168.0.206] (37.228.236.146) by DB6PR0601CA0006.eurprd06.prod.outlook.com (2603:10a6:4:7b::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4566.14 via Frontend Transport; Thu, 30 Sep 2021 11:01:28 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: bb19c1f7-3cab-4525-c662-08d98401a7e6 X-MS-TrafficTypeDiagnostic: PH0PR11MB4821: X-MS-Exchange-Transport-Forked: True X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:10000; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: ze5yG/22+eoMsWKNj77Fqu1zW1geiQv3rnRBgHLPAaBnHHQlan6gTwPPBCP0k0edRh1yIlnaPfold4MfRYe9LrXhgJBWVMC4cp+YYi5V34wJsC0Q36jZF3GhrxhTgG6+D45v9D9wVh+NyHCxWbp9fe+gsdgTHBdERmtB8BrkMPlFBezZklpd0i9gj0IZJRz6wpm/UJ+JPsuiPp/WtcJrnXqum5GjD37YZ5PSwU027Z28sZpUbHgpdtCIFRZPA0j7TavcjYzy7nP5wbfEOf4QOk44pawuuRj4iMq02jImBRCZbN+QmpyOf+4Dln1oyPUM50UKGHkOMRpby/zEYySbMLeLc63nFCdlgfG6UaE4dPyJmVvaIInvAV8eTgUvf0JwDJvj5TFLtRhGWZOqndFYecjSz9uvyMLQj+VsfSQNU+v5asjZnvwUizDs4EKkcS2pSWdpxzTeW/99jIhnsPJTZiXKEckAlloyE+QDvOtsy3dLc5qjbBrAJPNHVkthxFIqmasNfQZ9fcehXPNAWAtp6gOcroGBPL8zDu8dXyhFLmp2TbvwwPHx7WI3Z6yfCBQVuPm1B7gkn8c9NZlAzA7I13j4al0kExVwruWmvNGuXudS+plGLkUK6/JolzEJ6BjQyRi7T8JM16f/8Zng+pjJPX1vablrBeV/2kSlz5Xu+llu+ec8FymJAN8TozCzQZ3+rKEeqPT+dzJS8EYY140L1FfNJx+cxigFiPBblbSYUOw= X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:PH0PR11MB5000.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(31686004)(31696002)(86362001)(508600001)(316002)(6666004)(83380400001)(66946007)(26005)(5660300002)(66476007)(186003)(66556008)(6486002)(16576012)(8676002)(2906002)(36756003)(110136005)(54906003)(38100700002)(8936002)(956004)(30864003)(2616005)(4326008)(53546011)(44832011)(21314003)(45980500001); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?V243b1FCaXJrMlNoUVZiRmZQZ3hJeElCdGVlaVZ3eTN0WGdrbmdSNXJPL3k5?= =?utf-8?B?b1FtZTJoVGNNaUd4SnZNSW5sSmdqc0tZTk5MejBZb0ZtanlleHY3SGtSUFZ0?= =?utf-8?B?T1hrWHpaRit6QnV6N3VIQ1JUNWg1RnBZdzY4NjBOWWlzU085K3ZUUm5IU3VI?= =?utf-8?B?VzloV05kY0EzQkk0bjcwRmYxVkNYT25PQU5ORUUwNFMxNFhXY21HVnJTbUYv?= =?utf-8?B?YXRtVDNqUlNlMlhVUW56L0NlRlp3MXpIdmZXTlpjb09QMGUwcU5tbUdyNmZS?= =?utf-8?B?UmxHa2pVRWpqbXpkSlp5TzdCcFAxaW1EVGZrUDduTFJ1b3JuQThlNnk2NXFy?= =?utf-8?B?ZkhCMTJDL0h6NllsSmUvazgzbVZHZDRUVTJTS0lwMUoxRUtFN2ZBdVRlcUhM?= =?utf-8?B?L2ZvbC9lb01qVEwrYjhVRjlZZTJsZXBNWUJTdkxrTzFIeGhFRkRrYWNRSWZl?= =?utf-8?B?bWhWSXlPb2dody9TVkNtQTRrRW5xbm9OTjRtQ0dzUTdka0RGaExpOHJQU3NY?= =?utf-8?B?d2hJQm5MNnhsMmQ1SE9KWGtLdFN0eVNGLy8rKy9FR2cyVXdSS0wydjdsRkw2?= =?utf-8?B?NmhEMkRXU01kaDhuMHErK1FRdDBTRVdZQStHREZtaXczZFdnY3RqdWlUWWdC?= =?utf-8?B?a1VSME1lMHBZMVFUckVVQmQ5TVYycm9VdUEvMmVkT0cvOGk1UjMzV0hENytG?= =?utf-8?B?cEVhdlowRmd1dVZFS2MyTFNmWlJYeHlCVlJYeXRhOUkxQ1ZWcGRwem5pcVBF?= =?utf-8?B?MVRtTlh3ZGNEODlrWGw5WEZIWldmd0piUFdqK0ZCT3BNS0w5QlVSM3pHZTJl?= =?utf-8?B?Y0pkbzFzUndsRlhLQ3hkSkNpZzZodFRKakthTGtYZVU4OWpDa0JqMm1HSkJL?= =?utf-8?B?RGFydTNBVjY3NFU4TDllbXc4SzFIT2tIdllKZ3NuaTI5OHFnTlltNks0VXU5?= =?utf-8?B?TDhnOEo4UlBTZS9MdmlCQ2tubWc0VlV6MUN0UE03RWo4MkNoeU85OFRzdWhE?= =?utf-8?B?RTVOVWUwZkhTUDExK1NwamZGQ3hqNFByczdjWk1qdTN6K0xTbnkrbzZGUDlC?= =?utf-8?B?RkVhUkpHOWtUU0Q5aHZTTmVyYXoyUGNwL2E4V2ZmUDA3N3QyNnpPQWszK0Fz?= =?utf-8?B?a2d5L1FZYWQwUUJKaVlYMExLTTl2VUZvT1RLNlJRa0VlbnlENnNRVDFBSVFy?= =?utf-8?B?eVZlQXpXZ01TMVhudGJyMG4zbnNEMFVmZVROcGR6NlIyMWlMK2dldW9rK3Ji?= =?utf-8?B?K2RnRzNPTFdpSEp6RHE2dWVFTFpod2RUc0FoNk5aMElIRkxMS1JxK1lMUFh4?= =?utf-8?B?NmlyL282MlA1U1l3M0V2Y0Z0SUd2dWxCdXQrWDRxSENYMGhIc01RVkRGTDlG?= =?utf-8?B?aW1nU01SVFVDUnNHWnhNUlVQdGt2UjUvaHhRT1J2dmJiMlBTd3UvK2IrNzg3?= =?utf-8?B?Qzh3dEJJM2Z2ZVZHSFhGN21GalZoWXJUWFVybGFlTHNKc0lnWDdEQkxTMjUv?= =?utf-8?B?M0JYakpOZlFNNkdnQ2J4UXhCSmNsZXZiNnZCSXl1V29WSGdVZDR3dFBDZnVk?= =?utf-8?B?a1Y1dXNka21TNzg3eGp4T2o1S2FOdVJPU0sxenF5eGVhcGJrNnpMbzFJY3lj?= =?utf-8?B?MWo5SnNMWTNob3hMUHFWYzhZNE44dnExU3ZQQm5sdW5VckZXWGVaYVJIcDZ6?= =?utf-8?B?RGxXTjFPSWlvOGs5dE4ySzdreGY2LzVIUm9EbFNjU1hBZU9iOWdKa3QvZm1p?= =?utf-8?Q?DDa+7VRQbcE08fSrQqrFnw/yQZw6LYW0Ac6fDqa?= X-MS-Exchange-CrossTenant-Network-Message-Id: bb19c1f7-3cab-4525-c662-08d98401a7e6 X-MS-Exchange-CrossTenant-AuthSource: PH0PR11MB5000.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Sep 2021 11:01:29.8290 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: LPY9Pv8dABq4F97T63aT8v+xGVhexBh7mOMJESs8x3IbrKevEQtlTmXC3BDhFZdy5oEcAq9y8PtAAO28Gnib4g== X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR11MB4821 X-OriginatorOrg: intel.com Subject: Re: [dpdk-dev] [RFC V2] ethdev: fix issue that dev close in PMD calls twice X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 9/30/2021 11:54 AM, Huisong Li wrote: > > 在 2021/9/28 15:19, Singh, Aman Deep 写道: >> >> On 9/22/2021 9:01 AM, Huisong Li wrote: >>> >>> 在 2021/9/20 22:07, Ferruh Yigit 写道: >>>> On 8/25/2021 10:53 AM, Huisong Li wrote: >>>>> 在 2021/8/24 22:42, Ferruh Yigit 写道: >>>>>> On 8/19/2021 4:45 AM, Huisong Li wrote: >>>>>>> 在 2021/8/18 19:24, Ferruh Yigit 写道: >>>>>>>> On 8/13/2021 9:16 AM, Huisong Li wrote: >>>>>>>>> 在 2021/8/13 14:12, Thomas Monjalon 写道: >>>>>>>>>> 13/08/2021 04:11, Huisong Li: >>>>>>>>>>> Hi, all >>>>>>>>>>> >>>>>>>>>>> This patch can enhance the security of device uninstallation to >>>>>>>>>>> eliminate dependency on user usage methods. >>>>>>>>>>> >>>>>>>>>>> Can you check this patch? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 在 2021/8/3 10:30, Huisong Li 写道: >>>>>>>>>>>> Ethernet devices in DPDK can be released by rte_eth_dev_close() and >>>>>>>>>>>> rte_dev_remove(). These APIs both call xxx_dev_close() in PMD layer >>>>>>>>>>>> to uninstall hardware. However, the two APIs do not have explicit >>>>>>>>>>>> invocation restrictions. In other words, at the ethdev layer, it is >>>>>>>>>>>> possible to call rte_eth_dev_close() before calling rte_dev_remove() >>>>>>>>>>>> or rte_eal_hotplug_remove(). In such a bad scenario, >>>>>>>>>> It is not a bad scenario. >>>>>>>>>> If there is no more port for the device after calling close, >>>>>>>>>> the device should be removed automatically. >>>>>>>>>> Keep in mind "close" is for one port, "remove" is for the entire device >>>>>>>>>> which can have more than one port. >>>>>>>>> I know. >>>>>>>>> >>>>>>>>> dev_close() is for removing an eth device. And rte_dev_remove() can be >>>>>>>>> used >>>>>>>>> >>>>>>>>> for removing the rte device and all its eth devices belonging to the rte >>>>>>>>> device. >>>>>>>>> >>>>>>>>> In rte_dev_remove(), "remove" is executed in primary or one of secondary, >>>>>>>>> >>>>>>>>> all eth devices having same pci address will be closed and removed. >>>>>>>>> >>>>>>>>>>>> the primary >>>>>>>>>>>> process may be fine, but it may cause that xxx_dev_close() in the PMD >>>>>>>>>>>> layer will be called twice in the secondary process. So this patch >>>>>>>>>>>> fixes it. >>>>>>>>>> If a port is closed in primary, it should be the same in secondary. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> +    /* >>>>>>>>>>>> +     * The eth_dev->data->name doesn't be cleared by the secondary >>>>>>>>>>>> process, >>>>>>>>>>>> +     * so above "eth_dev" isn't NULL after rte_eth_dev_close() called. >>>>>>>>>> This assumption is not clear. All should be closed together. >>>>>>>>> However, dev_close() does not have the feature similar to >>>>>>>>> rte_dev_remove(). >>>>>>>>> >>>>>>>>> Namely, it is not guaranteed that all eth devices are closed together in >>>>>>>>> ethdev >>>>>>>>> layer. It depends on app or user. >>>>>>>>> >>>>>>>>> If the app does not close together, the operation of repeatedly >>>>>>>>> uninstalling an >>>>>>>>> eth device in the secondary process >>>>>>>>> >>>>>>>>> will be triggered when dev_close() is first called by one secondary >>>>>>>>> process, and >>>>>>>>> then rte_dev_remove() is called. >>>>>>>>> >>>>>>>>> So I think it should be avoided. >>>>>>>> First of all, I am not sure about calling 'rte_eth_dev_close()' or >>>>>>>> 'rte_dev_remove()' from the secondary process. >>>>>>>> There are explicit checks in various locations to prevent clearing >>>>>>>> resources >>>>>>>> completely from secondary process. >>>>>>> There's no denying that. >>>>>>> >>>>>>> Generally, hardware resources of eth device and shared data of the >>>>>>> primary and >>>>>>> secondary process >>>>>>> >>>>>>> are cleared by primary, which are controled by ethdev layer or PMD layer. >>>>>>> >>>>>>> But there may be some private data or resources of each process (primary or >>>>>>> secondary ), such as mp action >>>>>>> >>>>>>> registered by rte_mp_action_register() or others.  For these resources, the >>>>>>> secondary process still needs to clear. >>>>>>> >>>>>>> Namely, both primary and secondary processes need to prevent repeated >>>>>>> offloading >>>>>>> of resources. >>>>>>> >>>>>>>> Calling 'rte_eth_dev_close()' or 'rte_dev_remove()' by secondary is >>>>>>>> technically >>>>>>>> can be done but application needs to be extra cautious and should take >>>>>>>> extra >>>>>>>> measures and synchronization to make it work. >>>>>>>> Regular use-case is secondary processes do the packet processing and all >>>>>>>> control >>>>>>>> commands run by primary. >>>>>>> You are right. We have a consensus that 'rte_eth_dev_close()' or >>>>>>> 'rte_dev_remove()' >>>>>>> >>>>>>> can be called by primary and secondary processes. >>>>>>> >>>>>>> But DPDK framework cannot assume user behavior.😁 >>>>>>> >>>>>>> We need to make it more secure and reliable for both primary and secondary >>>>>>> processes. >>>>>>> >>>>>>>> In primary, if you call 'rte_eth_dev_close()' it will clear all ethdev >>>>>>>> resources >>>>>>>> and further 'rte_dev_remove()' call will detect missing ethdev resources >>>>>>>> and >>>>>>>> won't try to clear them again. >>>>>>>> >>>>>>>> In secondary, if you call 'rte_eth_dev_close()', it WON'T clear all >>>>>>>> resources >>>>>>>> and further 'rte_dev_remove()' call (either from primary or secondary) >>>>>>>> will try >>>>>>>> to clean ethdev resources again. You are trying to prevent this retry in >>>>>>>> remove >>>>>>>> happening for secondary process. >>>>>>> Right. However, if secondary process in PMD layer has its own private >>>>>>> resources >>>>>>> to be >>>>>>> >>>>>>> cleared, it still need to do it by calling 'rte_eth_dev_close()' or >>>>>>> 'rte_dev_remove()'. >>>>>>> >>>>>>>> In secondary it won't free ethdev resources anyway if you let it continue, >>>>>>>> but I >>>>>>>> guess here you are trying to prevent the PMD dev_close() called again. >>>>>>>> Why? Is >>>>>>>> it just for optimization or does it cause unexpected behavior in the PMD? >>>>>>>> >>>>>>>> >>>>>>>> Overall, to free resources you need to do the 'rte_eth_dev_close()' or >>>>>>>> 'rte_dev_remove()' in the primary anyway. So instead of this workaround, I >>>>>>>> would >>>>>>>> suggest making PMD dev_close() safe to be called multiple times (if this >>>>>>>> is the >>>>>>>> problem.) >>>>>>> In conclusion,  primary and secondary processes in PMD layer may have >>>>>>> their own >>>>>>> >>>>>>> private data and resources, which need to be processed and released. >>>>>>> >>>>>>> Currently,  these for PMD are either handled and cleaned up in >>>>>>> dev_close() or >>>>>>> remove(). >>>>>>> >>>>>>> However, code streams in rte_dev_remove() cannot ensure that the >>>>>>> uninstallation >>>>>>> >>>>>>> from secondary process will not be repeated if rte_eth_dev_close() is first >>>>>>> called by >>>>>>> >>>>>>> secondary(primary is ok, plese review this patch). >>>>>>> >>>>>>> I think, this is the same for each PMD and is better suited to doing it in >>>>>>> ethdev layer. >>>>>>> >>>>>> This patch prevents to call dev_close() twice in the secondary process, is >>>>>> this >>>>>> fixing a theoretical problem or an actual problem? >>>>>> >>>>>> If it is an actual problem can you please provide details, callstack of the >>>>>> problematic case? >>>>> We analyzed the code when modifying the bug and found that the problem did >>>>> exist. >>>>> >>>>> The ethdev layer did not guarantee the security. >>>>> >>>> I was wondering if there is a crash for an unexpected path, for below case the >>>> primary process check in the 'hns3_dev_uninit()' should already prevent >>>> anything >>>> unexpected. So I assume this is a fix for a theoretical issue. >>> >>> Yes. For primary process, hns3 can prevent multiple device uninstallation >>> through >>> >>> "adapter_state" controled by primary process. >>> >>> The problem of multiple device uninstallation has been prevented at >>> rte_eth_dev_pci_generic_remove() >>> >>> in the ethdev layer, as described in the patch. >>> >>> In primary process, rte_eth_dev_allocated() in >>> rte_eth_dev_pci_generic_remove() will return NULL >>> >>> when first calling dev_close(), and then calling rte_dev_remove(). So it is >>> ok. But the logic can not >>> >>> prevent the same case in secondary because secondary does not clear dev->data. >>> >>>> >>>> In secondary process, these init/uninit device is already not very safe. For >>>> example, as far as I can see in secondary if you hot remove a device device and >>>> hot plug a new one, new device will use wrong device data (since hot remove >>>> won't clear device data, new one will continue to use it). >>> >>> No. If we hot remove a device in secondary, the secondary will request its >>> primary >>> >>> send "remove device" message to all secondaries. After all secondaries are >>> removed, >>> >>> the primary also removes the device and the device data will be cleared. >>> >>> This is the logic of rte_dev_remove(). >>> >>>> So I am not sure about adding secondary process related checks in that area and >>>> causing a false sense of security, and polluting the logic with secondary >>>> specific checks. >>>> Also the check you add may hit by primary process and I am worried on an >>>> unexpected side affect it cause. >>>> >>>> >>>> As said above I am not sure about this new check, but even we continue with it, >>>> what about wrapping the check with secondary process check, at least to be sure >>>> there won't be any side affect for primary process. >>>> >>> If app hot remove device in primary/secondary, the original logic is still >>> used in this interface, because of >>> >>> bing RTE_ETH_DEV_ATTACHED state for eth_dev. If app first calls dev_close(), >>> eth device is >>> >>> RTE_ETH_DEV_UNUSED state in primary/secondary. In this issue case, this >>> interface is still return from the >>> >>> original place instead of the new check. >>> >>> So I don't think the new check will affect the primary process. Conversely, >>> it's safer for secondary processes. >>> >>> Please check it again, thanks. >> >> As this issue is specific to secondary process, can we have these change under >> a check like "if (rte_eal_process_type() != RTE_PROC_PRIMARY)" >> >> By this we can avoid any side effect of these changes for primary process and >> also it makes code readablity easier for designers who are not checking >> secondary process changes. > > yes. > > @Ferruh, what do you think? > +1 to have it, it clarifies the intention of the check, plus prevents possible sice affect for primary as Aman mentioned. >> >>> >>>>> The general function of the two interfaces is as follows: >>>>> >>>>> rte_eth_dev_close() --> release eth device >>>>> >>>>> rte_dev_remove() -->   release eth device + remove and free >>>>> rte_pci_device(primary andsecondary). >>>>> >>>>> According to the OVS application scenario, first call dev_close() and then >>>>> call >>>>> >>>>> remove(), which is possible. >>>>> >>>>> We constructed this scenario using testpmd to start the secondary process(the >>>>> multi-process >>>>> >>>>> patch of testpmd is being uploaded.). It is proved that the ethdev layer >>>>> cannot >>>>> guarantee >>>>> >>>>> this security. The callstack is as follows: >>>>> >>>>> ************************************************************************************************************************************************** >>>>> >>>>> >>>>> >>>>> gdb) set args -a 0000:7d:00.0 -l 2-3 --file-prefix=rte_lee --proc-type=auto -- >>>>> -i --rxq 4 --txq 4 --num-procs=2 --proc-id=1 >>>>> (gdb) b hns3_dev_uninit >>>>> Breakpoint 1 at 0xbca7d0: file ../drivers/net/hns3/hns3_ethdev.c, line 7806. >>>>> (gdb) b hns3_dev_close >>>>> Breakpoint 2 at 0xbc6d44: file ../drivers/net/hns3/hns3_ethdev.c, line 6189. >>>>> (gdb) r >>>>> Starting program: >>>>> /home/lihuisong/v500/gcov-test/dpdk/build/app/dpdk-testpmd -a >>>>> 0000:7d:00.0 -l 2-3 --file-prefix=rte_lee --proc-type=auto -- -i --rxq 4 >>>>> --txq 4 >>>>> --num-procs=2 --proc-id=1 >>>>> Missing separate debuginfo for /root/lib/libnuma.so.1 >>>>> Try: yum --enablerepo='*debug*' install >>>>> /usr/lib/debug/.build-id/ce/4eea0b0f2150f70a080bbd8835e43e78373096.debug >>>>> [Thread debugging using libthread_db enabled] >>>>> Using host libthread_db library "/lib64/libthread_db.so.1". >>>>> EAL: Detected 128 lcore(s) >>>>> EAL: Detected 4 NUMA nodes >>>>> EAL: Auto-detected process type: SECONDARY >>>>> EAL: Detected static linkage of DPDK >>>>> [New Thread 0xfffff7a0ad10 (LWP 17717)] >>>>> EAL: Multi-process socket /var/run/dpdk/rte_lee/mp_socket_17714_66aa8d3a60a9 >>>>> [New Thread 0xfffff7209d10 (LWP 17718)] >>>>> EAL: Selected IOVA mode 'VA' >>>>> EAL: VFIO support initialized >>>>> [New Thread 0xfffff69f8d10 (LWP 17719)] >>>>> EAL: Using IOMMU type 1 (Type 1) >>>>> EAL: Probe PCI driver: net_hns3 (19e5:a222) device: 0000:7d:00.0 (socket 0) >>>>> [New Thread 0xfffff61f7d10 (LWP 17720)] >>>>> TELEMETRY: No legacy callbacks, legacy socket not created >>>>> Interactive-mode selected >>>>> 0000:7d:00.0 hns3_dev_mtu_set(): Failed to set mtu, port 0 must be stopped >>>>> before configuration >>>>> Failed to set MTU to 1500 for port 0 >>>>> testpmd: create a new mbuf pool : n=155456, size=2176, socket=0 >>>>> testpmd: preferred mempool ops selected: ring_mp_mc >>>>> >>>>> Warning! port-topology=paired and odd forward ports number, the last port will >>>>> pair with itself. >>>>> >>>>> Configuring Port 0 (socket 0) >>>>> Port 0: 00:18:2D:00:00:9E >>>>> Checking link statuses... >>>>> Done >>>>> testpmd> port close 0 >>>>> Closing ports... >>>>> >>>>> Breakpoint 2, hns3_dev_close (eth_dev=0x21cdb80 ) >>>>>      at ../drivers/net/hns3/hns3_ethdev.c:6189 >>>>> 6189        struct hns3_adapter *hns = eth_dev->data->dev_private; >>>>> Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.aarch64 >>>>> libpcap-1.5.3-11.el7.aarch64 openssl-libs-1.0.2k-16.el7.aarch64 >>>>> zlib-1.2.7-18.el7.aarch64 >>>>> (gdb) bt >>>>> #0  hns3_dev_close (eth_dev=0x21cdb80 ) at >>>>> ../drivers/net/hns3/hns3_ethdev.c:6189 >>>>> #1  0x0000000000742eac in rte_eth_dev_close (port_id=0) at >>>>> ../lib/ethdev/rte_ethdev.c:1870 >>>>> #2  0x0000000000542a8c in close_port (pid=0) at ../app/test-pmd/testpmd.c:2895 >>>>> #3  0x00000000004df82c in cmd_operate_specific_port_parsed >>>>> (parsed_result=0xffffffffb0f0, >>>>>      cl=0x2475080, data=0x0) at ../app/test-pmd/cmdline.c:1272 >>>>> #4  0x0000000000738ce4 in cmdline_parse (cl=0x2475080, buf=0x24750c8 "port >>>>> close >>>>> 0\n") >>>>>      at ../lib/cmdline/cmdline_parse.c:290 >>>>> #5  0x0000000000736b7c in cmdline_valid_buffer (rdl=0x2475090, buf=0x24750c8 >>>>> "port close 0\n", >>>>>      size=14) at ../lib/cmdline/cmdline.c:26 >>>>> #6  0x000000000073c078 in rdline_char_in (rdl=0x2475090, c=10 '\n') >>>>>      at ../lib/cmdline/cmdline_rdline.c:421 >>>>> #7  0x0000000000736fcc in cmdline_in (cl=0x2475080, buf=0xfffffffff25f >>>>> "\n\200\362\377\377\377\377", >>>>>      size=1) at ../lib/cmdline/cmdline.c:149 >>>>> #8  0x0000000000737270 in cmdline_interact (cl=0x2475080) at >>>>> ../lib/cmdline/cmdline.c:223 >>>>> #9  0x00000000004f0ccc in prompt () at ../app/test-pmd/cmdline.c:17882 >>>>> #10 0x0000000000545528 in main (argc=8, argv=0xfffffffff470) at >>>>> ../app/test-pmd/testpmd.c:3998 >>>>> (gdb) c >>>>> Continuing. >>>>> Port 0 is closed >>>>> Done >>>>> testpmd> device detach 0000:7d:00.0 >>>>> Removing a device... >>>>> [Switching to Thread 0xfffff7a0ad10 (LWP 17717)] >>>>> >>>>> Breakpoint 1, hns3_dev_uninit (eth_dev=0x21cdb80 ) >>>>>      at ../drivers/net/hns3/hns3_ethdev.c:7806 >>>>> 7806        struct hns3_adapter *hns = eth_dev->data->dev_private; >>>>> (gdb) bt >>>>> #0  hns3_dev_uninit (eth_dev=0x21cdb80 ) at >>>>> ../drivers/net/hns3/hns3_ethdev.c:7806 >>>>> #1  0x0000000000bb8668 in rte_eth_dev_pci_generic_remove (pci_dev=0x247a600, >>>>>      dev_uninit=0xbca7c4 ) at ../lib/ethdev/ethdev_pci.h:155 >>>>> #2  0x0000000000bca89c in eth_hns3_pci_remove (pci_dev=0x247a600) >>>>>      at ../drivers/net/hns3/hns3_ethdev.c:7833 >>>>> #3  0x00000000007e85f4 in rte_pci_detach_dev (dev=0x247a600) at >>>>> ../drivers/bus/pci/pci_common.c:287 >>>>> #4  0x00000000007e8f14 in pci_unplug (dev=0x247a610) at >>>>> ../drivers/bus/pci/pci_common.c:570 >>>>> #5  0x0000000000775678 in local_dev_remove (dev=0x247a610) at >>>>> ../lib/eal/common/eal_common_dev.c:319 >>>>> #6  0x000000000078f114 in __handle_primary_request (param=0xfffff00008c0) >>>>>      at ../lib/eal/common/hotplug_mp.c:284 >>>>> #7  0x000000000079ebf4 in eal_alarm_callback (arg=0x0) at >>>>> ../lib/eal/linux/eal_alarm.c:92 >>>>> #8  0x00000000007a3f30 in eal_intr_process_interrupts (events=0xfffff7a0a3e0, >>>>> nfds=1) >>>>>      at ../lib/eal/linux/eal_interrupts.c:998 >>>>> #9  0x00000000007a4224 in eal_intr_handle_interrupts (pfd=10, totalfds=2) >>>>>      at ../lib/eal/linux/eal_interrupts.c:1071 >>>>> #10 0x00000000007a442c in eal_intr_thread_main (arg=0x0) at >>>>> ../lib/eal/linux/eal_interrupts.c:1142 >>>>> #11 0x0000000000789388 in ctrl_thread_init (arg=0x246ef40) >>>>>      at ../lib/eal/common/eal_common_thread.c:202 >>>>> #12 0x0000fffff7bf3c48 in start_thread () from /lib64/libpthread.so.0 >>>>> #13 0x0000fffff7b45600 in thread_start () from /lib64/libc.so.6 >>>>> >>>>> ************************************************************************************************************************************************** >>>>> >>>>> >>>>> >>>>> Note: above log is from the secondary process. >>>>> >>>>>>>> And again, please re-consider calling 'rte_eth_dev_close()' or >>>>>>>> 'rte_dev_remove()' from the secondary process. >>>>>>>> >>>>>>>>>>>> +     * Namely, whether "eth_dev" is NULL cannot be used to determine >>>>>>>>>>>> whether >>>>>>>>>>>> +     * an ethdev port has been released. >>>>>>>>>>>> +     * For both primary process and secondary process, >>>>>>>>>>>> eth_dev->state is >>>>>>>>>>>> +     * RTE_ETH_DEV_UNUSED, which means the ethdev port has been >>>>>>>>>>>> released. >>>>>>>>>>>> +     */ >>>>>>>>>>>> +    if (eth_dev->state == RTE_ETH_DEV_UNUSED) { >>>>>>>>>>>> +        RTE_ETHDEV_LOG(INFO, "The ethdev port has been released."); >>>>>>>>>>>> +        return 0; >>>>>>>>>>>> +    } >>>>>>>>>> . >>>>>>>> . >>>>>> . >>>> . >> .