From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C261CA0C46; Tue, 28 Sep 2021 09:20:13 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 7BB2140E3C; Tue, 28 Sep 2021 09:20:13 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by mails.dpdk.org (Postfix) with ESMTP id AD68440DF6 for ; Tue, 28 Sep 2021 09:20:11 +0200 (CEST) X-IronPort-AV: E=McAfee;i="6200,9189,10120"; a="224672760" X-IronPort-AV: E=Sophos;i="5.85,328,1624345200"; d="scan'208";a="224672760" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Sep 2021 00:20:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.85,328,1624345200"; d="scan'208";a="553825571" Received: from orsmsx605.amr.corp.intel.com ([10.22.229.18]) by FMSMGA003.fm.intel.com with ESMTP; 28 Sep 2021 00:20:08 -0700 Received: from orsmsx606.amr.corp.intel.com (10.22.229.19) by ORSMSX605.amr.corp.intel.com (10.22.229.18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12; Tue, 28 Sep 2021 00:20:06 -0700 Received: from ORSEDG601.ED.cps.intel.com (10.7.248.6) by orsmsx606.amr.corp.intel.com (10.22.229.19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2242.12 via Frontend Transport; Tue, 28 Sep 2021 00:20:06 -0700 Received: from NAM04-MW2-obe.outbound.protection.outlook.com (104.47.73.175) by edgegateway.intel.com (134.134.137.102) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2242.12; Tue, 28 Sep 2021 00:20:04 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=duTcSlYGIIVU7KGzA61lZAZz6TD5IPwyZqvVMwSzAGRc8LW9WhAhfcHjmw1fppbP5kqQLEDEgaRYE8lc/2DpGnB/owyu87E6/5QWzJZcZBfNHKT1FQXOUnYzJ5++NEpll65FTnDhDegg5kZigAzvJ/GK+ODFHnhN6zhO12NTTsyasoRqvoCz8EIBm904Z7IdF9riEC6rveiUstGiIfYdWKZfgj5U0FA/X+kCVB83OJ8zBlyLN0M6TUUelSZkOiStv8dAM+YvuHd54v0ID4wPX/vaXUFD0av/hbJi36PJZxj/eGCxg8MignhAZmUFOon8HrtCr2ydgeBpt6/ctRKpOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=wj4FCeJzG+rKbTjQRapQBjdQemC6xIVgKu81VZj28F0=; b=M/zdxoXNyfQyX6pfVj0As+yMqnNlB+08pw9AYHXypnVuTwTUkvhqO/qKPrSuLdc91m1hzSgNHb8hjS7wzzDqnN7w9vzdE+nbe5XBvncUrQBehstcwPup9qEWOWMGVCsfTlBlWqjDfUDhIYRsrYz4qQZXVvb+19rQ79p4Kq3w1ogrh+NNmOzCiYOmxx+/dFUJND8DBMU6C7TRIgk9Rmvx/Nq2bOQLSrYIY8BjgXMQNU/v2+IUI81SBKxPgLST+U+3xXps2VDbaXwQJSURm4p5kWM1Ywhoh6P+HZ3Da7Q/zKz5haImgUHI5vw5adx7nNuWzJrfT/WAdITQzSRzNfEPTA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; s=selector2-intel-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=wj4FCeJzG+rKbTjQRapQBjdQemC6xIVgKu81VZj28F0=; b=vGQa1Wu+T84WBJcL9IDd5SzTWdgyl3r7vCap3S+V0R5RzDDHBB7u7H4/Sz0c0ae9SsyHdHd/liFcEPDSEVRbvKD08MGzMSYKHDdYkY6sQDBsIl5IAOiimrLaSawHWis6YtWZBIH6fuspmnCSbUs1P9+IlPCiDZyO/xtVGLM5eLY= Authentication-Results: redhat.com; dkim=none (message not signed) header.d=none;redhat.com; dmarc=none action=none header.from=intel.com; Received: from CY4PR11MB1367.namprd11.prod.outlook.com (2603:10b6:903:2d::16) by CY4PR1101MB2232.namprd11.prod.outlook.com (2603:10b6:910:21::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4544.14; Tue, 28 Sep 2021 07:20:03 +0000 Received: from CY4PR11MB1367.namprd11.prod.outlook.com ([fe80::a00b:3347:5cd4:d190]) by CY4PR11MB1367.namprd11.prod.outlook.com ([fe80::a00b:3347:5cd4:d190%2]) with mapi id 15.20.4544.022; Tue, 28 Sep 2021 07:20:03 +0000 To: Huisong Li , Ferruh Yigit , Thomas Monjalon CC: Andrew Rybchenko , "dev@dpdk.org" , Anatoly Burakov , David Marchand References: <1627908397-51565-1-git-send-email-lihuisong@huawei.com> <1627957839-38279-1-git-send-email-lihuisong@huawei.com> <9670389d-ebbc-9d9c-0cac-c7e8826ecb6f@huawei.com> <21383486.34YfpWhNxb@thomas> <858dc20f-5711-f770-2c08-0a432b6ea733@huawei.com> From: "Singh, Aman Deep" Message-ID: <4e2b5720-6da0-155b-845d-76c42b800abc@intel.com> Date: Tue, 28 Sep 2021 12:49:51 +0530 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0 Thunderbird/78.14.0 In-Reply-To: <858dc20f-5711-f770-2c08-0a432b6ea733@huawei.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-ClientProxiedBy: PN2PR01CA0046.INDPRD01.PROD.OUTLOOK.COM (2603:1096:c01:22::21) To CY4PR11MB1367.namprd11.prod.outlook.com (2603:10b6:903:2d::16) MIME-Version: 1.0 Received: from [192.168.1.18] (223.178.212.116) by PN2PR01CA0046.INDPRD01.PROD.OUTLOOK.COM (2603:1096:c01:22::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4544.13 via Frontend Transport; Tue, 28 Sep 2021 07:20:00 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: d1447619-12ed-48dd-d863-08d98250639a X-MS-TrafficTypeDiagnostic: CY4PR1101MB2232: X-MS-Exchange-Transport-Forked: True X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:10000; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: oiuO9TigUPasaP7v8HTxoNDo5g6yc++bYXreVcEQC5Cb0bTZvJyK5HPOPbY/IBRECCORRmHoH5entriova7oQV6QMo8ALl4kyZ+Xah2hCUJ1MnCFMQ27qoo+QGrt4feok8ZJ7UfXqK6xDuok3bJmxrUFS18Rjrm+mHcBx6Ebvj9nYcRHux6/WnLc4tj94nW1vLJMR82igLiXYKDS8dI2ty0CA+yKV6L0kmVpPS8mEsaa+tKaL//yFI1oFRDPpCQdYm0s9H0zCm1BfUCRRAYiPEm24VoJCZB4H9PFLy56FXf45MKgF4xl03b1ZJoFL52teNfow4fIJP5nHjK8zZiIBxwyyUWpbdZAsKqgc623P9x+aUV2/NKjwS2H9bGME9uLqKDva5BvTarw32RyMW18jaw3Vb1rpoc7l9XyqSMSZPAizCU3S96FKoygEipdpg5Uo0kO5nSsvoTVKBxZ0uvzbKZ/M/tZ/3obT7Qv7P+H56JT4KfoZf/2lV0XEj9Wj3BO236jpZ1QLvwSWEA9MAtuJx8vz0zlR/hTujm1mhswRVT2WrAMDFQBKJQbMHezBNbxI1N2H6Kvk4iX+SEvbSrOIbj+F4HlLTGUGY0pNiKPHhlrN8leet9OUHk0k+Fmvlkkt4MOYg8AHzV3f/K0Mjhc7HLrpMR56Rxg1a/MwuvLH1CMnGxmfDZykwkx9/3nH/NPEcf987lfu0uMRL/Ob/IBVsVgwmgvJs06C82PNxUFK0B+IywNtimw+XdBhfOx4g0h X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:CY4PR11MB1367.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(366004)(508600001)(86362001)(31696002)(2616005)(38100700002)(53546011)(956004)(2906002)(30864003)(31686004)(5660300002)(186003)(16576012)(110136005)(66556008)(4326008)(8936002)(66476007)(6666004)(26005)(83380400001)(36756003)(54906003)(316002)(6486002)(66946007)(8676002)(21314003)(43740500002)(45980500001); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?d1RrR3pIQVJ2MGt2V0NrcTZFblVzSjhKWGtIRnRmNi9sUDQ0WjI5TEJsZ0Q3?= =?utf-8?B?RnVGWVBaVUhuVFJ5bTNWWlcweWF0cXBlbk5sVWh5azlubTJ0MUZzd0VBZ0Zp?= =?utf-8?B?TlB1L1ZqNFNsZEdNMVp2RlF2TFFtT3BIMVhpVVJRdUdYZStLaVlGdk1Bd3ZG?= =?utf-8?B?YWRhekk3RW1qOVk5R0VYaDFrUHAwcndtYUJOR1c3SkJxYWpWS3lwa2NHazAz?= =?utf-8?B?QnV4TmtYT3lXOFh4M2tadHR1bVNRZXB2SG1FNU9vVHpoQUhwTGdjdklrL1JW?= =?utf-8?B?Y1N3RmxpSXpmU0tRQ0FPVHU5dWw4Vm44ZlVreUYrcTN4c0pqL3NGRHZpNlBW?= =?utf-8?B?Qk50Z2F0V3pJTkF0MVRuLzY4V3lzRXB2RzF0ZmtqWVlaeGxNeitkWjhLNVJF?= =?utf-8?B?T1JBZG9pUWh5UXpSQVBYajdtbFBHaWtYYzZFbUYzZTJXQ1lTZW93cDRPS0k3?= =?utf-8?B?WTF3UVdaQ2pzT2VSaFJGd1ZzT2kxYm9FaXVuQW9EdDQ1c0Q4aDRZVlI0RUdC?= =?utf-8?B?dzM4aGlOeWxLSkdWM2doNEZFeHNTbHNONjhpVTlrdHoxUUZVbnZDU0R6QVJD?= =?utf-8?B?M2tBZzFtU1FGd0JuT1lXTlNQK3IyU25xSmEzb2R0cndxaW8zaUV6MVRFUEpk?= =?utf-8?B?cGJtNU16ZkFyZWNMMDdHeENNdTFRVGRXYlgwYkFaRTg2d2ZGczVZb08zVUhn?= =?utf-8?B?S3k2UkExeTBMcStoK0x1NUUybU1RLzYwZm1kSWk2WEx3L0ppMEYwZGx4VFE1?= =?utf-8?B?MmZrTEM3d3NLZHRxVy94bVJGVnFMWm55ZU5JWStncHlSMTZCVU00UlJ5K1cz?= =?utf-8?B?ZG5UQjhKV1JZZkFZSHlCVGdjSmo0RE1CbXJiTnNYUEJ0WHNBaFNSNkRxSVJ3?= =?utf-8?B?U0hLNy9aU0d6UnJITmxidDdwbm8vWGxsVFRZSVRvRGZsenp0RUN2TGZibm56?= =?utf-8?B?UjdFTDlSR202ZHVNSmhrVFdTUlZSZFdtWjcyQUhmYjlOemZMR2tBYzI5ZGdt?= =?utf-8?B?WGVhbHFEbi9GQ1RyeU10MFZ2Y1paZFZDM2doMmU3U2U0SzVQWHBYb2owTFk0?= =?utf-8?B?bWRZVHhqUmppK0FmUklKNHZwTk1wcmR6RzQvR1dtcm1zVDU4ZU9Qcy9WOFI5?= =?utf-8?B?SHlraSs5cXJZeG9XSHF1aFlvZFdnYW9PQjVRN2VoQk9LOFhqT0UvbDVLZS9H?= =?utf-8?B?THNXM2RUWGYyRFhPMFlabjFwcytEQlN2S2RtYVJGQlFTVFNnMjlwRVE1UWt2?= =?utf-8?B?Y1ozMHMwMTY0TjVSdjBtV1BvR00xVjd3NnRNSFNydldYczhmNHFVeWlFM0Jt?= =?utf-8?B?aG85S2hoZTg1L3lPY2hsSHNldExlWkhYWFFQNk50NlZrK0JzSUxpU3ptMUk5?= =?utf-8?B?b0R0Y2EzUGVKTmV2MmJPSGtHNklZZGNLeFNXVEZ6K05DdUxOWmZKSWYxY2JS?= =?utf-8?B?bnZYMTRhVzhCTWJQUGYrRVo1dTRERENtc08wY09GUm9lVWpFSzZ3RUZVeUdh?= =?utf-8?B?R2JpUk1QUnViaFVJaHBmY1hMN2VHRHM5MFhrZjc4UDVFejhiZFhJT2JWbFFm?= =?utf-8?B?UUc1cXA2UGlyd0RVV0JHcER4cW1pWUVRTlgweEZ5YmhPQTFQSXhneHhRQkM3?= =?utf-8?B?SkhhK2FuYnVqTUI4clphdG1LZ0phdUoyeXQxZno4RVpCRmp5NFo2YmRSaW5m?= =?utf-8?B?Q2dBUWtOUlRESFNkMkFFbXd4aDBPd3dxT2JBREc4M05xUG4rblgxcFFaaFR5?= =?utf-8?Q?ny+DbsrYNCkUGV1tT7suAdnxatGh8DieXZMjbty?= X-MS-Exchange-CrossTenant-Network-Message-Id: d1447619-12ed-48dd-d863-08d98250639a X-MS-Exchange-CrossTenant-AuthSource: CY4PR11MB1367.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Sep 2021 07:20:03.2201 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: pvrbQmVExUwRlOCkMXmzeH6EAsipFJqiSLRgrlNR+xYWONHked2miw/0jQTUrmjyalDH1cmng4sorSs+1OUBDJuNgo473Hl1jSEbqazOnAA= X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR1101MB2232 X-OriginatorOrg: intel.com Subject: Re: [dpdk-dev] [RFC V2] ethdev: fix issue that dev close in PMD calls twice X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 9/22/2021 9:01 AM, Huisong Li wrote: > > 在 2021/9/20 22:07, Ferruh Yigit 写道: >> On 8/25/2021 10:53 AM, Huisong Li wrote: >>> 在 2021/8/24 22:42, Ferruh Yigit 写道: >>>> On 8/19/2021 4:45 AM, Huisong Li wrote: >>>>> 在 2021/8/18 19:24, Ferruh Yigit 写道: >>>>>> On 8/13/2021 9:16 AM, Huisong Li wrote: >>>>>>> 在 2021/8/13 14:12, Thomas Monjalon 写道: >>>>>>>> 13/08/2021 04:11, Huisong Li: >>>>>>>>> Hi, all >>>>>>>>> >>>>>>>>> This patch can enhance the security of device uninstallation to >>>>>>>>> eliminate dependency on user usage methods. >>>>>>>>> >>>>>>>>> Can you check this patch? >>>>>>>>> >>>>>>>>> >>>>>>>>> 在 2021/8/3 10:30, Huisong Li 写道: >>>>>>>>>> Ethernet devices in DPDK can be released by >>>>>>>>>> rte_eth_dev_close() and >>>>>>>>>> rte_dev_remove(). These APIs both call xxx_dev_close() in PMD >>>>>>>>>> layer >>>>>>>>>> to uninstall hardware. However, the two APIs do not have >>>>>>>>>> explicit >>>>>>>>>> invocation restrictions. In other words, at the ethdev layer, >>>>>>>>>> it is >>>>>>>>>> possible to call rte_eth_dev_close() before calling >>>>>>>>>> rte_dev_remove() >>>>>>>>>> or rte_eal_hotplug_remove(). In such a bad scenario, >>>>>>>> It is not a bad scenario. >>>>>>>> If there is no more port for the device after calling close, >>>>>>>> the device should be removed automatically. >>>>>>>> Keep in mind "close" is for one port, "remove" is for the >>>>>>>> entire device >>>>>>>> which can have more than one port. >>>>>>> I know. >>>>>>> >>>>>>> dev_close() is for removing an eth device. And rte_dev_remove() >>>>>>> can be used >>>>>>> >>>>>>> for removing the rte device and all its eth devices belonging to >>>>>>> the rte >>>>>>> device. >>>>>>> >>>>>>> In rte_dev_remove(), "remove" is executed in primary or one of >>>>>>> secondary, >>>>>>> >>>>>>> all eth devices having same pci address will be closed and removed. >>>>>>> >>>>>>>>>> the primary >>>>>>>>>> process may be fine, but it may cause that xxx_dev_close() in >>>>>>>>>> the PMD >>>>>>>>>> layer will be called twice in the secondary process. So this >>>>>>>>>> patch >>>>>>>>>> fixes it. >>>>>>>> If a port is closed in primary, it should be the same in >>>>>>>> secondary. >>>>>>>> >>>>>>>> >>>>>>>>>> +    /* >>>>>>>>>> +     * The eth_dev->data->name doesn't be cleared by the >>>>>>>>>> secondary >>>>>>>>>> process, >>>>>>>>>> +     * so above "eth_dev" isn't NULL after >>>>>>>>>> rte_eth_dev_close() called. >>>>>>>> This assumption is not clear. All should be closed together. >>>>>>> However, dev_close() does not have the feature similar to >>>>>>> rte_dev_remove(). >>>>>>> >>>>>>> Namely, it is not guaranteed that all eth devices are closed >>>>>>> together in >>>>>>> ethdev >>>>>>> layer. It depends on app or user. >>>>>>> >>>>>>> If the app does not close together, the operation of repeatedly >>>>>>> uninstalling an >>>>>>> eth device in the secondary process >>>>>>> >>>>>>> will be triggered when dev_close() is first called by one secondary >>>>>>> process, and >>>>>>> then rte_dev_remove() is called. >>>>>>> >>>>>>> So I think it should be avoided. >>>>>> First of all, I am not sure about calling 'rte_eth_dev_close()' or >>>>>> 'rte_dev_remove()' from the secondary process. >>>>>> There are explicit checks in various locations to prevent >>>>>> clearing resources >>>>>> completely from secondary process. >>>>> There's no denying that. >>>>> >>>>> Generally, hardware resources of eth device and shared data of the >>>>> primary and >>>>> secondary process >>>>> >>>>> are cleared by primary, which are controled by ethdev layer or PMD >>>>> layer. >>>>> >>>>> But there may be some private data or resources of each process >>>>> (primary or >>>>> secondary ), such as mp action >>>>> >>>>> registered by rte_mp_action_register() or others.  For these >>>>> resources, the >>>>> secondary process still needs to clear. >>>>> >>>>> Namely, both primary and secondary processes need to prevent >>>>> repeated offloading >>>>> of resources. >>>>> >>>>>> Calling 'rte_eth_dev_close()' or 'rte_dev_remove()' by secondary >>>>>> is technically >>>>>> can be done but application needs to be extra cautious and should >>>>>> take extra >>>>>> measures and synchronization to make it work. >>>>>> Regular use-case is secondary processes do the packet processing >>>>>> and all >>>>>> control >>>>>> commands run by primary. >>>>> You are right. We have a consensus that 'rte_eth_dev_close()' or >>>>> 'rte_dev_remove()' >>>>> >>>>> can be called by primary and secondary processes. >>>>> >>>>> But DPDK framework cannot assume user behavior.😁 >>>>> >>>>> We need to make it more secure and reliable for both primary and >>>>> secondary >>>>> processes. >>>>> >>>>>> In primary, if you call 'rte_eth_dev_close()' it will clear all >>>>>> ethdev >>>>>> resources >>>>>> and further 'rte_dev_remove()' call will detect missing ethdev >>>>>> resources and >>>>>> won't try to clear them again. >>>>>> >>>>>> In secondary, if you call 'rte_eth_dev_close()', it WON'T clear >>>>>> all resources >>>>>> and further 'rte_dev_remove()' call (either from primary or >>>>>> secondary) will try >>>>>> to clean ethdev resources again. You are trying to prevent this >>>>>> retry in remove >>>>>> happening for secondary process. >>>>> Right. However, if secondary process in PMD layer has its own >>>>> private resources >>>>> to be >>>>> >>>>> cleared, it still need to do it by calling 'rte_eth_dev_close()' or >>>>> 'rte_dev_remove()'. >>>>> >>>>>> In secondary it won't free ethdev resources anyway if you let it >>>>>> continue, >>>>>> but I >>>>>> guess here you are trying to prevent the PMD dev_close() called >>>>>> again. Why? Is >>>>>> it just for optimization or does it cause unexpected behavior in >>>>>> the PMD? >>>>>> >>>>>> >>>>>> Overall, to free resources you need to do the >>>>>> 'rte_eth_dev_close()' or >>>>>> 'rte_dev_remove()' in the primary anyway. So instead of this >>>>>> workaround, I >>>>>> would >>>>>> suggest making PMD dev_close() safe to be called multiple times >>>>>> (if this is the >>>>>> problem.) >>>>> In conclusion,  primary and secondary processes in PMD layer may >>>>> have their own >>>>> >>>>> private data and resources, which need to be processed and released. >>>>> >>>>> Currently,  these for PMD are either handled and cleaned up in >>>>> dev_close() or >>>>> remove(). >>>>> >>>>> However, code streams in rte_dev_remove() cannot ensure that the >>>>> uninstallation >>>>> >>>>> from secondary process will not be repeated if rte_eth_dev_close() >>>>> is first >>>>> called by >>>>> >>>>> secondary(primary is ok, plese review this patch). >>>>> >>>>> I think, this is the same for each PMD and is better suited to >>>>> doing it in >>>>> ethdev layer. >>>>> >>>> This patch prevents to call dev_close() twice in the secondary >>>> process, is this >>>> fixing a theoretical problem or an actual problem? >>>> >>>> If it is an actual problem can you please provide details, >>>> callstack of the >>>> problematic case? >>> We analyzed the code when modifying the bug and found that the >>> problem did exist. >>> >>> The ethdev layer did not guarantee the security. >>> >> I was wondering if there is a crash for an unexpected path, for below >> case the >> primary process check in the 'hns3_dev_uninit()' should already >> prevent anything >> unexpected. So I assume this is a fix for a theoretical issue. > > Yes. For primary process, hns3 can prevent multiple device > uninstallation through > > "adapter_state" controled by primary process. > > The problem of multiple device uninstallation has been prevented at > rte_eth_dev_pci_generic_remove() > > in the ethdev layer, as described in the patch. > > In primary process, rte_eth_dev_allocated() in > rte_eth_dev_pci_generic_remove() will return NULL > > when first calling dev_close(), and then calling rte_dev_remove(). So > it is ok. But the logic can not > > prevent the same case in secondary because secondary does not clear > dev->data. > >> >> In secondary process, these init/uninit device is already not very >> safe. For >> example, as far as I can see in secondary if you hot remove a device >> device and >> hot plug a new one, new device will use wrong device data (since hot >> remove >> won't clear device data, new one will continue to use it). > > No. If we hot remove a device in secondary, the secondary will request > its primary > > send "remove device" message to all secondaries. After all secondaries > are removed, > > the primary also removes the device and the device data will be cleared. > > This is the logic of rte_dev_remove(). > >> So I am not sure about adding secondary process related checks in >> that area and >> causing a false sense of security, and polluting the logic with >> secondary >> specific checks. >> Also the check you add may hit by primary process and I am worried on an >> unexpected side affect it cause. >> >> >> As said above I am not sure about this new check, but even we >> continue with it, >> what about wrapping the check with secondary process check, at least >> to be sure >> there won't be any side affect for primary process. >> > If app hot remove device in primary/secondary, the original logic is > still used in this interface, because of > > bing RTE_ETH_DEV_ATTACHED state for eth_dev. If app first calls > dev_close(), eth device is > > RTE_ETH_DEV_UNUSED state in primary/secondary. In this issue case, > this interface is still return from the > > original place instead of the new check. > > So I don't think the new check will affect the primary process. > Conversely, it's safer for secondary processes. > > Please check it again, thanks. As this issue is specific to secondary process, can we have these change under a check like "if (rte_eal_process_type() != RTE_PROC_PRIMARY)" By this we can avoid any side effect of these changes for primary process and also it makes code readablity easier for designers who are not checking secondary process changes. > >>> The general function of the two interfaces is as follows: >>> >>> rte_eth_dev_close() --> release eth device >>> >>> rte_dev_remove() -->   release eth device + remove and free >>> rte_pci_device(primary andsecondary). >>> >>> According to the OVS application scenario, first call dev_close() >>> and then call >>> >>> remove(), which is possible. >>> >>> We constructed this scenario using testpmd to start the secondary >>> process(the >>> multi-process >>> >>> patch of testpmd is being uploaded.). It is proved that the ethdev >>> layer cannot >>> guarantee >>> >>> this security. The callstack is as follows: >>> >>> ************************************************************************************************************************************************** >>> >>> >>> >>> gdb) set args -a 0000:7d:00.0 -l 2-3 --file-prefix=rte_lee >>> --proc-type=auto -- >>> -i --rxq 4 --txq 4 --num-procs=2 --proc-id=1 >>> (gdb) b hns3_dev_uninit >>> Breakpoint 1 at 0xbca7d0: file ../drivers/net/hns3/hns3_ethdev.c, >>> line 7806. >>> (gdb) b hns3_dev_close >>> Breakpoint 2 at 0xbc6d44: file ../drivers/net/hns3/hns3_ethdev.c, >>> line 6189. >>> (gdb) r >>> Starting program: >>> /home/lihuisong/v500/gcov-test/dpdk/build/app/dpdk-testpmd -a >>> 0000:7d:00.0 -l 2-3 --file-prefix=rte_lee --proc-type=auto -- -i >>> --rxq 4 --txq 4 >>> --num-procs=2 --proc-id=1 >>> Missing separate debuginfo for /root/lib/libnuma.so.1 >>> Try: yum --enablerepo='*debug*' install >>> /usr/lib/debug/.build-id/ce/4eea0b0f2150f70a080bbd8835e43e78373096.debug >>> >>> [Thread debugging using libthread_db enabled] >>> Using host libthread_db library "/lib64/libthread_db.so.1". >>> EAL: Detected 128 lcore(s) >>> EAL: Detected 4 NUMA nodes >>> EAL: Auto-detected process type: SECONDARY >>> EAL: Detected static linkage of DPDK >>> [New Thread 0xfffff7a0ad10 (LWP 17717)] >>> EAL: Multi-process socket >>> /var/run/dpdk/rte_lee/mp_socket_17714_66aa8d3a60a9 >>> [New Thread 0xfffff7209d10 (LWP 17718)] >>> EAL: Selected IOVA mode 'VA' >>> EAL: VFIO support initialized >>> [New Thread 0xfffff69f8d10 (LWP 17719)] >>> EAL: Using IOMMU type 1 (Type 1) >>> EAL: Probe PCI driver: net_hns3 (19e5:a222) device: 0000:7d:00.0 >>> (socket 0) >>> [New Thread 0xfffff61f7d10 (LWP 17720)] >>> TELEMETRY: No legacy callbacks, legacy socket not created >>> Interactive-mode selected >>> 0000:7d:00.0 hns3_dev_mtu_set(): Failed to set mtu, port 0 must be >>> stopped >>> before configuration >>> Failed to set MTU to 1500 for port 0 >>> testpmd: create a new mbuf pool : n=155456, size=2176, >>> socket=0 >>> testpmd: preferred mempool ops selected: ring_mp_mc >>> >>> Warning! port-topology=paired and odd forward ports number, the last >>> port will >>> pair with itself. >>> >>> Configuring Port 0 (socket 0) >>> Port 0: 00:18:2D:00:00:9E >>> Checking link statuses... >>> Done >>> testpmd> port close 0 >>> Closing ports... >>> >>> Breakpoint 2, hns3_dev_close (eth_dev=0x21cdb80 ) >>>      at ../drivers/net/hns3/hns3_ethdev.c:6189 >>> 6189        struct hns3_adapter *hns = eth_dev->data->dev_private; >>> Missing separate debuginfos, use: debuginfo-install >>> glibc-2.17-260.el7.aarch64 >>> libpcap-1.5.3-11.el7.aarch64 openssl-libs-1.0.2k-16.el7.aarch64 >>> zlib-1.2.7-18.el7.aarch64 >>> (gdb) bt >>> #0  hns3_dev_close (eth_dev=0x21cdb80 ) at >>> ../drivers/net/hns3/hns3_ethdev.c:6189 >>> #1  0x0000000000742eac in rte_eth_dev_close (port_id=0) at >>> ../lib/ethdev/rte_ethdev.c:1870 >>> #2  0x0000000000542a8c in close_port (pid=0) at >>> ../app/test-pmd/testpmd.c:2895 >>> #3  0x00000000004df82c in cmd_operate_specific_port_parsed >>> (parsed_result=0xffffffffb0f0, >>>      cl=0x2475080, data=0x0) at ../app/test-pmd/cmdline.c:1272 >>> #4  0x0000000000738ce4 in cmdline_parse (cl=0x2475080, buf=0x24750c8 >>> "port close >>> 0\n") >>>      at ../lib/cmdline/cmdline_parse.c:290 >>> #5  0x0000000000736b7c in cmdline_valid_buffer (rdl=0x2475090, >>> buf=0x24750c8 >>> "port close 0\n", >>>      size=14) at ../lib/cmdline/cmdline.c:26 >>> #6  0x000000000073c078 in rdline_char_in (rdl=0x2475090, c=10 '\n') >>>      at ../lib/cmdline/cmdline_rdline.c:421 >>> #7  0x0000000000736fcc in cmdline_in (cl=0x2475080, buf=0xfffffffff25f >>> "\n\200\362\377\377\377\377", >>>      size=1) at ../lib/cmdline/cmdline.c:149 >>> #8  0x0000000000737270 in cmdline_interact (cl=0x2475080) at >>> ../lib/cmdline/cmdline.c:223 >>> #9  0x00000000004f0ccc in prompt () at ../app/test-pmd/cmdline.c:17882 >>> #10 0x0000000000545528 in main (argc=8, argv=0xfffffffff470) at >>> ../app/test-pmd/testpmd.c:3998 >>> (gdb) c >>> Continuing. >>> Port 0 is closed >>> Done >>> testpmd> device detach 0000:7d:00.0 >>> Removing a device... >>> [Switching to Thread 0xfffff7a0ad10 (LWP 17717)] >>> >>> Breakpoint 1, hns3_dev_uninit (eth_dev=0x21cdb80 ) >>>      at ../drivers/net/hns3/hns3_ethdev.c:7806 >>> 7806        struct hns3_adapter *hns = eth_dev->data->dev_private; >>> (gdb) bt >>> #0  hns3_dev_uninit (eth_dev=0x21cdb80 ) at >>> ../drivers/net/hns3/hns3_ethdev.c:7806 >>> #1  0x0000000000bb8668 in rte_eth_dev_pci_generic_remove >>> (pci_dev=0x247a600, >>>      dev_uninit=0xbca7c4 ) at >>> ../lib/ethdev/ethdev_pci.h:155 >>> #2  0x0000000000bca89c in eth_hns3_pci_remove (pci_dev=0x247a600) >>>      at ../drivers/net/hns3/hns3_ethdev.c:7833 >>> #3  0x00000000007e85f4 in rte_pci_detach_dev (dev=0x247a600) at >>> ../drivers/bus/pci/pci_common.c:287 >>> #4  0x00000000007e8f14 in pci_unplug (dev=0x247a610) at >>> ../drivers/bus/pci/pci_common.c:570 >>> #5  0x0000000000775678 in local_dev_remove (dev=0x247a610) at >>> ../lib/eal/common/eal_common_dev.c:319 >>> #6  0x000000000078f114 in __handle_primary_request >>> (param=0xfffff00008c0) >>>      at ../lib/eal/common/hotplug_mp.c:284 >>> #7  0x000000000079ebf4 in eal_alarm_callback (arg=0x0) at >>> ../lib/eal/linux/eal_alarm.c:92 >>> #8  0x00000000007a3f30 in eal_intr_process_interrupts >>> (events=0xfffff7a0a3e0, >>> nfds=1) >>>      at ../lib/eal/linux/eal_interrupts.c:998 >>> #9  0x00000000007a4224 in eal_intr_handle_interrupts (pfd=10, >>> totalfds=2) >>>      at ../lib/eal/linux/eal_interrupts.c:1071 >>> #10 0x00000000007a442c in eal_intr_thread_main (arg=0x0) at >>> ../lib/eal/linux/eal_interrupts.c:1142 >>> #11 0x0000000000789388 in ctrl_thread_init (arg=0x246ef40) >>>      at ../lib/eal/common/eal_common_thread.c:202 >>> #12 0x0000fffff7bf3c48 in start_thread () from /lib64/libpthread.so.0 >>> #13 0x0000fffff7b45600 in thread_start () from /lib64/libc.so.6 >>> >>> ************************************************************************************************************************************************** >>> >>> >>> >>> Note: above log is from the secondary process. >>> >>>>>> And again, please re-consider calling 'rte_eth_dev_close()' or >>>>>> 'rte_dev_remove()' from the secondary process. >>>>>> >>>>>>>>>> +     * Namely, whether "eth_dev" is NULL cannot be used to >>>>>>>>>> determine >>>>>>>>>> whether >>>>>>>>>> +     * an ethdev port has been released. >>>>>>>>>> +     * For both primary process and secondary process, >>>>>>>>>> eth_dev->state is >>>>>>>>>> +     * RTE_ETH_DEV_UNUSED, which means the ethdev port has >>>>>>>>>> been released. >>>>>>>>>> +     */ >>>>>>>>>> +    if (eth_dev->state == RTE_ETH_DEV_UNUSED) { >>>>>>>>>> +        RTE_ETHDEV_LOG(INFO, "The ethdev port has been >>>>>>>>>> released."); >>>>>>>>>> +        return 0; >>>>>>>>>> +    } >>>>>>>> . >>>>>> . >>>> . >> .