From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harman Kalra
To: Dmitry Kozlyuk
CC: Stephen Hemminger, Thomas Monjalon, "david.marchand@redhat.com", "dev@dpdk.org", Ray Kinsella
Date: Thu, 21 Oct 2021 09:16:01 +0000
References: <20210826145726.102081-1-hkalra@marvell.com> <20211018193707.123559-1-hkalra@marvell.com> <20211018193707.123559-3-hkalra@marvell.com> <20211018155654.0d3ffbed@hermes.local> <20211020183051.657b05c1@sovereign>
In-Reply-To: <20211020183051.657b05c1@sovereign>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v3 2/7] eal/interrupts: implement get set APIs
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Dmitry Kozlyuk
> Sent: Wednesday, October 20, 2021 9:01 PM
> To: Harman Kalra
> Cc: Stephen Hemminger; Thomas Monjalon; david.marchand@redhat.com;
> dev@dpdk.org; Ray Kinsella
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v3 2/7] eal/interrupts: implement
> get set APIs
>
> > > >
> > > > +	/* Detect if DPDK malloc APIs are ready to be used. */
> > > > +	mem_allocator = rte_malloc_is_ready();
> > > > +	if (mem_allocator)
> > > > +		intr_handle = rte_zmalloc(NULL, sizeof(struct rte_intr_handle),
> > > > +					  0);
> > > > +	else
> > > > +		intr_handle = calloc(1, sizeof(struct rte_intr_handle));
> > >
> > > This is problematic way to do this.
> > > The reason to use rte_malloc vs malloc should be determined by usage.
> > >
> > > If the pointer will be shared between primary/secondary process then
> > > it has to be in hugepages (ie rte_malloc).
> > > If it is not shared then use regular malloc.
> > >
> > > But what you have done is created a method which will be a latent
> > > bug for anyone using primary/secondary process.
> > >
> > > Either:
> > > intr_handle is not allowed to be used in secondary.
> > > Then always use malloc().
> > > Or.
> > > intr_handle can be used by both primary and secondary.
> > > Then always use rte_malloc().
> > > Any code path that allocates intr_handle before pool is
> > > ready is broken.
> >
> > Hi Stephan,
> >
> > Till V2, I implemented this API in a way where user of the API can
> > choose If he wants intr handle to be allocated using malloc or
> > rte_malloc by passing a flag arg to the rte_intr_instanc_alloc API.
> > User of the API will best know if the intr handle is to be shared with
> > secondary or not.
> >
> > But after some discussions and suggestions from the community we
> > decided to drop that flag argument and auto detect on whether
> > rte_malloc APIs are ready to be used and thereafter make all further
> > allocations via rte_malloc.
> > Currently alarm subsystem (or any driver doing allocation in
> > constructor) gets interrupt instance allocated using glibc malloc that
> > too because rte_malloc* is not ready by rte_eal_alarm_init(), while
> > all further consumers gets instance allocated via rte_malloc.
>
> Just as a comment, bus scanning is the real issue, not the alarms.
> Alarms could be initialized after the memory management (but it's irrelevant
> because their handle is not accessed from the outside).
> However, MM needs to know bus IOVA requirements to initialize, which is
> usually determined by at least bus device requirements.
>
> > I think this should not cause any issue in primary/secondary model as
> > all interrupt instance pointer will be shared.
>
> What do you mean? Aren't we discussing the issue that those allocated early
> are not shared?
>
> > Infact to avoid any surprises of primary/secondary not working we
> > thought of making all allocations via rte_malloc.
>
> I don't see why anyone would not make them shared.
> In order to only use rte_malloc(), we need:
> 1. In bus drivers, move handle allocation from scan to probe stage.
> 2. In EAL, move alarm initialization to after the MM.
> It all can be done later with v3 design---but there are out-of-tree drivers.
> We need to force them to make step 1 at some point.
> I see two options:
> a) Right now have an external API that only works with rte_malloc()
>    and internal API with autodetection. Fix DPDK and drop internal API.
> b) Have external API with autodetection. Fix DPDK.
>    At the next ABI breakage drop autodetection and libc-malloc.
>
> > David, Thomas, Dmitry, please add if I missed anything.
> >
> > Can we please conclude on this series APIs as API freeze deadline (rc1) is
> > very near.
>
> I support v3 design with no options and autodetection, because that's the
> interface we want in the end.
> Implementation can be improved later.

Hi All,

I came across two issues introduced by the auto-detection mechanism.

1. Deadlock in the primary/secondary model. The primary application is started
   and makes lots of allocations via rte_malloc*.

   Secondary side:
   a. The secondary starts. In its rte_eal_init() it makes some allocations via
      rte_*, and one of them triggers a heap expansion request because the
      current memseg is exhausted
      (malloc_heap_alloc_on_heap_id() -> alloc_more_mem_on_socket() ->
      try_expand_heap()).
   b. A heap expansion request is sent to the primary. Please note that the
      secondary holds the heap spinlock while making the request
      (malloc_heap_alloc_on_heap_id() -> rte_spinlock_lock(&(heap->lock))).

   Primary side:
   a. The primary receives the request, installs a new hugepage and sets up
      the heap (handle_alloc_request()).
   b. To inform all the secondaries about the new memseg, the primary sends a
      sync notice, for which it sets up an alarm
      (rte_mp_request_async() -> mp_request_async()).
   c. Inside the alarm setup API, an interrupt callback is registered.
   d. Inside rte_intr_callback_register(), a new interrupt instance allocation
      is requested for "src->intr_handle".
   e. Since memory management is detected as up, rte_intr_instance_alloc()
      calls rte_zmalloc(). Further down, in malloc_heap_alloc_on_heap_id(),
      the primary deadlocks while taking the spinlock, because that spinlock
      is already held by the secondary.

2. "eal_flags_file_prefix_autotest" fails, because the processes spawned by
   this test are expected to clean up their hugepage traces from the
   respective directories (eg /dev/hugepage).
   a. Inside eal_cleanup, rte_free() -> malloc_heap_free() adds the element
      being freed to the free list and checks whether nearby elements can be
      joined together to form a bigger free chunk (malloc_elem_free()).
   b. If this free chunk is larger than the hugepage size, the respective
      hugepage can be uninstalled after making sure that no allocation from
      this hugepage exists
      (malloc_heap_free() -> malloc_heap_free_pages() ->
      eal_memalloc_free_seg()).

   But the interrupt instances allocated for PCI intr handles (used for VFIO)
   and other driver-specific interrupt handles are not cleaned up in
   rte_eal_cleanup(), so these hugepage files are not removed and the test
   fails.

There could be more such issues, so I think we should first fix DPDK:
1. Memory management should be made independent and should be the first thing
   to come up in rte_eal_init().
2. rte_eal_cleanup() should be the exact opposite of rte_eal_init(); just like
   bus_probe, we should have bus_remove to clean up all the memory
   allocations.

Regarding this IRQ series, I would like to fall back to our original design,
i.e. rte_intr_instance_alloc() should take an argument saying whether its
memory should be allocated using glibc malloc or rte_malloc*. The decision
(malloc or rte_malloc) can be based on whether the interrupt handle is shared
in the existing code. E.g.:
a. In case of alarm, intr_handle was a global entry and not confined to any
   structure, so it can be allocated with normal malloc.
b. A PCI device had a static entry for intr_handle inside "struct
   rte_pci_device", and the memory for struct rte_pci_device comes from normal
   malloc, so its intr_handle can also be malloc'ed.
c. Some drivers have intr_handle inside their priv structure, and this priv
   structure is allocated via rte_malloc, so intr_handle can also be
   rte_malloc'ed.

Later, once DPDK is fixed up, this argument can be removed and all allocations
can go through the rte_malloc family without any auto detection.

David, Dmitry, Thomas, Stephen, please share your views.

Thanks
Harman
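
For illustration only, the flag-based allocation proposed above could look
roughly like the sketch below. Everything in it is an assumption made for the
example: the flag names, the intr_instance_alloc()/intr_instance_free()
signatures and the reduced struct are hypothetical, and shared_zalloc() /
shared_free() are plain libc stand-ins for rte_zmalloc() / rte_free() so that
the snippet builds without DPDK.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag values, not taken from any released DPDK API. */
#define INTR_ALLOC_LIBC   0u        /* handle is private to this process */
#define INTR_ALLOC_SHARED (1u << 0) /* handle is shared with secondaries */

/* Reduced stand-in for struct rte_intr_handle. */
struct intr_handle {
	int fd;
	uint32_t alloc_flags; /* remembered so free uses the matching allocator */
};

/* Stand-ins for rte_zmalloc()/rte_free(); a real build would call those. */
static void *shared_zalloc(size_t size) { return calloc(1, size); }
static void shared_free(void *ptr) { free(ptr); }

/* The caller states where the handle must live; nothing is auto-detected. */
struct intr_handle *
intr_instance_alloc(uint32_t flags)
{
	struct intr_handle *h;

	if (flags & INTR_ALLOC_SHARED)
		h = shared_zalloc(sizeof(*h));
	else
		h = calloc(1, sizeof(*h));
	if (h == NULL)
		return NULL;

	h->fd = -1;
	h->alloc_flags = flags;
	return h;
}

/* Release the handle with the allocator that was used to create it. */
void
intr_instance_free(struct intr_handle *h)
{
	if (h == NULL)
		return;
	if (h->alloc_flags & INTR_ALLOC_SHARED)
		shared_free(h);
	else
		free(h);
}
```

With such an interface the alarm code (case a) and the PCI bus (case b) would
pass INTR_ALLOC_LIBC, while a driver keeping the handle in rte_malloc'ed
private data (case c) would pass INTR_ALLOC_SHARED. Storing the flags in the
handle keeps the free path from mixing allocators.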