From: "Varghese, Vipin"
To: Bruce Richardson
CC: Mattias Rönnblom, "Yigit, Ferruh", "dev@dpdk.org"
Subject: RE: [RFC 0/2] introduce LLC aware functions
Date: Thu, 12 Sep 2024 02:19:07 +0000
References: <20240827151014.201-1-vipin.varghese@amd.com> <45f26104-ad6c-4e42-8446-d8b51ac3f2dd@lysator.liu.se>

[Public]

<snipped>

> > > > <snipped>
> > > >
> > > >>> <snipped>
> > > >>>
> > > >>> Thank you Mattias for the comments and question, please let me
> > > >>> try to explain the same below
> > > >>>
> > > >>>> We shouldn't have a separate CPU/cache hierarchy API instead?
> > > >>>
> > > >>> Based on the intention to bring in CPU lcores which share same
> > > >>> L3 (for better cache hits and less noisy neighbor) current API
> > > >>> focuses on using
> > > >>>
> > > >>> Last Level Cache. But if the suggestion is `there are SoC where
> > > >>> L2 cache are also shared, and the new API should be
> > > >>> provisioned`, I am also
> > > >>>
> > > >>> comfortable with the thought.
> > > >>>
> > > >>
> > > >> Rather than some AMD special case API hacked into <rte_lcore.h>,
> > > >> I think we are better off with no DPDK API at all for this kind of
> > > >> functionality.
> > > >
> > > > Hi Mattias, as shared in the earlier email thread, this is not a
> > > > AMD special case at all. Let me try to explain this one more time.
> > > > One of techniques used to increase cores cost effective way to go
> > > > for tiles of compute complexes.
> > > > This introduces a bunch of cores in sharing same Last Level Cache
> > > > (namely L2, L3 or even L4) depending upon cache topology
> > > > architecture.
> > > >
> > > > The API suggested in RFC is to help end users to selectively use
> > > > cores under same Last Level Cache Hierarchy as advertised by OS
> > > > (irrespective of the BIOS settings used). This is useful in both
> > > > bare-metal and container environment.
> > > >
> > >
> > > I'm pretty familiar with AMD CPUs and the use of tiles (including
> > > the challenges these kinds of non-uniformities pose for work
> > > scheduling).
> > >
> > > To maximize performance, caring about core<->LLC relationship may
> > > well not be enough, and more HT/core/cache/memory topology
> > > information is required. That's what I meant by special case. A
> > > proper API should allow access to information about which lcores are
> > > SMT siblings, cores on the same L2, and cores on the same L3, to
> > > name a few things. Probably you want to fit NUMA into the same API
> > > as well, although that is available already in <rte_lcore.h>.
> >
> > Thank you Mattias for the information, as shared by in the reply with
> > Anatoly we want expose a new API `rte_get_next_lcore_ex` which intakes a
> > extra argument `u32 flags`.
> > The flags can be RTE_GET_LCORE_L1 (SMT), RTE_GET_LCORE_L2,
> > RTE_GET_LCORE_L3, RTE_GET_LCORE_BOOST_ENABLED,
> > RTE_GET_LCORE_BOOST_DISABLED.
> >
>
> For the naming, would "rte_get_next_sibling_core" (or lcore if you prefer) be a
> clearer name than just adding "ex" on to the end of the existing function?
Thank you Bruce, please find my answers below.

The functions shared in the RFC were:
```
- rte_get_llc_first_lcores: Retrieves all the first lcores in the shared LLC.
- rte_get_llc_lcore: Retrieves all lcores that share the LLC.
- rte_get_llc_n_lcore: Retrieves the first n or skips the first n lcores in the shared LLC.
```

The macros extending the usability were:
```
RTE_LCORE_FOREACH_LLC_FIRST: iterates through all first lcore from each LLC.
RTE_LCORE_FOREACH_LLC_FIRST_WORKER: iterates through all first worker lcore from each LLC.
RTE_LCORE_FOREACH_LLC_WORKER: iterates lcores from LLC based on hint (lcore id).
RTE_LCORE_FOREACH_LLC_SKIP_FIRST_WORKER: iterates lcores from LLC while skipping first worker.
RTE_LCORE_FOREACH_LLC_FIRST_N_WORKER: iterates through `n` lcores from each LLC.
RTE_LCORE_FOREACH_LLC_SKIP_N_WORKER: skips the first `n` lcores, then iterates through the remaining lcores in each LLC.
```

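For readers who have not seen the RFC patches, here is a minimal sketch of how an "iterate the first lcore of each LLC" macro such as `RTE_LCORE_FOREACH_LLC_FIRST` might behave. It does not reproduce the RFC implementation: the `sketch_` names, the fixed 8-lcore table and the lcore-to-LLC mapping are all assumptions made up for illustration, and a real implementation would discover the topology from the OS.

```c
#include <assert.h>

#define SKETCH_N_LCORES 8u

/* Hypothetical lcore -> LLC id map: two LLCs with four lcores each.
 * Illustration only; real code would query the OS for this. */
static const unsigned int sketch_llc_of[SKETCH_N_LCORES] = {
	0, 0, 0, 0, 1, 1, 1, 1
};

/* Return 1 when `lcore` is the lowest-numbered lcore of its LLC. */
static int sketch_is_llc_first(unsigned int lcore)
{
	assert(lcore < SKETCH_N_LCORES);
	for (unsigned int i = 0; i < lcore; i++)
		if (sketch_llc_of[i] == sketch_llc_of[lcore])
			return 0;
	return 1;
}

/* Return the next LLC-first lcore after `i`, or SKETCH_N_LCORES when
 * there are none left. Pass -1 to start the iteration. */
static unsigned int sketch_next_llc_first(int i)
{
	for (unsigned int n = (unsigned int)(i + 1); n < SKETCH_N_LCORES; n++)
		if (sketch_is_llc_first(n))
			return n;
	return SKETCH_N_LCORES;
}

/* Hypothetical stand-in for RTE_LCORE_FOREACH_LLC_FIRST. */
#define SKETCH_FOREACH_LLC_FIRST(i) \
	for ((i) = sketch_next_llc_first(-1); (i) < SKETCH_N_LCORES; \
	     (i) = sketch_next_llc_first((int)(i)))
```

With the made-up table above, `SKETCH_FOREACH_LLC_FIRST(i)` visits lcore 0 and then lcore 4, one per LLC, which is the pattern an application could use to place one control or polling thread per cache domain.
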
Based on the discussions, we agreed to share a version-2 RFC that extends the API as `rte_get_next_lcore_extnd` with an extra argument, `flags`.
As I understand it, the proposed `rte_get_next_sibling_core` can easily be covered by the above API with the flag `RTE_GET_LCORE_L1 (SMT)`. Is this the right understanding?
We can also easily provide simple macros such as `RTE_LCORE_FOREACH_L1`, which iterates over SMT sibling threads.

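A minimal sketch of the flag-based iterator idea, under stated assumptions: the thread only names `rte_get_next_lcore_extnd`, `flags` and the `RTE_GET_LCORE_*` flags, so the signature, the flag values, the `sk_` names and the topology table below are all hypothetical. The hint lcore selects the domain, and the flag selects which level of sharing is required.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_LCORE 8u

/* Hypothetical flag values; the thread names the flags but not their values. */
#define SK_GET_LCORE_L1 (1u << 0) /* SMT siblings */
#define SK_GET_LCORE_L2 (1u << 1)
#define SK_GET_LCORE_L3 (1u << 2)

struct sk_topo {
	uint8_t l1, l2, l3; /* domain id of the lcore at each cache level */
};

/* Made-up topology: 2 SMT threads per core, private L2, 2 cores per L3. */
static const struct sk_topo sk_topo[SK_MAX_LCORE] = {
	{0, 0, 0}, {0, 0, 0}, {1, 1, 0}, {1, 1, 0},
	{2, 2, 1}, {2, 2, 1}, {3, 3, 1}, {3, 3, 1},
};

/* Return 1 when lcores a and b share every cache level selected in flags. */
static int sk_shares(unsigned int a, unsigned int b, uint32_t flags)
{
	if ((flags & SK_GET_LCORE_L1) && sk_topo[a].l1 != sk_topo[b].l1)
		return 0;
	if ((flags & SK_GET_LCORE_L2) && sk_topo[a].l2 != sk_topo[b].l2)
		return 0;
	if ((flags & SK_GET_LCORE_L3) && sk_topo[a].l3 != sk_topo[b].l3)
		return 0;
	return 1;
}

/* Next lcore after `i` sharing the flagged level(s) with `hint`.
 * Pass -1 to start; returns SK_MAX_LCORE when exhausted. */
static unsigned int sk_get_next_lcore_extnd(int i, unsigned int hint,
					    uint32_t flags)
{
	assert(hint < SK_MAX_LCORE);
	for (unsigned int n = (unsigned int)(i + 1); n < SK_MAX_LCORE; n++)
		if (sk_shares(n, hint, flags))
			return n;
	return SK_MAX_LCORE;
}

/* Hypothetical stand-in for RTE_LCORE_FOREACH_L1. */
#define SK_LCORE_FOREACH_L1(i, hint) \
	for ((i) = sk_get_next_lcore_extnd(-1, (hint), SK_GET_LCORE_L1); \
	     (i) < SK_MAX_LCORE; \
	     (i) = sk_get_next_lcore_extnd((int)(i), (hint), SK_GET_LCORE_L1))
```

In this sketch the iteration includes the hint lcore itself, so `SK_LCORE_FOREACH_L1(i, 2)` visits lcores 2 and 3. Whether the real API should skip the hint is a design choice the thread does not settle.
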
>
> Looking logically, I'm not sure about the BOOST_ENABLED and
> BOOST_DISABLED flags you propose
The idea for BOOST_ENABLED and BOOST_DISABLED is based on the DPDK power library, which allows boost to be enabled.
The flags let the user select lcores where boost is enabled or disabled, using a macro or an API.

> - in a system with multiple possible
> standard and boost frequencies what would those correspond to?
I now understand the confusion; apologies for mixing up AMD EPYC SoC boost with Intel Turbo.

Thank you for pointing this out, we will use the terminology `RTE_GET_LCORE_TURBO`.

> What's also
> missing is a define for getting actual NUMA siblings i.e. those sharing common
> memory but not an L3 or anything else.
This can be added to `rte_get_next_lcore_extnd` with the flag `RTE_GET_LCORE_NUMA`. This will allow grabbing all lcores under the same sub-memory NUMA domain as the lcore passed in.
If SMT is enabled and the DPDK lcore mask covers the sibling threads, then `RTE_GET_LCORE_NUMA` returns all lcores and sibling threads under the same memory NUMA node as the lcore passed in.

>
> My suggestion would be to have the function take just an integer-type e.g.
> uint16_t parameter which defines the memory/cache hierarchy level to use, 0
> being lowest, 1 next, and so on. Different systems may have different numbers
> of cache levels so lets just make it a zero-based index of levels, rather than
> giving explicit defines (except for memory which should probably always be
> last). The zero-level will be for "closest neighbour"
Good idea, we did prototype this internally. The issue is that it keeps increasing the number of APIs in the lcore library.
To keep the API count low, we are using the lcore id as a hint to the sub-NUMA domain.

> whatever that happens to be, with as many levels as is necessary to express
> the topology, e.g. without SMT, but with 3 cache levels, level 0 would be an L2
> neighbour, level 1 an L3 neighbour. If the L3 was split within a memory NUMA
> node, then level 2 would give the NUMA siblings. We'd just need an API to
> return the max number of levels along with the iterator.
We are using the lcore's NUMA node as the hint.

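For comparison, the level-indexed alternative quoted above (a zero-based hierarchy level plus a call returning the number of levels) could be sketched as follows. This is only an illustration of the quoted suggestion, not an agreed design: the `lv_` names, the signature and the topology table are hypothetical. The table assumes no SMT, level 0 for L2, level 1 for L3 and level 2 for the memory NUMA node.

```c
#include <assert.h>
#include <stdint.h>

#define LV_MAX_LCORE 8u
#define LV_N_LEVELS  3u /* 0: L2, 1: L3, 2: memory NUMA node (always last) */

/* Made-up domain id of every lcore at each level. */
static const uint8_t lv_domain[LV_N_LEVELS][LV_MAX_LCORE] = {
	{0, 0, 1, 1, 2, 2, 3, 3}, /* level 0: pairs of lcores share an L2 */
	{0, 0, 0, 0, 1, 1, 1, 1}, /* level 1: four lcores share an L3 */
	{0, 0, 0, 0, 0, 0, 0, 0}, /* level 2: all lcores on one NUMA node */
};

/* Number of hierarchy levels on this (hypothetical) system. */
static uint16_t lv_max_levels(void)
{
	return LV_N_LEVELS;
}

/* Next neighbour of `ref` at `level` after lcore `i`, excluding `ref`.
 * Pass -1 to start; returns LV_MAX_LCORE when exhausted or when the
 * level does not exist. */
static unsigned int lv_next_neighbour(int i, unsigned int ref, uint16_t level)
{
	assert(ref < LV_MAX_LCORE);
	if (level >= lv_max_levels())
		return LV_MAX_LCORE;
	for (unsigned int n = (unsigned int)(i + 1); n < LV_MAX_LCORE; n++)
		if (n != ref && lv_domain[level][n] == lv_domain[level][ref])
			return n;
	return LV_MAX_LCORE;
}
```

Compared with named flags, the caller does not need to know which cache levels exist: it can loop from level 0 up to `lv_max_levels() - 1` and widen the search until enough neighbours are found.
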
>
> Regards,
> /Bruce