From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3edc8a89-7d10-47f4-8f95-856c2a7fc7ba@intel.com>
Date: Tue, 3 Sep 2024 10:50:05 +0200
From: "Burakov, Anatoly"
To: "Varghese, Vipin"
Subject: Re: [RFC 0/2] introduce LLC aware functions
References: <20240827151014.201-1-vipin.varghese@amd.com> <288d9e9e-aaec-4dac-b969-54e01956ef4e@intel.com> <65f3dc80-2d07-4b8b-9a5c-197eb2b21180@amd.com> <8addd7f6-fac8-45ec-a44f-f81eb008cc36@intel.com>
List-Id: DPDK patches and discussions

On 9/2/2024 5:33 PM, Varghese, Vipin wrote:
>
>>>
>>>> I recently looked into how Intel's Sub-NUMA Clustering would work
>>>> within DPDK, and found that I actually didn't have to do anything,
>>>> because the SNC "clusters" present themselves as NUMA nodes, which
>>>> DPDK already supports natively.
>>>
>>> yes, this is correct. In Intel Xeon Platinum BIOS one can enable
>>> `Cluster per NUMA` as `1, 2 or 4`.
>>>
>>> This divides the tiles into Sub-NUMA partitions, each having separate
>>> lcores, memory controllers, PCIe and accelerator.
>>>
>>>>
>>>> Does AMD's implementation of chiplets not report themselves as separate
>>>> NUMA nodes?
>>>
>>> In AMD EPYC SoC, this is different. There are 2 BIOS settings, namely
>>>
>>> 1. NPS: `Numa Per Socket` which allows the IO tile (memory, PCIe and
>>> Accelerator) to be partitioned as NUMA 0, 1, 2 or 4.
>>>
>>> 2. L3 as NUMA: `L3 cache of CPU tiles as individual NUMA`. This allows
>>> all CPU tiles to be independent NUMA cores.
>>>
>>>
>>> The above settings are possible because CPU is independent from IO tile.
>>> This allows 4 combinations to be available for use.
>>
>> Sure, but presumably if the user wants to distinguish this, they have to
>> configure their system appropriately. If user wants to take advantage of
>> L3 as NUMA (which is what your patch proposes), then they can enable the
>> BIOS knob and get that functionality for free. DPDK already supports
>> this.
>>
> The intent of the RFC is to introduce the ability to select lcores within
> the same L3 cache whether the BIOS is set or unset for `L3 as NUMA`. This
> is also achieved and tested on platforms which advertise this via sysfs
> by the OS kernel, thus eliminating the dependency on hwloc and libnuma,
> which can be different versions in different distros.

But we do depend on libnuma, so we might as well depend on it? Are there
different versions of libnuma that interfere with what you're trying to do?
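
For readers following the thread, the sysfs-based L3 grouping mentioned in
the quoted text can be sketched roughly as below. This is only an
illustration, not the RFC's implementation: it assumes a Linux system that
exposes the per-CPU `cache/index*/level` and `cache/index*/shared_cpu_list`
files under `/sys/devices/system/cpu`, and the helper names
`parse_cpu_list` and `cpus_by_l3` are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8,10-11' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_by_l3(sysfs_root="/sys/devices/system/cpu"):
    """Return one sorted CPU list per distinct L3 sharing set.

    Assumption: the kernel exposes cache topology in sysfs; CPUs without
    a level-3 cache entry are simply skipped.
    """
    groups = set()
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        for index_dir in (cpu_dir / "cache").glob("index[0-9]*"):
            level_file = index_dir / "level"
            shared_file = index_dir / "shared_cpu_list"
            if not (level_file.is_file() and shared_file.is_file()):
                continue
            # Only level-3 caches are of interest here.
            if level_file.read_text().strip() != "3":
                continue
            groups.add(tuple(parse_cpu_list(shared_file.read_text())))
    return sorted(list(group) for group in groups)


if __name__ == "__main__":
    for i, cpus in enumerate(cpus_by_l3()):
        print(f"L3 group {i}: {cpus}")
```

On a system where "L3 as NUMA" is enabled, one would expect these groups to
line up with the CPU lists of the NUMA nodes; comparing the two is a quick
way to check what the kernel actually reports.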

You keep coming back to this "whether the BIOS is set or unset" for L3 as
NUMA, but I'm still unclear as to what issues your patch is solving
assuming "knob is set". When the system is configured correctly, it
already works and reports cores as part of NUMA nodes (as L3) correctly.
It is only when the system is configured *not* to do that that issues
arise, is it not? In which case IMO the easier solution would be to just
tell the user to enable that knob in BIOS?

>
>
>>>
>>> These are covered in the tuning guide for the SoC in "12. How to get
>>> best performance on AMD platform", Data Plane Development Kit 24.07.0
>>> documentation (dpdk.org).
>>>
>>>
>>>> Because if it does, I don't really think any changes are
>>>> required because NUMA nodes would give you the same thing, would it
>>>> not?
>>>
>>> I have a different opinion to this outlook. An end user can
>>>
>>> 1. Identify the lcores and their NUMA node using `usertools/cpu-layout.py`
>>
>> I recently submitted an enhancement for CPU layout script to print out
>> NUMA separately from physical socket [1].
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/patch/40cf4ee32f15952457ac5526cfce64728bd13d32.1724323106.git.anatoly.burakov@intel.com/
>>
>> I believe when "L3 as NUMA" is enabled in BIOS, the script will display
>> both physical package ID as well as NUMA nodes reported by the system,
>> which will be different from physical package ID, and which will display
>> information you were looking for.
>
> As AMD we had submitted earlier work on the same via "usertools: enhance
> logic to display NUMA" - Patchwork (dpdk.org).
>
> This was clearly distinguishing NUMA and physical socket.

Oh, cool, I didn't see that patch. I would argue my visual format is more
readable though, so perhaps we can get that in :)

> Agreed, but as pointed out in case of Intel Xeon Platinum SPR, the tile
> consists of CPU, memory, PCIe and accelerator.
>
> Hence, when setting the BIOS option `Cluster per NUMA`, the OS kernel &
> libnuma display the appropriate domain with memory, PCIe and CPU.
>
>
> In case of AMD SoC, libnuma for CPU is different from memory NUMA per
> socket.

I'm curious how the kernel handles this then, and what you are getting
from libnuma. You seem to be implying that there are two different NUMA
nodes on your SoC, and either kernel or libnuma are in conflict as to what
belongs to what NUMA node?

>
>>
>>>
>>> 3. there is no API which distinguishes the L3 NUMA domain. Function
>>> `rte_socket_id` for CPU tiles like AMD SoC will return physical socket.
>>
>> Sure, but I would think the answer to that would be to introduce an API
>> to distinguish between NUMA (socket ID in DPDK parlance) and package
>> (physical socket ID in the "traditional NUMA" sense). Once we can
>> distinguish between those, DPDK can just rely on NUMA information
>> provided by the OS, while still being capable of identifying physical
>> sockets if the user so desires.
> Agreed, +1 for the idea for physical socket and changes in library to
> exploit the same.
>>
>> I am actually going to introduce API to get *physical socket* (as
>> opposed to NUMA node) in the next few days.
>>
> But how does it solve the end customer issues
>
> 1. if there are multiple NICs or accelerators on multiple sockets, but
> the IO tile is partitioned into sub-domains.

At least on Intel platforms, NUMA node gets assigned correctly - that is,
if my Xeon with SNC enabled has NUMA nodes 3,4 on socket 1, and there's a
NIC connected to socket 1, it's going to show up as being on NUMA node 3
or 4 depending on where exactly I plugged it in. Everything already works
as expected, and there is no need for any changes for Intel platforms (at
least none that I can see).

My proposed API is really for those users who wish to explicitly allow for
reserving memory/cores on "the same physical socket", as "on the same
tile" is already taken care of by NUMA nodes.

>
> 2. If RTE_FLOW steering is applied on a NIC whose traffic needs to be
> processed under the same L3 - reduces noisy neighbor and gives better
> cache hits
>
> 3. for PKT-distribute library which needs to run within same worker
> lcore set as RX-Distributor-TX.
>

Same as above: on Intel platforms, NUMA nodes already solve this.

> Totally agree, that is what the RFC is also doing, based on what OS sees
> as NUMA we are using it.
>
> Only addition is within the NUMA if there are split LLC, allow selection
> of those lcores. Rather than blindly choosing lcore using
>
> rte_lcore_get_next.

It feels like we're working around a problem that shouldn't exist in the
first place, because kernel should already report this information. Within
NUMA subsystem, there is sysfs node "distance" that, at least on Intel
platforms and in certain BIOS configurations, reports distance between
NUMA nodes, from which one can make inferences about how far a specific
NUMA node is from any other NUMA node. This could have been used to encode
L3 cache information. Do AMD platforms not do that? In that case, "lcore
next" for a particular socket ID (NUMA node, in reality) should already
get us any cores that are close to each other, because all of this
information is already encoded in NUMA nodes by the system.

I feel like there's a disconnect between my understanding of the problem
space, and yours, so I'm going to ask a very basic question:

Assuming the user has configured their AMD system correctly (i.e. enabled
L3 as NUMA), is there any problem to be solved by adding a new API? Does
the system not report each L3 as a separate NUMA node?

>
>
>> We force the user to configure their system
>> correctly as it is, and I see no reason to second-guess user's BIOS
>> configuration otherwise.
>
> Again, to reiterate, the changes suggested in the RFC are agnostic to
> which BIOS options are used,

But that is exactly my contention: are we not effectively working around
users' misconfiguration of a system then?

-- 
Thanks,
Anatoly
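
As a footnote to the point about the sysfs "distance" node: a rough sketch
of how one might read it and rank NUMA nodes by proximity is shown below.
It assumes the Linux layout `/sys/devices/system/node/nodeN/distance`,
where each file holds one whitespace-separated distance per node. The
function names are hypothetical, and the positional mapping of distances
to node IDs is an assumption that only holds when node IDs are contiguous.

```python
import re
from pathlib import Path


def read_numa_distances(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [distance to each node, in sysfs order]}."""
    distances = {}
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", node_dir.name)
        dist_file = node_dir / "distance"
        if match and dist_file.is_file():
            values = [int(x) for x in dist_file.read_text().split()]
            distances[int(match.group(1))] = values
    return distances


def nodes_by_proximity(distances, node):
    """List the other nodes, nearest first, as seen from `node`.

    Assumption: the i-th distance entry refers to the i-th node ID in
    ascending order (true when node IDs are contiguous).
    """
    node_ids = sorted(distances)
    pairs = zip(node_ids, distances[node])
    ranked = sorted(pairs, key=lambda p: (p[1], p[0]))
    return [n for n, _ in ranked if n != node]


if __name__ == "__main__":
    table = read_numa_distances()
    for node in sorted(table):
        print(f"node {node}: nearest first -> {nodes_by_proximity(table, node)}")
```

Whether these distances actually reflect L3 boundaries depends on the
platform and BIOS configuration, which is the open question in this thread.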