DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: "Varghese, Vipin" <vipin.varghese@amd.com>,
	<ferruh.yigit@amd.com>, <dev@dpdk.org>
Subject: Re: [RFC 0/2] introduce LLC aware functions
Date: Mon, 2 Sep 2024 16:17:13 +0200
Message-ID: <8addd7f6-fac8-45ec-a44f-f81eb008cc36@intel.com>
In-Reply-To: <65f3dc80-2d07-4b8b-9a5c-197eb2b21180@amd.com>

On 9/2/2024 3:08 AM, Varghese, Vipin wrote:
> <Snipped>
> 
> Thank you Anatoly for the response. Let me try to share my understanding.
> 
>> I recently looked into how Intel's Sub-NUMA Clustering would work within
>> DPDK, and found that I actually didn't have to do anything, because the
>> SNC "clusters" present themselves as NUMA nodes, which DPDK already
>> supports natively.
> 
> Yes, this is correct. In the Intel Xeon Platinum BIOS one can set 
> `Cluster per NUMA` to `1, 2 or 4`.
> 
> This divides the tiles into Sub-NUMA partitions, each having separate 
> lcores, memory controllers, PCIe and accelerators.
> 
>>
>> Does AMD's implementation of chiplets not report themselves as separate
>> NUMA nodes? 
> 
> In the AMD EPYC SoC, this is different. There are 2 BIOS settings, namely
> 
> 1. NPS: `Numa Per Socket` which allows the IO tile (memory, PCIe and 
> Accelerator) to be partitioned as Numa 0, 1, 2 or 4.
> 
> 2. L3 as NUMA: `L3 cache of CPU tiles as individual NUMA`. This allows 
> all CPU tiles to be independent NUMA cores.
> 
> 
> The above settings are possible because the CPU is independent from the 
> IO tile, which makes 4 combinations available for use.

Sure, but presumably if the user wants to distinguish this, they have to 
configure their system appropriately. If the user wants to take advantage 
of L3 as NUMA (which is what your patch proposes), then they can enable 
the BIOS knob and get that functionality for free. DPDK already supports 
this.

> 
> These are covered in the tuning guide for the SoC, section 12, "How to 
> get best performance on AMD platform", in the DPDK 24.07.0 
> documentation: 
> <https://doc.dpdk.org/guides/linux_gsg/amd_platform.html>.
> 
> 
>> Because if it does, I don't really think any changes are
>> required because NUMA nodes would give you the same thing, would it not?
> 
> I have a different opinion to this outlook. An end user can
> 
> 1. Identify the lcores and their NUMA node using `usertools/cpu-layout.py`

I recently submitted an enhancement for the CPU layout script to print 
out NUMA nodes separately from physical sockets [1].

[1] 
https://patches.dpdk.org/project/dpdk/patch/40cf4ee32f15952457ac5526cfce64728bd13d32.1724323106.git.anatoly.burakov@intel.com/

I believe that when "L3 as NUMA" is enabled in the BIOS, the script will 
display both the physical package ID and the NUMA nodes reported by the 
system. The NUMA nodes will differ from the physical package ID, and 
that gives you the information you were looking for.
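
To illustrate the distinction between NUMA node and physical package, 
here is a minimal sketch of how the two can be read from Linux sysfs. It 
is not the code of `usertools/cpu-layout.py` or of the patch in [1]; the 
function names and the `sysfs_root` parameter are made up for this 
example. It only assumes the standard sysfs files 
`node/nodeN/cpulist` and `cpu/cpuN/topology/physical_package_id`.

```python
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpu_to_node_and_package(sysfs_root="/sys/devices/system"):
    """Return {cpu: (numa_node, physical_package_id)} as reported by sysfs.

    sysfs_root is a hypothetical parameter added so the function can be
    pointed at a fake directory tree for testing.
    """
    # Map each CPU to the NUMA node whose cpulist contains it.
    node_of = {}
    node_dir = os.path.join(sysfs_root, "node")
    if os.path.isdir(node_dir):
        for name in os.listdir(node_dir):
            m = re.fullmatch(r"node(\d+)", name)
            if not m:
                continue
            with open(os.path.join(node_dir, name, "cpulist")) as f:
                for cpu in parse_cpulist(f.read()):
                    node_of[cpu] = int(m.group(1))

    result = {}
    cpu_dir = os.path.join(sysfs_root, "cpu")
    for name in os.listdir(cpu_dir):
        m = re.fullmatch(r"cpu(\d+)", name)
        if not m:
            continue
        pkg_path = os.path.join(cpu_dir, name, "topology",
                                "physical_package_id")
        if not os.path.exists(pkg_path):
            continue  # skip entries without topology info
        with open(pkg_path) as f:
            pkg = int(f.read())
        cpu = int(m.group(1))
        # Assumption of this sketch: fall back to node 0 when the kernel
        # exposes no NUMA information for this CPU.
        result[cpu] = (node_of.get(cpu, 0), pkg)
    return result
```

Calling `cpu_to_node_and_package()` with no arguments reads the live 
system. On a single-socket machine with "L3 as NUMA" enabled, one would 
expect the node number to vary across CPUs while the package ID stays 
the same.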

> 
> 2. But it is the core mask in the EAL arguments which makes the threads 
> available to be used in a process.

See above: if the OS already reports NUMA information, this is not a 
problem to be solved. The CPU layout script can give this information to 
the user.

> 
> 3. There is no API which distinguishes the L3 NUMA domain. For CPU 
> tiles like the AMD SoC, the function `rte_socket_id 
> <https://doc.dpdk.org/api/rte__lcore_8h.html#a7c8da4664df26a64cf05dc508a4f26df>` will return the physical socket.

Sure, but I would think the answer to that would be to introduce an API 
to distinguish between NUMA (socket ID in DPDK parlance) and package 
(physical socket ID in the "traditional NUMA" sense). Once we can 
distinguish between those, DPDK can just rely on NUMA information 
provided by the OS, while still being capable of identifying physical 
sockets if the user so desires.

I am actually going to introduce an API to get the *physical socket* (as 
opposed to the NUMA node) in the next few days.

> 
> 
> Example: In AMD EPYC Genoa, there are a total of 13 tiles: 12 CPU tiles 
> and 1 IO tile. Setting
> 
> 1. NPS to 4 will divide the memory, PCIe and accelerator into 4 domains, 
> while all the CPUs will appear as a single NUMA node, with each of the 
> 12 tiles having an independent L3 cache.
> 
> 2. `L3 as NUMA` allows each tile to appear as a separate L3 cluster.
> 
> 
> Hence, adding an API which allows selecting available lcores based on 
> split L3 is essential irrespective of the BIOS setting.
> 

I think the crucial issue here is the "irrespective of BIOS setting" 
bit. If EAL is getting into the game of figuring out the exact 
intricacies of the physical layout of the system, then there's a lot 
more work to be done. As other people have already commented, there are 
lots of different topologies, and such an API needs *a lot* of thought 
put into it.
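
For reference, a rough sketch of what "irrespective of BIOS setting" 
detection could look like for the single case of a shared L3 on Linux. 
This is not the code from the RFC; `l3_domains` and its `cpu_root` 
parameter are names invented for this example. It assumes the standard 
sysfs cache attributes `cache/indexN/level` and 
`cache/indexN/shared_cpu_list`, and it deliberately ignores every other 
kind of topology.

```python
import os
import re


def l3_domains(cpu_root="/sys/devices/system/cpu"):
    """Group CPU ids by the level-3 cache they share, per sysfs.

    cpu_root is a hypothetical parameter so a fake tree can be used in
    tests. Returns a sorted list of sorted CPU-id lists.
    """
    domains = {}
    for name in os.listdir(cpu_root):
        m = re.fullmatch(r"cpu(\d+)", name)
        if not m:
            continue
        cache_dir = os.path.join(cpu_root, name, "cache")
        if not os.path.isdir(cache_dir):
            continue
        for index in os.listdir(cache_dir):
            level_path = os.path.join(cache_dir, index, "level")
            if not os.path.exists(level_path):
                continue
            with open(level_path) as f:
                if f.read().strip() != "3":
                    continue  # only level-3 caches are of interest here
            shared = os.path.join(cache_dir, index, "shared_cpu_list")
            with open(shared) as f:
                # CPUs behind the same L3 report the same list string,
                # so the string itself serves as the grouping key.
                key = f.read().strip()
            domains.setdefault(key, set()).add(int(m.group(1)))
    return sorted(sorted(cpus) for cpus in domains.values())
```

Even this small case hard-codes one cache level and one operating 
system, which gives a sense of how much more a general topology API 
would have to cover.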

If, on the other hand, we leave this issue to the kernel, and only 
gather NUMA information provided by the kernel, then nothing has to be 
done - DPDK already supports all of this natively, provided the user has 
configured the system correctly.

Moreover, arguably DPDK already works that way: technically you can get 
physical socket information even without NUMA support in the BIOS, but 
DPDK does not do that. Instead, if the OS reports the NUMA node as 0, 
that's what we go with (even if we could detect multiple sockets from 
sysfs), and IMO it should stay that way unless there is a strong 
argument otherwise. We already require the user to configure their 
system correctly, and I see no reason to second-guess the user's BIOS 
configuration.

-- 
Thanks,
Anatoly

