From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>,
"Varghese, Vipin" <vipin.varghese@amd.com>, <dev@dpdk.org>
Cc: "Mattias Rönnblom" <hofors@lysator.liu.se>
Subject: Re: [RFC 0/2] introduce LLC aware functions
Date: Thu, 5 Sep 2024 16:45:10 +0200
Message-ID: <a4fae3a0-bab8-4539-9a19-08f90ee4f542@intel.com>
In-Reply-To: <3eae1577-f06f-48f2-863a-faf70b97bc72@amd.com>

On 9/5/2024 3:05 PM, Ferruh Yigit wrote:
> On 9/3/2024 9:50 AM, Burakov, Anatoly wrote:
>> On 9/2/2024 5:33 PM, Varghese, Vipin wrote:
>>> <snipped>
>>>>>
Hi Ferruh,
>>
>> I feel like there's a disconnect between my understanding of the problem
>> space, and yours, so I'm going to ask a very basic question:
>>
>> Assuming the user has configured their AMD system correctly (i.e.
>> enabled L3 as NUMA), is there any problem left to be solved by adding a new
>> API? Does the system not report each L3 as a separate NUMA node?
>>
>
> Hi Anatoly,
>
> Let me try to answer.
>
> To start with, Intel "Sub-NUMA Clustering" and AMD NUMA are different; as
> far as I understand, SNC is more similar to classic physical-socket-based
> NUMA.
>
> Following is the AMD CPU:
> ┌─────┐┌─────┐┌──────────┐┌─────┐┌─────┐
> │     ││     ││          ││     ││     │
> │     ││     ││          ││     ││     │
> │TILE1││TILE2││          ││TILE5││TILE6│
> │     ││     ││          ││     ││     │
> │     ││     ││          ││     ││     │
> │     ││     ││          ││     ││     │
> └─────┘└─────┘│    IO    │└─────┘└─────┘
> ┌─────┐┌─────┐│   TILE   │┌─────┐┌─────┐
> │     ││     ││          ││     ││     │
> │     ││     ││          ││     ││     │
> │TILE3││TILE4││          ││TILE7││TILE8│
> │     ││     ││          ││     ││     │
> │     ││     ││          ││     ││     │
> │     ││     ││          ││     ││     │
> └─────┘└─────┘└──────────┘└─────┘└─────┘
>
> Each 'Tile' has multiple cores, and the 'IO Tile' has the memory controller,
> bus controllers, etc.
>
> When NPS=x is configured in the BIOS, the IO tile resources are split and
> each part is seen as a NUMA node.
>
> Following is NPS=4
> ┌─────┐┌─────┐┌──────────┐┌─────┐┌─────┐
> │     ││     ││     .    ││     ││     │
> │     ││     ││     .    ││     ││     │
> │TILE1││TILE2││     .    ││TILE5││TILE6│
> │     ││     ││NUMA .NUMA││     ││     │
> │     ││     ││  0  .  1 ││     ││     │
> │     ││     ││     .    ││     ││     │
> └─────┘└─────┘│     .    │└─────┘└─────┘
> ┌─────┐┌─────┐│..........│┌─────┐┌─────┐
> │     ││     ││     .    ││     ││     │
> │     ││     ││NUMA .NUMA││     ││     │
> │TILE3││TILE4││  2  .  3 ││TILE7││TILE8│
> │     ││     ││     .    ││     ││     │
> │     ││     ││     .    ││     ││     │
> │     ││     ││     .    ││     ││     │
> └─────┘└─────┘└─────.────┘└─────┘└─────┘
>
> The benefit of this approach is that all cores can now access all NUMA
> nodes without any penalty. For example, a DPDK application can use cores
> from 'TILE1', 'TILE4' & 'TILE7' to access NUMA0 (or any NUMA) resources at
> high performance.
> This is different from SNC, where cores accessing cross-NUMA resources
> are hit by a performance penalty.
>
> Now, although which tile the cores come from doesn't matter from a NUMA
> perspective, it may matter (based on workload) to have them under the same
> LLC.
>
> One way to make sure all cores are under the same LLC is to enable the "L3
> as NUMA" BIOS option, which makes each TILE show up as a different NUMA
> node, and the user selects cores from one NUMA node.
> This is sufficient up to a point, but not enough when the application
> needs a number of cores that spans multiple tiles.
>
> Assume each tile has 8 cores and the application needs 24 cores. When the
> user provides all cores from TILE1, TILE2 & TILE3, in DPDK right now there
> is no way for the application to figure out how to group/select these cores
> to use them efficiently.
>
> Indeed, this is what Vipin is enabling: starting from a core, he is finding
> the list of cores that will work efficiently with that core. From this
> perspective it is not really related to NUMA configuration, and not really
> specific to AMD, as the defined Linux sysfs interface is used for this.
>
> There are other architectures around that have a similar NUMA configuration
> and they can also use the same logic; at worst we can introduce
> architecture-specific code so that all architectures have a way to find
> the other cores that work more efficiently with a given core. This is a
> useful feature for DPDK.
>
> Let's look at another example: the application uses 24 cores in a graph
> library-like usage, where we want to group every three cores to process a
> graph node. The application needs a way to select which three cores work
> most efficiently with each other; that is what this patch enables. In this
> case enabling "L3 as NUMA" does not help at all. With this patch both
> BIOS configs work, but of course the user should select the cores to provide
> to the application based on the configuration.
>
>
> And we can even improve this effective core selection: as Mattias
> suggested, we can select cores that share L2 caches, with an expansion of
> this patch. This is unrelated to NUMA, and again it is not introducing
> architecture details to DPDK, as this implementation already relies on
> the Linux sysfs interface.
>
> I hope it clarifies a little more.
>
>
> Thanks,
> ferruh
>
Yes, this does help clarify things a lot as to why current NUMA support
would be insufficient to express what you are describing.

However, in that case I would echo the sentiment others have already
expressed: this kind of deep sysfs parsing doesn't seem like it would be
in scope for EAL. It sounds more like something a sysadmin/orchestration
layer (or the application itself) would do.
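
To make "deep sysfs parsing" concrete, here is a rough sketch of what finding
the cores that share a last-level cache might involve. It is not code from the
RFC. The names parse_cpulist(), llc_siblings() and MAX_CPUS are made up for
illustration, and the sketch assumes that cache index3 is the LLC, which is
typical on x86 but not guaranteed (a real implementation would check the
"level" file in each index directory).

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_CPUS 1024 /* illustrative upper bound, not a DPDK constant */

/* Parse a Linux cpulist string such as "0-7,64-71" into a boolean map.
 * Returns the number of CPUs set, or -1 on malformed input. */
static int
parse_cpulist(const char *s, bool present[MAX_CPUS])
{
	int count = 0;

	for (int i = 0; i < MAX_CPUS; i++)
		present[i] = false;

	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1; /* no digits where a number was expected */
		if (*end == '-') {
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo < 0 || hi < lo || hi >= MAX_CPUS)
			return -1;
		for (long c = lo; c <= hi; c++) {
			if (!present[c]) {
				present[c] = true;
				count++;
			}
		}
		if (*end == ',')
			end++;
		s = end;
	}
	return count;
}

/* Fill 'present' with the CPUs sharing cache index3 with 'cpu'.
 * ASSUMPTION: index3 is the last-level cache on this system. */
static int
llc_siblings(unsigned int cpu, bool present[MAX_CPUS])
{
	char path[128], buf[4096];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/cache/index3/shared_cpu_list",
		 cpu);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_cpulist(buf, present);
}
```

Calling llc_siblings(0, map) would then mark every CPU that shares an L3 with
CPU 0, or return -1 where the file does not exist (non-Linux systems, or no
cache index3).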

I mean, in principle I'm not opposed to having such an API; it just
seems like the abstraction would perhaps need to be a bit more robust
than directly referencing the cache structure. Maybe something that
degenerates into NUMA nodes would be better, so that applications
wouldn't have to *specifically* worry about cache locality but would
instead have a more generic API they can use to group cores together?
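
As a rough illustration of what "degenerates into NUMA nodes" could mean, the
sketch below groups cores by an opaque domain id. The domain is the LLC when
every core reports one, and the NUMA node otherwise. All names here
(struct core_topo, use_llc_domains(), core_domain(), domain_peers()) are
hypothetical and not part of DPDK or the RFC, and the sketch assumes LLC ids
are unique across the whole system.

```c
#include <stddef.h>

/* Hypothetical per-core topology record; not an existing DPDK type. */
struct core_topo {
	unsigned int lcore;
	int numa_node;
	int llc_id; /* -1 when no cache topology is available */
};

/* Use LLC ids only if every core has one, so that ids from the two
 * namespaces (LLC and NUMA) are never mixed. */
static int
use_llc_domains(const struct core_topo *cores, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cores[i].llc_id < 0)
			return 0;
	return n > 0;
}

/* Opaque grouping id for a core: LLC if usable, NUMA node otherwise. */
static int
core_domain(const struct core_topo *c, int use_llc)
{
	return use_llc ? c->llc_id : c->numa_node;
}

/* Collect up to 'max' lcores in the same domain as cores[ref], including
 * cores[ref] itself. Returns the number of lcores written to 'out'. */
static size_t
domain_peers(const struct core_topo *cores, size_t n, size_t ref,
	     unsigned int *out, size_t max)
{
	int use_llc = use_llc_domains(cores, n);
	size_t found = 0;
	int want;

	if (ref >= n)
		return 0;
	want = core_domain(&cores[ref], use_llc);
	for (size_t i = 0; i < n && found < max; i++)
		if (core_domain(&cores[i], use_llc) == want)
			out[found++] = cores[i].lcore;
	return found;
}
```

With this shape the application only ever asks for "peers of this core" and
never needs to know whether the grouping came from cache topology or from
NUMA nodes.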
--
Thanks,
Anatoly