From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>,
	"Varghese, Vipin" <vipin.varghese@amd.com>, <dev@dpdk.org>
Cc: "Mattias Rönnblom" <hofors@lysator.liu.se>
Subject: Re: [RFC 0/2] introduce LLC aware functions
Date: Thu, 5 Sep 2024 16:45:10 +0200	[thread overview]
Message-ID: <a4fae3a0-bab8-4539-9a19-08f90ee4f542@intel.com> (raw)
In-Reply-To: <3eae1577-f06f-48f2-863a-faf70b97bc72@amd.com>

On 9/5/2024 3:05 PM, Ferruh Yigit wrote:
> On 9/3/2024 9:50 AM, Burakov, Anatoly wrote:
>> On 9/2/2024 5:33 PM, Varghese, Vipin wrote:
>>> <snipped>
>>>>>

Hi Ferruh,

>>
>> I feel like there's a disconnect between my understanding of the problem
>> space, and yours, so I'm going to ask a very basic question:
>>
>> Assuming the user has configured their AMD system correctly (i.e.
>> enabled L3 as NUMA), are there any problem to be solved by adding a new
>> API? Does the system not report each L3 as a separate NUMA node?
>>
> 
> Hi Anatoly,
> 
> Let me try to answer.
> 
> To start with, Intel "Sub-NUMA Clustering" (SNC) and AMD NUMA are
> different; as far as I understand, SNC is closer to classic physical
> socket based NUMA.
> 
> The following is the AMD CPU layout:
>        ┌─────┐┌─────┐┌──────────┐┌─────┐┌─────┐
>        │     ││     ││          ││     ││     │
>        │     ││     ││          ││     ││     │
>        │TILE1││TILE2││          ││TILE5││TILE6│
>        │     ││     ││          ││     ││     │
>        │     ││     ││          ││     ││     │
>        │     ││     ││          ││     ││     │
>        └─────┘└─────┘│    IO    │└─────┘└─────┘
>        ┌─────┐┌─────┐│   TILE   │┌─────┐┌─────┐
>        │     ││     ││          ││     ││     │
>        │     ││     ││          ││     ││     │
>        │TILE3││TILE4││          ││TILE7││TILE8│
>        │     ││     ││          ││     ││     │
>        │     ││     ││          ││     ││     │
>        │     ││     ││          ││     ││     │
>        └─────┘└─────┘└──────────┘└─────┘└─────┘
> 
> Each 'TILE' has multiple cores, and the 'IO TILE' holds the memory
> controllers, bus controllers, etc.
> 
> When NPS=x is configured in the BIOS, the IO tile resources are split
> and each partition is seen as a separate NUMA node.
> 
> The following is NPS=4:
>        ┌─────┐┌─────┐┌──────────┐┌─────┐┌─────┐
>        │     ││     ││     .    ││     ││     │
>        │     ││     ││     .    ││     ││     │
>        │TILE1││TILE2││     .    ││TILE5││TILE6│
>        │     ││     ││NUMA .NUMA││     ││     │
>        │     ││     ││ 0   . 1  ││     ││     │
>        │     ││     ││     .    ││     ││     │
>        └─────┘└─────┘│     .    │└─────┘└─────┘
>        ┌─────┐┌─────┐│..........│┌─────┐┌─────┐
>        │     ││     ││     .    ││     ││     │
>        │     ││     ││NUMA .NUMA││     ││     │
>        │TILE3││TILE4││ 2   . 3  ││TILE7││TILE8│
>        │     ││     ││     .    ││     ││     │
>        │     ││     ││     .    ││     ││     │
>        │     ││     ││     .    ││     ││     │
>        └─────┘└─────┘└─────.────┘└─────┘└─────┘
> 
> The benefit of this approach is that all cores can access all NUMA
> nodes without any penalty. For example, a DPDK application can use
> cores from 'TILE1', 'TILE4' & 'TILE7' to access NUMA0 (or any other
> NUMA node) resources at full performance.
> This is different from SNC, where a core's access to cross-NUMA
> resources is hit by a performance penalty.
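> 
> For example (a minimal sketch; socket id 0 stands in for NUMA0 here),
> an lcore on any tile can allocate from NUMA0 at roughly the same cost:
> 
>   #include <rte_malloc.h>
>   #include <rte_memory.h>
> 
>   static void *alloc_on_numa0(size_t len)
>   {
>           /* Memory comes from NUMA node 0; with NPS=4 a core on any
>            * tile reaches it without a cross-NUMA penalty. */
>           return rte_malloc_socket("rx_buf", len, RTE_CACHE_LINE_SIZE, 0);
>   }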
> 
> Now, although which tile the cores come from doesn't matter from a
> NUMA perspective, it may matter (based on the workload) to have them
> under the same LLC.
> 
> One way to make sure all cores are under the same LLC is to enable the
> "L3 as NUMA" BIOS option, which makes each TILE show up as a different
> NUMA node, so the user can select cores from a single NUMA node.
> This is sufficient up to a point, but not when the application needs
> more cores than a single tile provides.
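> 
> (Selecting the cores of one NUMA node is itself already doable with
> the existing lcore APIs; a minimal sketch:
> 
>   #include <rte_lcore.h>
> 
>   /* Collect up to 'max' worker lcores on NUMA node 'node'; with
>    * "L3 as NUMA" enabled, each tile shows up as its own node. */
>   static unsigned int pick_workers_on_node(unsigned int node,
>                   unsigned int *out, unsigned int max)
>   {
>           unsigned int lcore, n = 0;
> 
>           RTE_LCORE_FOREACH_WORKER(lcore) {
>                   if (n < max && rte_lcore_to_socket_id(lcore) == node)
>                           out[n++] = lcore;
>           }
>           return n;
>   }
> 
> but it cannot group cores across tiles in any cache-aware way.)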
> 
> Assume each tile has 8 cores and the application needs 24 cores. When
> the user provides all cores from TILE1, TILE2 & TILE3, there is
> currently no way in DPDK for the application to figure out how to
> group/select these cores so that they are used efficiently.
> 
> Indeed, this is what Vipin is enabling: given a core, he is finding
> the list of cores that will work efficiently with it. From this
> perspective it is not really related to NUMA configuration, and not
> really specific to AMD either, as the standard Linux sysfs interface
> is used for this.
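> 
> For reference, this is the kind of sysfs lookup involved (a minimal
> sketch, assuming the common layout where index3 is the L3/LLC):
> 
>   #include <stdio.h>
> 
>   /* Print the CPUs sharing cpu's LLC as reported by the kernel,
>    * e.g. "0-7" for an 8-core tile. */
>   static int print_llc_siblings(unsigned int cpu)
>   {
>           char path[128], list[256];
>           FILE *f;
> 
>           snprintf(path, sizeof(path),
>                    "/sys/devices/system/cpu/cpu%u/cache/index3/shared_cpu_list",
>                    cpu);
>           f = fopen(path, "r");
>           if (f == NULL)
>                   return -1;
>           if (fgets(list, sizeof(list), f) != NULL)
>                   printf("cpu%u LLC siblings: %s", cpu, list);
>           fclose(f);
>           return 0;
>   }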
> 
> There are other architectures around with similar NUMA configurations,
> and they can use the same logic; at worst we can introduce
> architecture-specific code so that every architecture has a way to
> find the cores that work most efficiently with a given core. This is a
> useful feature for DPDK.
> 
> Let's look at another example: an application uses 24 cores in a
> graph-library-like fashion, where we want each group of three cores to
> process a graph node. The application needs a way to select which
> three cores work most efficiently with each other; that is what this
> patch enables. In this case, enabling "L3 as NUMA" does not help at
> all. With this patch both BIOS configurations work, but of course the
> user should select the cores given to the application based on the
> configuration.
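> 
> With such an API, usage could look roughly like this (hypothetical
> names for illustration only, not the exact RFC API):
> 
>   #include <rte_lcore.h>
> 
>   /* Hypothetical helpers, not part of the RFC as posted: */
>   extern int lcores_share_llc(unsigned int a, unsigned int b);
>   extern void assign_to_graph_node(const unsigned int group[3]);
> 
>   /* Group the worker lcores that share an LLC with 'anchor' into
>    * sets of three, one set per graph node. */
>   static void group_llc_workers(unsigned int anchor)
>   {
>           unsigned int lcore, group[3];
>           int n = 0;
> 
>           RTE_LCORE_FOREACH_WORKER(lcore) {
>                   if (!lcores_share_llc(anchor, lcore))
>                           continue;
>                   group[n++] = lcore;
>                   if (n == 3) {
>                           assign_to_graph_node(group);
>                           n = 0;
>                   }
>           }
>   }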
> 
> 
> And we can even improve this effective core selection: as Mattias
> suggested, an extension of this patch could select cores that share L2
> caches. This is unrelated to NUMA, and again it does not introduce
> architecture details into DPDK, as the implementation already relies
> on the Linux sysfs interface.
> 
> I hope it clarifies a little more.
> 
> 
> Thanks,
> ferruh
> 

Yes, this does help clarify things a lot as to why current NUMA support 
would be insufficient to express what you are describing.

However, in that case I would echo the sentiment others have already 
expressed: this kind of deep sysfs parsing doesn't seem like it would 
be in scope for EAL; it sounds more like something a 
sysadmin/orchestration layer (or the application itself) would do.

I mean, in principle I'm not opposed to having such an API; it just 
seems like the abstraction would perhaps need to be a bit more robust 
than directly referencing the cache structure? Maybe something that 
degenerates into NUMA nodes would be better, so that applications 
wouldn't have to *specifically* worry about cache locality but instead 
have a more generic API they can use to group cores together?
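
Something along these lines is what I have in mind (a rough sketch, 
all names made up):

  /* A generic "proximity domain" for grouping cores: backed by L2,
   * L3, or NUMA, so applications ask for "cores close to this one"
   * without hardcoding cache topology. Falls back to NUMA nodes on
   * systems that expose nothing finer-grained. */
  enum rte_core_domain {
          RTE_CORE_DOMAIN_L2,
          RTE_CORE_DOMAIN_L3,
          RTE_CORE_DOMAIN_NUMA,
  };

  /* Fill 'peers' with up to 'num' lcores sharing 'domain' with
   * 'lcore'; return the count found, or a negative errno. */
  int rte_lcore_get_domain_peers(unsigned int lcore,
                                 enum rte_core_domain domain,
                                 unsigned int *peers, unsigned int num);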

-- 
Thanks,
Anatoly


