DPDK patches and discussions
From: Ferruh Yigit <ferruh.yigit@amd.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"Varghese, Vipin" <vipin.varghese@amd.com>,
	dev@dpdk.org
Cc: "Mattias Rönnblom" <hofors@lysator.liu.se>
Subject: Re: [RFC 0/2] introduce LLC aware functions
Date: Thu, 5 Sep 2024 16:34:33 +0100	[thread overview]
Message-ID: <462f4550-698e-4f49-9280-3a3708448337@amd.com> (raw)
In-Reply-To: <a4fae3a0-bab8-4539-9a19-08f90ee4f542@intel.com>

On 9/5/2024 3:45 PM, Burakov, Anatoly wrote:
> On 9/5/2024 3:05 PM, Ferruh Yigit wrote:
>> On 9/3/2024 9:50 AM, Burakov, Anatoly wrote:
>>> On 9/2/2024 5:33 PM, Varghese, Vipin wrote:
>>>> <snipped>
>>>>>>
> 
> Hi Ferruh,
> 
>>>
>>> I feel like there's a disconnect between my understanding of the problem
>>> space, and yours, so I'm going to ask a very basic question:
>>>
>>> Assuming the user has configured their AMD system correctly (i.e.
>>> enabled L3 as NUMA), are there any problems to be solved by adding a
>>> new API? Does the system not report each L3 as a separate NUMA node?
>>>
>>
>> Hi Anatoly,
>>
>> Let me try to answer.
>>
>> To start with, Intel "Sub-NUMA Clustering" (SNC) and AMD NUMA are
>> different; as far as I understand, SNC is more similar to classic
>> physical-socket-based NUMA.
>>
>> Following is the AMD CPU:
>>        ┌─────┐┌─────┐┌──────────┐┌─────┐┌─────┐
>>        │     ││     ││          ││     ││     │
>>        │     ││     ││          ││     ││     │
>>        │TILE1││TILE2││          ││TILE5││TILE6│
>>        │     ││     ││          ││     ││     │
>>        │     ││     ││          ││     ││     │
>>        │     ││     ││          ││     ││     │
>>        └─────┘└─────┘│    IO    │└─────┘└─────┘
>>        ┌─────┐┌─────┐│   TILE   │┌─────┐┌─────┐
>>        │     ││     ││          ││     ││     │
>>        │     ││     ││          ││     ││     │
>>        │TILE3││TILE4││          ││TILE7││TILE8│
>>        │     ││     ││          ││     ││     │
>>        │     ││     ││          ││     ││     │
>>        │     ││     ││          ││     ││     │
>>        └─────┘└─────┘└──────────┘└─────┘└─────┘
>>
>> Each 'Tile' has multiple cores, and the 'IO Tile' has the memory
>> controller, bus controllers, etc.
>>
>> When NPS=x is configured in the BIOS, the IO tile resources are split
>> and each partition is seen as a NUMA node.
>>
>> Following is NPS=4
>>        ┌─────┐┌─────┐┌──────────┐┌─────┐┌─────┐
>>        │     ││     ││     .    ││     ││     │
>>        │     ││     ││     .    ││     ││     │
>>        │TILE1││TILE2││     .    ││TILE5││TILE6│
>>        │     ││     ││NUMA .NUMA││     ││     │
>>        │     ││     ││ 0   . 1  ││     ││     │
>>        │     ││     ││     .    ││     ││     │
>>        └─────┘└─────┘│     .    │└─────┘└─────┘
>>        ┌─────┐┌─────┐│..........│┌─────┐┌─────┐
>>        │     ││     ││     .    ││     ││     │
>>        │     ││     ││NUMA .NUMA││     ││     │
>>        │TILE3││TILE4││ 2   . 3  ││TILE7││TILE8│
>>        │     ││     ││     .    ││     ││     │
>>        │     ││     ││     .    ││     ││     │
>>        │     ││     ││     .    ││     ││     │
>>        └─────┘└─────┘└─────.────┘└─────┘└─────┘
>>
>> The benefit of this approach is that all cores can access all NUMA
>> nodes without any penalty. For example, a DPDK application can use
>> cores from 'TILE1', 'TILE4' & 'TILE7' to access NUMA0 (or any NUMA
>> node) resources with high performance.
>> This is different from SNC, where cores accessing cross-NUMA resources
>> are hit by a performance penalty.
>>
>> Now, although which tile the cores come from doesn't matter from a
>> NUMA perspective, it may matter (depending on the workload) to have
>> them under the same LLC.
>>
>> One way to make sure all cores are under the same LLC is to enable the
>> "L3 as NUMA" BIOS option, which makes each TILE appear as a different
>> NUMA node, so the user can select cores from one NUMA node.
>> This is sufficient up to a point, but not enough when the application
>> needs a number of cores that spans multiple tiles.
>>
>> Assume each tile has 8 cores and the application needs 24 cores. When
>> the user provides all cores from TILE1, TILE2 & TILE3, there is
>> currently no way in DPDK for the application to figure out how to
>> group/select these cores so they are used efficiently.
>>
>> Indeed, this is what Vipin is enabling: given a core, he is finding
>> the list of cores that will work efficiently with it. From this
>> perspective it is not really related to NUMA configuration, and not
>> really specific to AMD, as the standard Linux sysfs interface is used
>> for this.
>>
>> There are other architectures around that have a similar NUMA
>> configuration, and they can also use the same logic; at worst we can
>> introduce architecture-specific code so that every architecture has a
>> way to find the other cores that work most efficiently with a given
>> core. This is a useful feature for DPDK.
>>
>> Let's look at another example: an application uses 24 cores in a
>> graph-library-like usage, where we want to group each set of three
>> cores to process a graph node. The application needs a way to select
>> which three cores work most efficiently with each other; that is what
>> this patch enables. In this case, enabling "L3 as NUMA" does not help
>> at all. With this patch both BIOS configurations work, but of course
>> the user should select the cores given to the application based on the
>> configuration.
>>
>>
>> And we can even improve this effective core selection; for example, as
>> Mattias suggested, we could select cores that share L2 caches by
>> extending this patch. This is unrelated to NUMA, and again it does not
>> introduce architecture details into DPDK, as the implementation
>> already relies on the Linux sysfs interface.
>>
>> I hope it clarifies a little more.
>>
>>
>> Thanks,
>> ferruh
>>
> 
> Yes, this does help clarify things a lot as to why current NUMA support
> would be insufficient to express what you are describing.
> 
> However, in that case I would echo the sentiment others have already
> expressed: this kind of deep sysfs parsing doesn't seem like it would
> be in scope for EAL; it sounds more like something a
> sysadmin/orchestration layer (or the application itself) would do.
> 
> I mean, in principle I'm not opposed to having such an API, it just
> seems like the abstraction would perhaps need to be a bit more robust
> than directly referencing cache structure? Maybe something that
> degenerates into NUMA nodes would be better, so that applications
> wouldn't have to *specifically* worry about cache locality but instead
> have a more generic API they can use to group cores together?
> 

Unfortunately, sysadmin/orchestration cannot cover all use cases (such
as the graph use case above), and this is definitely too much HW detail
for the application; that is why we need some programmatic way (APIs)
for applications.

And we are on the same page that the more we keep architecture details
out of the abstraction (APIs), the better; the overall intention is to
provide ways for the application to find lcores that work efficiently
with each other.

For this, what do you think about a slightly different API*, like:
```
rte_get_next_lcore_ex(unsigned int i, uint32_t flag)
```

Based on the flag, we can grab the next eligible lcore; for this patch
the flag can be `RTE_LCORE_LLC`, but the options are wide, and different
architectures can have different groupings to benefit most from the HW
in a vendor-agnostic way.
I like the idea; what do you think about this abstraction?

* Kudos to Vipin 😉



Thread overview: 53+ messages
2024-08-27 15:10 Vipin Varghese
2024-08-27 15:10 ` [RFC 1/2] eal: add llc " Vipin Varghese
2024-08-27 17:36   ` Stephen Hemminger
2024-09-02  0:27     ` Varghese, Vipin
2024-08-27 20:56   ` Wathsala Wathawana Vithanage
2024-08-29  3:21     ` 答复: " Feifei Wang
2024-09-02  1:20     ` Varghese, Vipin
2024-09-03 17:54       ` Wathsala Wathawana Vithanage
2024-09-04  8:18         ` Bruce Richardson
2024-09-06 11:59         ` Varghese, Vipin
2024-09-12 16:58           ` Wathsala Wathawana Vithanage
2024-08-27 15:10 ` [RFC 2/2] eal/lcore: add llc aware for each macro Vipin Varghese
2024-08-27 21:23 ` [RFC 0/2] introduce LLC aware functions Mattias Rönnblom
2024-09-02  0:39   ` Varghese, Vipin
2024-09-04  9:30     ` Mattias Rönnblom
2024-09-04 14:37       ` Stephen Hemminger
2024-09-11  3:13         ` Varghese, Vipin
2024-09-11  3:53           ` Stephen Hemminger
2024-09-12  1:11             ` Varghese, Vipin
2024-09-09 14:22       ` Varghese, Vipin
2024-09-09 14:52         ` Mattias Rönnblom
2024-09-11  3:26           ` Varghese, Vipin
2024-09-11 15:55             ` Mattias Rönnblom
2024-09-11 17:04               ` Honnappa Nagarahalli
2024-09-12  1:33                 ` Varghese, Vipin
2024-09-12  6:38                   ` Mattias Rönnblom
2024-09-12  7:02                     ` Mattias Rönnblom
2024-09-12 11:23                       ` Varghese, Vipin
2024-09-12 12:12                         ` Mattias Rönnblom
2024-09-12 15:50                           ` Stephen Hemminger
2024-09-12 11:17                     ` Varghese, Vipin
2024-09-12 11:59                       ` Mattias Rönnblom
2024-09-12 13:30                         ` Bruce Richardson
2024-09-12 16:32                           ` Mattias Rönnblom
2024-09-12  2:28                 ` Varghese, Vipin
2024-09-11 16:01             ` Bruce Richardson
2024-09-11 22:25               ` Konstantin Ananyev
2024-09-12  2:38                 ` Varghese, Vipin
2024-09-12  2:19               ` Varghese, Vipin
2024-09-12  9:17                 ` Bruce Richardson
2024-09-12 11:50                   ` Varghese, Vipin
2024-09-13 14:15                     ` Burakov, Anatoly
2024-09-12 13:18                   ` Mattias Rönnblom
2024-08-28  8:38 ` Burakov, Anatoly
2024-09-02  1:08   ` Varghese, Vipin
2024-09-02 14:17     ` Burakov, Anatoly
2024-09-02 15:33       ` Varghese, Vipin
2024-09-03  8:50         ` Burakov, Anatoly
2024-09-05 13:05           ` Ferruh Yigit
2024-09-05 14:45             ` Burakov, Anatoly
2024-09-05 15:34               ` Ferruh Yigit [this message]
2024-09-06  8:44                 ` Burakov, Anatoly
2024-09-09 14:14                   ` Varghese, Vipin
