From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: Vipin Varghese <vipin.varghese@amd.com>,
	dev@dpdk.org, roretzla@linux.microsoft.com,
	bruce.richardson@intel.com, john.mcnamara@intel.com,
	dmitry.kozliuk@gmail.com, jerinj@marvell.com
Cc: ruifeng.wang@arm.com, mattias.ronnblom@ericsson.com,
	anatoly.burakov@intel.com, stephen@networkplumber.org,
	ferruh.yigit@amd.com, honnappa.nagarahalli@arm.com,
	wathsala.vithanage@arm.com, konstantin.ananyev@huawei.com
Subject: Re: [RFC v2 0/3] Introduce Topology NUMA grouping for lcores
Date: Wed, 30 Oct 2024 07:47:31 +0100
Message-ID: <63338f4e-3314-4567-9056-e9101f81cf42@lysator.liu.se>
In-Reply-To: <20241030044035.1201-1-vipin.varghese@amd.com>

On 2024-10-30 05:40, Vipin Varghese wrote:
> One of the ways to increase core density in a given physical package is
> to use easy to replicate tiles. These tiles are either core complexes or
> core complexes with IO (memory and PCIe). This results in the possibility
> of having two types of NUMA topology.
>   - CPU topology & IO topology
>   - CPU+IO topology
> 
> For platforms like
>   - AMD SoC EPYC, core complexes are in separate CPU domain and IO
>     are in different NUMA domain (except for zen1 Naples)
>   - Intel 4th Xeon (SPR) & above, the CPU+IO NUMA partitioning is
>     achieved by BIOS option `SNC as 1, 2 or 4`.
>   - Ampere One allows CPU NUMA partitioning by BIOS option `SLC`.
>   - while other platforms have 2 or 4 cores sharing same L2 cache.
> 
> Grouping DPDK logical cores that share the same cache or IO helps
> applications leverage cache or IO locality. To do so, one needs to
> use lcores sharing the same topology. This approach ensures more
> consistent latencies by minimizing the dispersion of lcores across
> different tiles.
> 

Here's my take on a hardware topology API rationale:

When scheduling work on the lcores assigned to a DPDK application, the 
hardware topology is of relevance.

For example, performance is generally increased by organizing related 
processing on lcores that share the same cache (at some level) or memory 
controller, that are SMT siblings on the same physical core, or that sit 
close to some I/O device.
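
To make that concrete, here is a minimal sketch of the kind of query an
application ends up wanting: "which other lcores share an L3 with me?".
Everything in it is made up for illustration. struct lcore_info and
lcores_sharing_l3() are hypothetical names, not DPDK or hwloc APIs, and
the L3 ids are assumed to have been obtained from some topology source
beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-lcore record; not a DPDK type. */
struct lcore_info {
	unsigned int lcore_id;
	unsigned int l3_id; /* id of the L3 cache this lcore sits under */
};

/*
 * Collect the ids of all lcores sharing an L3 cache with 'self',
 * excluding 'self'. Returns the number of ids written to 'out'
 * (at most 'out_len'), or 0 if 'self' is not in the table.
 */
static size_t
lcores_sharing_l3(const struct lcore_info *lcores, size_t n,
		  unsigned int self, unsigned int *out, size_t out_len)
{
	const struct lcore_info *me = NULL;
	size_t i, found = 0;

	assert(lcores != NULL && out != NULL);

	/* Locate the record describing the calling lcore. */
	for (i = 0; i < n; i++)
		if (lcores[i].lcore_id == self)
			me = &lcores[i];
	if (me == NULL)
		return 0;

	/* Pick every other lcore under the same L3. */
	for (i = 0; i < n && found < out_len; i++)
		if (lcores[i].lcore_id != self &&
		    lcores[i].l3_id == me->l3_id)
			out[found++] = lcores[i].lcore_id;

	return found;
}
```

A two-stage pipeline could, for instance, place its second stage on any
lcore returned for the first stage's lcore. The same pattern applies if
l3_id is swapped for an SMT, L2 or I/O domain id.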

This issue is not new with tiles. I'm not even sure it has much to do 
with tiles. Tiles just mean more asymmetry, where there was asymmetry 
already.

This issue is also not unique to lcores. Even if you had an application 
where the packet processing fast path never even touched a CPU core, the 
need for topology awareness when selecting which (and how) I/O devices 
and accelerators should be involved would remain.

It seems to me that this should definitely be an API separate from 
<rte_lcore.h>. It should either be an abstract, 
purely-for-work-scheduling-type API a la the Linux kernel's scheduling 
domains, or a more complete/complex API a la hwloc, with all the 
nitty-gritty details that you have to have an @<CPU-vendor>.com mail 
address to really appreciate.
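
As a rough illustration of the first, abstract option: the core of such
an API could be as small as "at which level do these two cores first
share a domain?", without exposing what the level physically is. The
sketch below is only that, a sketch. The names (topo_level, topo_core,
topo_closest_shared_level) are hypothetical and exist neither in DPDK
nor in the kernel, and domain ids are assumed to be unique within a
level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical levels, ordered from innermost to outermost. */
enum topo_level {
	TOPO_SMT,
	TOPO_L2,
	TOPO_L3,
	TOPO_NUMA,
	TOPO_NUM_LEVELS
};

/* Hypothetical per-core record: one domain id per level. */
struct topo_core {
	unsigned int domain[TOPO_NUM_LEVELS];
};

/*
 * Return the innermost level at which 'a' and 'b' belong to the same
 * domain, or TOPO_NUM_LEVELS if they share no domain at any level.
 * A lower return value means the two cores are "closer".
 */
static int
topo_closest_shared_level(const struct topo_core *a,
			  const struct topo_core *b)
{
	int level;

	assert(a != NULL && b != NULL);

	for (level = 0; level < TOPO_NUM_LEVELS; level++)
		if (a->domain[level] == b->domain[level])
			return level;

	return TOPO_NUM_LEVELS;
}
```

A work scheduler could then prefer the candidate core with the lowest
returned level. The same notion of distance would extend to I/O devices
and accelerators if they were given domain ids too.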

> Using lcores in same NUMA domain shows improvement for applications
>   - using pipeline staging
>   - each lcore processing part of payload
>   - eventual hit in either L2 or L3
> 
> Library dependency: hwloc
> 
> Topology Flags:
> ---------------
>   - RTE_LCORE_DOMAIN_L1: to group cores sharing same L1 cache
>   - RTE_LCORE_DOMAIN_SMT: same as RTE_LCORE_DOMAIN_L1
>   - RTE_LCORE_DOMAIN_L2: group cores sharing same L2 cache
>   - RTE_LCORE_DOMAIN_L3: group cores sharing same L3 cache
>   - RTE_LCORE_DOMAIN_IO: group cores sharing same IO
> 
> < Function: Purpose >
> ---------------------
>   - rte_get_domain_count: get domain count based on Topology Flag
>   - rte_lcore_count_from_domain: get valid lcores count under each domain
>   - rte_get_lcore_in_domain: valid lcore id based on index
>   - rte_get_next_lcore_from_domain: next valid lcore within domain
>   - rte_get_next_lcore_from_next_domain: next valid lcore from next domain
> 
> Note:
>   1. Topology is NUMA grouping.
>   2. Domain is various sub-groups within a specific Topology.
> 
> Topology example: L1, L2, L3, IO
> Domain example: IO-A, IO-B
> 
> < MACRO: Purpose >
> ------------------
>   - RTE_LCORE_FOREACH_DOMAIN: iterate lcores from all domains
>   - RTE_LCORE_FOREACH_WORKER_DOMAIN: iterate worker lcores from all domains
>   - RTE_LCORE_FORN_NEXT_DOMAIN: iterate domain select n'th lcore
>   - RTE_LCORE_FORN_WORKER_NEXT_DOMAIN: iterate domain for worker n'th lcore.
> 
> Future work (after merge):
> --------------------------
>   - dma-perf per IO NUMA
>   - eventdev per L3 NUMA
>   - pipeline per SMT|L3 NUMA
>   - distributor per L3 for Port-Queue
>   - l2fwd-power per SMT
>   - testpmd option for IO NUMA per port
> 
> Platform tested on:
> -------------------
>   - INTEL(R) XEON(R) PLATINUM 8562Y+ (support IO numa 1 & 2)
>   - AMD EPYC 8534P (supports IO numa 1 & 2)
>   - AMD EPYC 9554 (supports IO numa 1, 2, 4)
> 
> Logs:
> -----
> 1. INTEL(R) XEON(R) PLATINUM 8562Y+:
>   - SNC=1
> 	Domain (IO): at index (0) there are 48 core, with (0) at index 0
>   - SNC=2
> 	Domain (IO): at index (0) there are 24 core, with (0) at index 0
> 	Domain (IO): at index (1) there are 24 core, with (12) at index 0
> 
> 2. AMD EPYC 8534P:
>   - NPS=1:
> 	Domain (IO): at index (0) there are 128 core, with (0) at index 0
>   - NPS=2:
> 	Domain (IO): at index (0) there are 64 core, with (0) at index 0
> 	Domain (IO): at index (1) there are 64 core, with (32) at index 0
> 
> 
> Signed-off-by: Vipin Varghese <vipin.varghese@amd.com>
> 
> Vipin Varghese (3):
>    eal/lcore: add topology based functions
>    test/lcore: enable tests for topology
>    examples: add lcore topology API calls
> 
>   app/test/test_lcores.c            | 189 ++++++++++
>   config/meson.build                |  18 +
>   examples/helloworld/main.c        | 142 +++++++-
>   examples/l2fwd/main.c             |  56 ++-
>   examples/skeleton/basicfwd.c      |  22 ++
>   lib/eal/common/eal_common_lcore.c | 580 ++++++++++++++++++++++++++++++
>   lib/eal/common/eal_private.h      |  48 +++
>   lib/eal/freebsd/eal.c             |  10 +
>   lib/eal/include/rte_lcore.h       | 168 +++++++++
>   lib/eal/linux/eal.c               |  11 +
>   lib/eal/meson.build               |   4 +
>   lib/eal/version.map               |   9 +
>   lib/eal/windows/eal.c             |  12 +
>   13 files changed, 1259 insertions(+), 10 deletions(-)
> 