DPDK patches and discussions
From: Jan Viktorin <viktorin@cesnet.cz>
To: Vipin Varghese <vipin.varghese@amd.com>
Cc: <dev@dpdk.org>, <roretzla@linux.microsoft.com>,
	<bruce.richardson@intel.com>, <john.mcnamara@intel.com>,
	<dmitry.kozliuk@gmail.com>, <pbhagavatula@marvell.com>,
	<jerinj@marvell.com>, <ruifeng.wang@arm.com>,
	<mattias.ronnblom@ericsson.com>, <anatoly.burakov@intel.com>,
	<stephen@networkplumber.org>, <ferruh.yigit@amd.com>,
	<honnappa.nagarahalli@arm.com>, <wathsala.vithanage@arm.com>,
	<konstantin.ananyev@huawei.com>, <mb@smartsharesystems.com>
Subject: Re: [PATCH v4 0/4] Introduce Topology NUMA grouping for lcores
Date: Mon, 17 Mar 2025 14:46:07 +0100	[thread overview]
Message-ID: <20250317144607.118a40b0@coaster.localdomain> (raw)
In-Reply-To: <20241105102849.1947-1-vipin.varghese@amd.com>

Hello Vipin and others,

please, will there be any progress or update on this series?

I successfully tested those changes on our Intel and AMD machines and
would like to use it in production soon.

The API is a little unintuitive, at least for me, but I successfully
integrated it into our software.

I am missing a clear relation to the NUMA socket approach used in DPDK.
For example, I would like to be able to easily walk over the lcores of
a specific NUMA socket, grouped by L3 domain. Yes, there is
RTE_LCORE_DOMAIN_IO, but would it always match the appropriate socket
IDs?
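
To make the use case concrete, this is roughly what I would like to
write. The signatures below are only my reading of the cover letter (I
did not verify them against v4); rte_lcore_to_socket_id() is the
existing EAL call, and socket_id is just an example value:

  unsigned int dom, i, socket_id = 1;
  unsigned int doms = rte_get_domain_count(RTE_LCORE_DOMAIN_L3);

  /* Walk all L3 domains and the lcores inside each one, skipping
   * domains that do not belong to the NUMA socket of interest. */
  for (dom = 0; dom < doms; dom++) {
          unsigned int n =
                  rte_lcore_count_from_domain(RTE_LCORE_DOMAIN_L3, dom);

          for (i = 0; i < n; i++) {
                  unsigned int lcore = rte_get_lcore_in_domain(
                          RTE_LCORE_DOMAIN_L3, dom, i);

                  if (rte_lcore_to_socket_id(lcore) != socket_id)
                          break; /* domain lies on another socket */
                  /* ... use lcore ... */
          }
  }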

Also, I do not clearly understand the purpose of combining domain
selectors like:

  RTE_LCORE_DOMAIN_L1 | RTE_LCORE_DOMAIN_L2

or even:

  RTE_LCORE_DOMAIN_L3 | RTE_LCORE_DOMAIN_L2

The documentation does not explain this, and I could not spot any kind
of grouping that would help me. Some "best practices" examples would
be nice to have, to better understand the intentions.

I found a little catch when running DPDK with more lcores than there
are physical or SMT CPU cores, which happens when using e.g. an option
like --lcores=(0-15)@(0-1). The results from the topology API then do
not match the lcores, because hwloc is not aware of the lcore concept.
This should probably be mentioned in the documentation.
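
For illustration, a check along these lines exposes the mismatch
(again assuming the counting functions behave as the cover letter
describes; rte_lcore_count() is the standard EAL call):

  /* With --lcores=(0-15)@(0-1), EAL reports 16 lcores, but hwloc
   * only sees the two physical CPUs they are pinned to. */
  unsigned int dom, topo_lcores = 0;
  unsigned int doms = rte_get_domain_count(RTE_LCORE_DOMAIN_IO);

  for (dom = 0; dom < doms; dom++)
          topo_lcores += rte_lcore_count_from_domain(
                  RTE_LCORE_DOMAIN_IO, dom);

  if (topo_lcores < rte_lcore_count())
          printf("topology covers only %u of %u lcores\n",
                 topo_lcores, rte_lcore_count());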

Anyway, I really appreciate this work and would like to see it upstream.
Especially for AMD machines, some framework like this is a must.

Kind regards,
Jan

On Tue, 5 Nov 2024 15:58:45 +0530
Vipin Varghese <vipin.varghese@amd.com> wrote:

> This patch series introduces improvements for NUMA topology awareness
> in relation to DPDK logical cores. The goal is to expose an API which
> allows users to select optimal logical cores for any application.
> These logical cores can be selected from various NUMA domains, such
> as CPU and I/O.
> 
> Change Summary:
>  - Introduces the concept of NUMA domain partitioning based on CPU and
>    I/O topology.
>  - Adds support for grouping DPDK logical cores within the same Cache
>    and I/O domain for improved locality.
>  - Implements topology detection and core grouping logic that
>    distinguishes between the following NUMA configurations:
>     * CPU topology & I/O topology (e.g., AMD EPYC SoC, Intel Xeon SPR)
>     * CPU+I/O topology (e.g., Ampere One with SLC, Intel Xeon SPR
>       with SNC)
>  - Enhances performance by minimizing lcore dispersion across tiles or
>    compute packages with different L2/L3 cache or I/O domains.
> 
> Reason:
>  - Applications using DPDK libraries rely on consistent memory
>    access.
>  - Lcores should be in the same NUMA domain as the I/O.
>  - Lcores should share the same cache.
> 
> Latency is minimized by using lcores that share the same NUMA
> topology. Memory access is optimized by utilizing cores within the
> same NUMA domain or tile. Cache coherence is preserved within the
> same shared cache domain, reducing remote accesses from other tiles
> or compute packages via snooping (a local hit in either L2 or L3
> within the same NUMA domain).
> 
> Library dependency: hwloc
> 
> Topology Flags:
> ---------------
>  - RTE_LCORE_DOMAIN_L1: group cores sharing the same L1 cache
>  - RTE_LCORE_DOMAIN_SMT: same as RTE_LCORE_DOMAIN_L1
>  - RTE_LCORE_DOMAIN_L2: group cores sharing the same L2 cache
>  - RTE_LCORE_DOMAIN_L3: group cores sharing the same L3 cache
>  - RTE_LCORE_DOMAIN_L4: group cores sharing the same L4 cache
>  - RTE_LCORE_DOMAIN_IO: group cores sharing the same I/O domain
> 
> < Function: Purpose >
> ---------------------
>  - rte_get_domain_count: get the domain count for a given topology flag
>  - rte_lcore_count_from_domain: get the count of valid lcores under a
>    given domain
>  - rte_get_lcore_in_domain: get the valid lcore ID at a given index
>  - rte_lcore_cpuset_in_domain: return the valid cpuset at a given index
>  - rte_lcore_is_main_in_domain: return true|false depending on whether
>    the main lcore is present in the domain
>  - rte_get_next_lcore_from_domain: get the next valid lcore within a
>    domain
>  - rte_get_next_lcore_from_next_domain: get the next valid lcore from
>    the next domain
> 
> Note:
>  1. Topology is the NUMA grouping.
>  2. Domains are the various sub-groups within a specific topology.
> 
> Topology examples: L1, L2, L3, L4, IO
> Domain examples: IO-A, IO-B
> 
> < MACRO: Purpose >
> ------------------
>  - RTE_LCORE_FOREACH_DOMAIN: iterate over lcores from all domains
>  - RTE_LCORE_FOREACH_WORKER_DOMAIN: iterate over worker lcores from
>    all domains
>  - RTE_LCORE_FORN_NEXT_DOMAIN: iterate over domains, selecting the
>    n'th lcore of each
>  - RTE_LCORE_FORN_WORKER_NEXT_DOMAIN: iterate over domains, selecting
>    the n'th worker lcore of each
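> 
> A minimal usage sketch of these macros (the argument lists below are
> illustrative only, see patch 1/4 for the exact definitions;
> lcore_main is a placeholder worker function):
> 
>   unsigned int lcore;
> 
>   /* Launch a worker on the 0'th worker lcore of each L3 domain. */
>   RTE_LCORE_FORN_WORKER_NEXT_DOMAIN(lcore, RTE_LCORE_DOMAIN_L3, 0) {
>           rte_eal_remote_launch(lcore_main, NULL, lcore);
>   }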
> 
> Future work (after merge):
> --------------------------
>  - dma-perf per IO NUMA
>  - eventdev per L3 NUMA
>  - pipeline per SMT|L3 NUMA
>  - distributor per L3 for Port-Queue
>  - l2fwd-power per SMT
>  - testpmd option for IO NUMA per port
> 
> Platform tested on:
> -------------------
>  - INTEL(R) XEON(R) PLATINUM 8562Y+ (supports IO NUMA 1 & 2)
>  - AMD EPYC 8534P (supports IO NUMA 1 & 2)
>  - AMD EPYC 9554 (supports IO NUMA 1, 2, 4)
> 
> Logs:
> -----
> 1. INTEL(R) XEON(R) PLATINUM 8562Y+:
>  - SNC=1
>         Domain (IO): at index (0) there are 48 core, with (0) at index 0
>  - SNC=2
>         Domain (IO): at index (0) there are 24 core, with (0) at index 0
>         Domain (IO): at index (1) there are 24 core, with (12) at index 0
> 
> 2. AMD EPYC 8534P:
>  - NPS=1:
>         Domain (IO): at index (0) there are 128 core, with (0) at index 0
>  - NPS=2:
>         Domain (IO): at index (0) there are 64 core, with (0) at index 0
>         Domain (IO): at index (1) there are 64 core, with (32) at index 0
> 
> Signed-off-by: Vipin Varghese <vipin.varghese@amd.com>
> 
> Vipin Varghese (4):
>   eal/lcore: add topology based functions
>   test/lcore: enable tests for topology
>   doc: add topology grouping details
>   examples: update with lcore topology API
> 
>  app/test/test_lcores.c                        | 528 +++++++++++++
>  config/meson.build                            |  18 +
>  .../prog_guide/env_abstraction_layer.rst      |  22 +
>  examples/helloworld/main.c                    | 154 +++-
>  examples/l2fwd/main.c                         |  56 +-
>  examples/skeleton/basicfwd.c                  |  22 +
>  lib/eal/common/eal_common_lcore.c             | 714 ++++++++++++++++++
>  lib/eal/common/eal_private.h                  |  58 ++
>  lib/eal/freebsd/eal.c                         |  10 +
>  lib/eal/include/rte_lcore.h                   | 209 +++++
>  lib/eal/linux/eal.c                           |  11 +
>  lib/eal/meson.build                           |   4 +
>  lib/eal/version.map                           |  11 +
>  lib/eal/windows/eal.c                         |  12 +
>  14 files changed, 1819 insertions(+), 10 deletions(-)
> 

