From: Jan Viktorin <viktorin@cesnet.cz>
To: Vipin Varghese <vipin.varghese@amd.com>
Cc: <dev@dpdk.org>, <roretzla@linux.microsoft.com>,
<bruce.richardson@intel.com>, <john.mcnamara@intel.com>,
<dmitry.kozliuk@gmail.com>, <pbhagavatula@marvell.com>,
<jerinj@marvell.com>, <ruifeng.wang@arm.com>,
<mattias.ronnblom@ericsson.com>, <anatoly.burakov@intel.com>,
<stephen@networkplumber.org>, <ferruh.yigit@amd.com>,
<honnappa.nagarahalli@arm.com>, <wathsala.vithanage@arm.com>,
<konstantin.ananyev@huawei.com>, <mb@smartsharesystems.com>
Subject: Re: [PATCH v4 0/4] Introduce Topology NUMA grouping for lcores
Date: Mon, 17 Mar 2025 14:46:07 +0100
Message-ID: <20250317144607.118a40b0@coaster.localdomain>
In-Reply-To: <20241105102849.1947-1-vipin.varghese@amd.com>
Hello Vipin and others,
please, will there be any progress or update on this series?
I successfully tested those changes on our Intel and AMD machines and
would like to use them in production soon.
The API is a little bit unintuitive, at least for me, but I
successfully integrated it into our software.
I am missing a clear relation to the NUMA socket approach used in
DPDK. For example, I would like to be able to easily walk over the
lcores of a specific NUMA node grouped by L3 domain, as in the sketch
below. Yes, there is RTE_LCORE_DOMAIN_IO, but would it always match
the appropriate socket IDs?
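To illustrate, below is roughly the loop I would like to write. It is
only a sketch: the exact signatures are my guess from the function
list quoted below, and the rte_lcore_to_socket_id() cross-check is
precisely the step I would prefer the API to do for me:

    unsigned int wanted_socket = 0; /* the NUMA node I care about */
    unsigned int l3_domains = rte_get_domain_count(RTE_LCORE_DOMAIN_L3);

    for (unsigned int dom = 0; dom < l3_domains; dom++) {
        unsigned int n =
            rte_lcore_count_from_domain(RTE_LCORE_DOMAIN_L3, dom);
        for (unsigned int i = 0; i < n; i++) {
            unsigned int lcore =
                rte_get_lcore_in_domain(RTE_LCORE_DOMAIN_L3, dom, i);
            if (rte_lcore_to_socket_id(lcore) != wanted_socket)
                continue; /* skip lcores from other NUMA nodes */
            /* ... launch work on this lcore ... */
        }
    }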
Also, I do not clearly understand the purpose of combining domain
selectors like:
RTE_LCORE_DOMAIN_L1 | RTE_LCORE_DOMAIN_L2
or even:
RTE_LCORE_DOMAIN_L3 | RTE_LCORE_DOMAIN_L2
The documentation does not explain this, and I could not spot any kind
of grouping that would help me. Some "best practices" examples would
be nice to have, to understand the intentions better.
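For what it is worth, all I could do was probe what a combined
selector returns, e.g. (again, the signature is my guess from the
function list quoted below):

    unsigned int n = rte_get_domain_count(RTE_LCORE_DOMAIN_L3 |
                                          RTE_LCORE_DOMAIN_L2);
    printf("combined L3|L2 selector: %u domains\n", n);

but the resulting counts alone did not tell me much.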
I found a little catch when running DPDK with more lcores than there
are physical or SMT CPU cores, e.g. with an option like
--lcores='(0-15)@(0-1)'. The results from the topology API then do not
match the lcores, because hwloc is not aware of the lcore concept.
This might be worth mentioning somewhere in the documentation.
Anyway, I really appreciate this work and would like to see it
upstream. Especially for AMD machines, a framework like this is a
must.
Kind regards,
Jan
On Tue, 5 Nov 2024 15:58:45 +0530
Vipin Varghese <vipin.varghese@amd.com> wrote:
> This patch series introduces improvements for NUMA topology awareness
> in relation to DPDK logical cores. The goal is to expose an API that
> allows users to select optimal logical cores for any application.
> These logical cores can be selected from various NUMA domains, such
> as CPU and I/O.
>
> Change Summary:
> - Introduces the concept of NUMA domain partitioning based on CPU and
> I/O topology.
> - Adds support for grouping DPDK logical cores within the same cache
> and I/O domain for improved locality.
> - Implements topology detection and core grouping logic that
> distinguishes between the following NUMA configurations:
> * CPU topology & I/O topology (e.g., AMD SoC EPYC, Intel Xeon SPR)
> * CPU+I/O topology (e.g., Ampere One with SLC, Intel Xeon SPR
> with SNC)
> - Enhances performance by minimizing lcore dispersion across tiles or
> compute packages with different L2/L3 cache or IO domains.
>
> Reason:
> - Applications using DPDK libraries rely on consistent memory
> access.
> - Lcores should be in the same NUMA domain as the IO they serve.
> - Lcores should share the same cache.
>
> Latency is minimized by using lcores that share the same NUMA
> topology. Memory access is optimized by utilizing cores within the
> same NUMA domain or tile. Cache coherence is preserved within the
> same shared cache domain, reducing remote accesses from other tiles
> or compute packages via snooping (a local hit in either L2 or L3
> within the same NUMA domain).
>
> Library dependency: hwloc
>
> Topology Flags:
> ---------------
> - RTE_LCORE_DOMAIN_L1: group cores sharing the same L1 cache
> - RTE_LCORE_DOMAIN_SMT: same as RTE_LCORE_DOMAIN_L1
> - RTE_LCORE_DOMAIN_L2: group cores sharing the same L2 cache
> - RTE_LCORE_DOMAIN_L3: group cores sharing the same L3 cache
> - RTE_LCORE_DOMAIN_L4: group cores sharing the same L4 cache
> - RTE_LCORE_DOMAIN_IO: group cores sharing the same IO domain
>
> < Function: Purpose >
> ---------------------
> - rte_get_domain_count: get the domain count for a given topology
> flag
> - rte_lcore_count_from_domain: get the count of valid lcores in each
> domain
> - rte_get_lcore_in_domain: get the valid lcore id at a given index
> within a domain
> - rte_lcore_cpuset_in_domain: return the valid cpuset at a given
> index within a domain
> - rte_lcore_is_main_in_domain: return true|false depending on whether
> the main lcore is present in a domain
> - rte_get_next_lcore_from_domain: get the next valid lcore within a
> domain
> - rte_get_next_lcore_from_next_domain: get the next valid lcore from
> the next domain
>
> Note:
> 1. A Topology is a NUMA grouping.
> 2. A Domain is one of the sub-groups within a specific Topology.
>
> Topology examples: L1, L2, L3, L4, IO
> Domain examples: IO-A, IO-B
>
> < MACRO: Purpose >
> ------------------
> - RTE_LCORE_FOREACH_DOMAIN: iterate lcores from all domains
> - RTE_LCORE_FOREACH_WORKER_DOMAIN: iterate worker lcores from all
> domains
> - RTE_LCORE_FORN_NEXT_DOMAIN: iterate over domains, selecting the
> n'th lcore in each
> - RTE_LCORE_FORN_WORKER_NEXT_DOMAIN: iterate over domains, selecting
> the n'th worker lcore in each
>
> Future work (after merge):
> --------------------------
> - dma-perf per IO NUMA
> - eventdev per L3 NUMA
> - pipeline per SMT|L3 NUMA
> - distributor per L3 for Port-Queue
> - l2fwd-power per SMT
> - testpmd option for IO NUMA per port
>
> Platform tested on:
> -------------------
> - INTEL(R) XEON(R) PLATINUM 8562Y+ (supports IO NUMA 1 & 2)
> - AMD EPYC 8534P (supports IO NUMA 1 & 2)
> - AMD EPYC 9554 (supports IO NUMA 1, 2, 4)
>
> Logs:
> -----
> 1. INTEL(R) XEON(R) PLATINUM 8562Y+:
> - SNC=1
>   Domain (IO): at index (0) there are 48 core, with (0) at index 0
> - SNC=2
>   Domain (IO): at index (0) there are 24 core, with (0) at index 0
>   Domain (IO): at index (1) there are 24 core, with (12) at index 0
>
> 2. AMD EPYC 8534P:
> - NPS=1:
>   Domain (IO): at index (0) there are 128 core, with (0) at index 0
> - NPS=2:
>   Domain (IO): at index (0) there are 64 core, with (0) at index 0
>   Domain (IO): at index (1) there are 64 core, with (32) at index 0
>
> Signed-off-by: Vipin Varghese <vipin.varghese@amd.com>
>
> Vipin Varghese (4):
> eal/lcore: add topology based functions
> test/lcore: enable tests for topology
> doc: add topology grouping details
> examples: update with lcore topology API
>
> app/test/test_lcores.c | 528 +++++++++++++
> config/meson.build | 18 +
> .../prog_guide/env_abstraction_layer.rst | 22 +
> examples/helloworld/main.c | 154 +++-
> examples/l2fwd/main.c | 56 +-
> examples/skeleton/basicfwd.c | 22 +
> lib/eal/common/eal_common_lcore.c | 714 ++++++++++++++++++
> lib/eal/common/eal_private.h | 58 ++
> lib/eal/freebsd/eal.c | 10 +
> lib/eal/include/rte_lcore.h | 209 +++++
> lib/eal/linux/eal.c | 11 +
> lib/eal/meson.build | 4 +
> lib/eal/version.map | 11 +
> lib/eal/windows/eal.c | 12 +
> 14 files changed, 1819 insertions(+), 10 deletions(-)
>
Thread overview: 13+ messages
2024-11-05 10:28 Vipin Varghese
2024-11-05 10:28 ` [PATCH v4 1/4] eal/lcore: add topology based functions Vipin Varghese
2024-11-05 10:28 ` [PATCH v4 2/4] test/lcore: enable tests for topology Vipin Varghese
2024-11-05 10:28 ` [PATCH v4 3/4] doc: add topology grouping details Vipin Varghese
2024-11-05 10:28 ` [PATCH v4 4/4] examples: update with lcore topology API Vipin Varghese
2025-02-13 3:09 ` [PATCH v4 0/4] Introduce Topology NUMA grouping for lcores Varghese, Vipin
2025-02-13 8:34 ` Thomas Monjalon
2025-02-13 9:18 ` Morten Brørup
2025-03-03 9:06 ` Varghese, Vipin
2025-03-04 10:08 ` Morten Brørup
2025-03-05 7:43 ` Mattias Rönnblom
2025-03-03 8:59 ` Varghese, Vipin
2025-03-17 13:46 ` Jan Viktorin [this message]