DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: David Marchand <david.marchand@redhat.com>, <dev@dpdk.org>
Subject: Re: [PATCH v2 0/3] allow easier use of high lcore-ids
Date: Mon, 7 Apr 2025 12:56:02 +0100	[thread overview]
Message-ID: <Z_O9Uq_Mg86yjKzi@bricha3-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9FB9A@smartserver.smartshare.dk>

On Mon, Apr 07, 2025 at 01:32:59PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Monday, 7 April 2025 12.41
> > 
> > On Mon, Apr 07, 2025 at 12:15:13PM +0200, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] Sent:
> > > > Monday, 7 April 2025 11.49
> > > >
> > > > On Mon, Apr 07, 2025 at 09:04:05AM +0200, David Marchand wrote:
> > > > > Hello Bruce,
> > > > >
> > > > > On Tue, Apr 1, 2025 at 4:08 PM Bruce Richardson
> > > > > <bruce.richardson@intel.com> wrote:
> > > > > >
> > > > > > On Mon, Mar 24, 2025 at 05:30:26PM +0000, Bruce Richardson
> > wrote:
> > > > > > > Traditionally, DPDK has had a direct mapping of internal
> > lcore-
> > > > ids, to
> > > > > > > the actual core numbers in use. With higher core count
> > servers
> > > > becoming
> > > > > > > more prevalent the issue becomes one of increasing memory
> > > > footprint when
> > > > > > > using such a scheme, due to the need to have all arrays
> > > > dimensioned for
> > > > > > > all cores on the system, whether or not those cores are in
> > use by
> > > > the
> > > > > > > app.
> > > > > > >
> > > > > > > Therefore, the decision was made in the past to not expand
> > the
> > > > > > > build-time RTE_MAX_LCORE value beyond 128. Instead, it was
> > > > recommended
> > > > > > > that users use the "--lcores" EAL parameter to take the high-
> > > > numbered
> > > > > > > cores they wish to use and map them to lcore-ids within the 0
> > -
> > > > 128
> > > > > > > range. While this works, this is a little clunky as it means
> > that
> > > > > > > instead of just passing, for example, "-l 130-139", the user
> > must
> > > > > > > instead pass "--lcores 0@130,1@131,2@132,3@133,...."
> > > > > > >
> > > > > > > This patchset attempts to simplify the situation by adding a
> > new
> > > > flag to
> > > > > > > do this mapping automatically. To use cores 130-139 and map
> > them
> > > > to ids
> > > > > > > 0-9 internally, the EAL args now become: "-l 130-139 --map-
> > lcore-
> > > > ids",
> > > > > > > or using the shorter "-M" version of the flag: "-Ml 130-139".
> > > > > > >
> > > > > > > Adding this new parameter required some rework of the
> > existing
> > > > arg
> > > > > > > parsing code, because in current DPDK the args are parsed and
> > > > checked in
> > > > > > > the order they appear on the commandline. This means that
> > using
> > > > the
> > > > > > > example above, the core parameter 130-139 will be rejected
> > > > immediately
> > > > > > > before the "map-lcore-ids" parameter is seen. To work around
> > > > this, the
> > > > > > > core (and service core) parameters are not parsed when seen,
> > > > instead
> > > > > > > they are only saved off and parsed after all arguments are
> > > > parsed. The
> > > > > > > "-l" and "-c" parameters are converted into "--lcores"
> > arguments,
> > > > so all
> > > > > > > assigning of lcore ids is done there in all cases.
> > > > > > >
> > > > > > > RFC->v2: * converted printf to DEBUG log * added "-M" as
> > shorter
> > > > > > > version of flag * added documentation * renamed internal API
> > that
> > > > > > > was changed to avoid any potential
> > > > hidden
> > > > > > >   runtime issues.
> > > > > > >
> > > > > > > Bruce Richardson (3): eal: centralize core parameter parsing
> > eal:
> > > > > > > convert core masks and lists to core sets eal: allow
> > automatic
> > > > > > > mapping of high lcore ids
> > > > > > >
> > > > > > Ping for review.
> > > > > >
> > > > > > At a high level, does this feature seem useful to users?
> > > > >
> > > > > This seems useful, though I am not I would touch the existing
> > > > options.
> > > > > I would have gone with a simple -L option (taking the same kind
> > of
> > > > > input than -l but with new behavior), and not combine a flag with
> > > > > existing options.
> > > > >
> > > >
> > > > That would be an easier patchset to do up. However, it would then
> > mean
> > > > that we have no less than 4 different ways to specify the cores to
> > use:
> > > > "- c", "-l", "-L", "--lcores" - and therefore need 4 different sets
> > of
> > > > parsing options for them, and more checks to ensure we have only
> > one of
> > > > the 4 specified in any run. That's why I chose the modifier option,
> > and
> > > > to try and consolidate the core setup a bit.
> > > >
> > > > However, if having a completely new option is preferred, I am happy
> > > > enough to do up a different patchset for that.
> > > >
> > > > > I scanned through the series, not much to say.  Maybe add a unit
> > test
> > > > > for new cmdline option.
> > > > >
> > > > Sure. Once it's decided what approach (if any) to take, I'll do up
> > a
> > > > new patchset, taking into account any relevant feedback on this
> > set.
> > > >
> > > > /Bruce
> > >
> > > Changing the EAL parameter parser to a "two pass parser" makes sense.
> > I
> > > think checking for existence of more than one lcore specification
> > options
> > > should suffice; we don't need to accept multiple lcore specification
> > > options and check for conflicts.
> > >
> > > When remapping, do we need to support gaps in the "lcore" (logical
> > cores)
> > > array, e.g. for secondary processes, or can it be continuous from 0
> > to
> > > the number of specified lcores?
> > >
> > > And are new EAL parameters for this really beneficial?  Doesn't e.g.
> > "-l
> > > 0-9@130-139,100@140" suffice?
> > >
> > Actually, I believe "0-9@130-139"[1]  is not the same as
> > "0@130,1@131,2@132,...". The latter affinities one thread to one core
> > ten
> > times over, while the former affinitizes 10 threads to 10 cores -
> > leaving
> > those threads free to move about within the 10 cores specified.
> 
> Interesting. The documentation [GSG] isn't clear to me about this; a example there could help clarify.
> 
> [GSG]: https://doc.dpdk.org/guides/linux_gsg/linux_eal_parameters.html#lcore-related-options
> 

Yep, agreed.

> If users are manually passing lcore parameters to the EAL, then I see why some sort of remapping shorthand is useful.
> IMO, if the mappings are relatively exotic, it should be acceptable requiring an external script to build a long list of mapping parameters and then invoke the application with those script-generated EAL parameters.
> This would reduce the scope to support relatively simple, common mappings.
> 
> Could we expand the --lcores syntax to support common mappings?
> 
> E.g. "0-9@130+" to do what I thought.
> The lack of "()" treats the entries individually (not as a group), and the "+" indicates auto-increment.
> 
> A more advanced example:
> "0-9@(130-131)+", meaning lcore 0 gets cpus 130-131, lcore 1 gets cpus 132-133, etc.
> 

My issues with the above syntax idea is:
* I think it's overly complicating the lcores parameter adding in the
  support for the "+" symbol - especially in the advanced case example you
  provide. I worry about code maintainability here.
* More significantly for me, I think it's also getting things backwards in
  that it is focusing more on the lcore ids visible to the app, rather than
  the physical cores to be used. For the example above of 0-9@130+, what I
  would expect is that the user is mainly thinking about the cores he wants
  to use 130-139, which are then to be mapped to lower ids. If we had the
  syntax reversed where the physical cores were first, I'd say it would
  make more sense, e.g. 130-139@0+
* finally, as I found out last month, there are systems on which the cores
  are spread across numa-nodes on odd/even boundaries, so to have an app
  running on socket 0, you need to give the core ids as a list i.e.
  0,2,4,6, and cannot use ranges. [This also reenforces the point above
  too, where again it's the internal ids we need to generalize, not the
  physical cpus]

My thinking on the matter, and I'm happy to be corrected if I'm wrong here,
is that end-users are unlikely to significantly care what the actual lcore
ids are internally in the program, so long as they work and are unique.
What does matter is the physical cpus on which the code is to run.
Therefore, my approach to this is to find the simplest possible solution
whereby the user can just provide a list of cores and tell EAL to just map
them to reasonable values. For the "reasonable" values, I would imagine
that for the vast majority of cases starting "0" is what is wanted. For
anything beyond that, we already have the existing --lcores syntax to be
used.

My 2c.

/Bruce

  reply	other threads:[~2025-04-07 11:56 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-13 11:38 [RFC PATCH " Bruce Richardson
2025-03-13 11:38 ` [RFC PATCH 1/3] eal: centralize core parameter parsing Bruce Richardson
2025-03-13 11:38 ` [RFC PATCH 2/3] eal: convert core masks and lists to core sets Bruce Richardson
2025-03-13 11:38 ` [RFC PATCH 3/3] eal: allow automatic mapping of high lcore ids Bruce Richardson
2025-03-24 17:30 ` [PATCH v2 0/3] allow easier use of high lcore-ids Bruce Richardson
2025-03-24 17:30   ` [PATCH v2 1/3] eal: centralize core parameter parsing Bruce Richardson
2025-04-07  6:58     ` David Marchand
2025-03-24 17:30   ` [PATCH v2 2/3] eal: convert core masks and lists to core sets Bruce Richardson
2025-04-07  6:59     ` David Marchand
2025-03-24 17:30   ` [PATCH v2 3/3] eal: allow automatic mapping of high lcore ids Bruce Richardson
2025-04-01 14:06   ` [PATCH v2 0/3] allow easier use of high lcore-ids Bruce Richardson
2025-04-07  7:04     ` David Marchand
2025-04-07  9:48       ` Bruce Richardson
2025-04-07 10:15         ` Morten Brørup
2025-04-07 10:40           ` Bruce Richardson
2025-04-07 11:32             ` Morten Brørup
2025-04-07 11:56               ` Bruce Richardson [this message]
2025-04-07 12:25                 ` Morten Brørup
2025-04-07 12:41                   ` Bruce Richardson
2025-04-07 13:18                     ` Morten Brørup
2025-04-07 13:24                       ` Bruce Richardson
2025-04-07 15:14           ` Stephen Hemminger
2025-04-07 15:38             ` Bruce Richardson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z_O9Uq_Mg86yjKzi@bricha3-mobl1.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=mb@smartsharesystems.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).