DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: David Marchand <david.marchand@redhat.com>, <dev@dpdk.org>
Subject: Re: [PATCH v2 0/3] allow easier use of high lcore-ids
Date: Mon, 7 Apr 2025 13:41:40 +0100	[thread overview]
Message-ID: <Z_PIBLQIT3eu6Ab3@bricha3-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9FB9E@smartserver.smartshare.dk>

On Mon, Apr 07, 2025 at 02:25:46PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Monday, 7 April 2025 13.56
> > 
> > On Mon, Apr 07, 2025 at 01:32:59PM +0200, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Monday, 7 April 2025 12.41
> > > >
> > > > On Mon, Apr 07, 2025 at 12:15:13PM +0200, Morten Brørup wrote:
> > > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent:
> > > > > > Monday, 7 April 2025 11.49
> > > > > >
> > > > > > On Mon, Apr 07, 2025 at 09:04:05AM +0200, David Marchand wrote:
> > > > > > > Hello Bruce,
> > > > > > >
> > > > > > > On Tue, Apr 1, 2025 at 4:08 PM Bruce Richardson
> > > > > > > <bruce.richardson@intel.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Mar 24, 2025 at 05:30:26PM +0000, Bruce Richardson
> > > > wrote:
> > > > > > > > > Traditionally, DPDK has had a direct mapping of internal
> > > > lcore-
> > > > > > ids, to
> > > > > > > > > the actual core numbers in use. With higher core count
> > > > servers
> > > > > > becoming
> > > > > > > > > more prevalent the issue becomes one of increasing memory
> > > > > > footprint when
> > > > > > > > > using such a scheme, due to the need to have all arrays
> > > > > > dimensioned for
> > > > > > > > > all cores on the system, whether or not those cores are
> > in
> > > > use by
> > > > > > the
> > > > > > > > > app.
> > > > > > > > >
> > > > > > > > > Therefore, the decision was made in the past to not
> > expand
> > > > the
> > > > > > > > > build-time RTE_MAX_LCORE value beyond 128. Instead, it
> > was
> > > > > > recommended
> > > > > > > > > that users use the "--lcores" EAL parameter to take the
> > high-
> > > > > > numbered
> > > > > > > > > cores they wish to use and map them to lcore-ids within
> > the 0
> > > > -
> > > > > > 128
> > > > > > > > > range. While this works, this is a little clunky as it
> > means
> > > > that
> > > > > > > > > instead of just passing, for example, "-l 130-139", the
> > user
> > > > must
> > > > > > > > > instead pass "--lcores 0@130,1@131,2@132,3@133,...."
> > > > > > > > >
> > > > > > > > > This patchset attempts to simplify the situation by
> > adding a
> > > > new
> > > > > > flag to
> > > > > > > > > do this mapping automatically. To use cores 130-139 and
> > map
> > > > them
> > > > > > to ids
> > > > > > > > > 0-9 internally, the EAL args now become: "-l 130-139 --
> > map-
> > > > lcore-
> > > > > > ids",
> > > > > > > > > or using the shorter "-M" version of the flag: "-Ml 130-
> > 139".
> > > > > > > > >
> > > > > > > > > Adding this new parameter required some rework of the
> > > > existing
> > > > > > arg
> > > > > > > > > parsing code, because in current DPDK the args are parsed
> > and
> > > > > > checked in
> > > > > > > > > the order they appear on the commandline. This means that
> > > > using
> > > > > > the
> > > > > > > > > example above, the core parameter 130-139 will be
> > rejected
> > > > > > immediately
> > > > > > > > > before the "map-lcore-ids" parameter is seen. To work
> > around
> > > > > > this, the
> > > > > > > > > core (and service core) parameters are not parsed when
> > seen,
> > > > > > instead
> > > > > > > > > they are only saved off and parsed after all arguments
> > are
> > > > > > parsed. The
> > > > > > > > > "-l" and "-c" parameters are converted into "--lcores"
> > > > arguments,
> > > > > > so all
> > > > > > > > > assigning of lcore ids is done there in all cases.
> > > > > > > > >
> > > > > > > > > RFC->v2: * converted printf to DEBUG log * added "-M" as
> > > > shorter
> > > > > > > > > version of flag * added documentation * renamed internal
> > API
> > > > that
> > > > > > > > > was changed to avoid any potential
> > > > > > hidden
> > > > > > > > >   runtime issues.
> > > > > > > > >
> > > > > > > > > Bruce Richardson (3): eal: centralize core parameter
> > parsing
> > > > eal:
> > > > > > > > > convert core masks and lists to core sets eal: allow
> > > > automatic
> > > > > > > > > mapping of high lcore ids
> > > > > > > > >
> > > > > > > > Ping for review.
> > > > > > > >
> > > > > > > > At a high level, does this feature seem useful to users?
> > > > > > >
> > > > > > > This seems useful, though I am not I would touch the existing
> > > > > > options.
> > > > > > > I would have gone with a simple -L option (taking the same
> > kind
> > > > of
> > > > > > > input than -l but with new behavior), and not combine a flag
> > with
> > > > > > > existing options.
> > > > > > >
> > > > > >
> > > > > > That would be an easier patchset to do up. However, it would
> > then
> > > > mean
> > > > > > that we have no less than 4 different ways to specify the cores
> > to
> > > > use:
> > > > > > "- c", "-l", "-L", "--lcores" - and therefore need 4 different
> > sets
> > > > of
> > > > > > parsing options for them, and more checks to ensure we have
> > only
> > > > one of
> > > > > > the 4 specified in any run. That's why I chose the modifier
> > option,
> > > > and
> > > > > > to try and consolidate the core setup a bit.
> > > > > >
> > > > > > However, if having a completely new option is preferred, I am
> > happy
> > > > > > enough to do up a different patchset for that.
> > > > > >
> > > > > > > I scanned through the series, not much to say.  Maybe add a
> > unit
> > > > test
> > > > > > > for new cmdline option.
> > > > > > >
> > > > > > Sure. Once it's decided what approach (if any) to take, I'll do
> > up
> > > > a
> > > > > > new patchset, taking into account any relevant feedback on this
> > > > set.
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > Changing the EAL parameter parser to a "two pass parser" makes
> > sense.
> > > > I
> > > > > think checking for existence of more than one lcore specification
> > > > options
> > > > > should suffice; we don't need to accept multiple lcore
> > specification
> > > > > options and check for conflicts.
> > > > >
> > > > > When remapping, do we need to support gaps in the "lcore"
> > (logical
> > > > cores)
> > > > > array, e.g. for secondary processes, or can it be continuous from
> > 0
> > > > to
> > > > > the number of specified lcores?
> > > > >
> > > > > And are new EAL parameters for this really beneficial?  Doesn't
> > e.g.
> > > > "-l
> > > > > 0-9@130-139,100@140" suffice?
> > > > >
> > > > Actually, I believe "0-9@130-139"[1]  is not the same as
> > > > "0@130,1@131,2@132,...". The latter affinities one thread to one
> > core
> > > > ten
> > > > times over, while the former affinitizes 10 threads to 10 cores -
> > > > leaving
> > > > those threads free to move about within the 10 cores specified.
> > >
> > > Interesting. The documentation [GSG] isn't clear to me about this; a
> > example there could help clarify.
> > >
> > > [GSG]:
> > https://doc.dpdk.org/guides/linux_gsg/linux_eal_parameters.html#lcore-
> > related-options
> > >
> > 
> > Yep, agreed.
> > 
> > > If users are manually passing lcore parameters to the EAL, then I see
> > why some sort of remapping shorthand is useful.
> > > IMO, if the mappings are relatively exotic, it should be acceptable
> > requiring an external script to build a long list of mapping parameters
> > and then invoke the application with those script-generated EAL
> > parameters.
> > > This would reduce the scope to support relatively simple, common
> > mappings.
> > >
> > > Could we expand the --lcores syntax to support common mappings?
> > >
> > > E.g. "0-9@130+" to do what I thought.
> > > The lack of "()" treats the entries individually (not as a group),
> > and the "+" indicates auto-increment.
> > >
> > > A more advanced example:
> > > "0-9@(130-131)+", meaning lcore 0 gets cpus 130-131, lcore 1 gets
> > cpus 132-133, etc.
> > >
> > 
> > My issues with the above syntax idea is:
> > * I think it's overly complicating the lcores parameter adding in the
> >   support for the "+" symbol - especially in the advanced case example
> > you
> >   provide. I worry about code maintainability here.
> > * More significantly for me, I think it's also getting things backwards
> > in
> >   that it is focusing more on the lcore ids visible to the app, rather
> > than
> >   the physical cores to be used. For the example above of 0-9@130+,
> > what I
> >   would expect is that the user is mainly thinking about the cores he
> > wants
> >   to use 130-139, which are then to be mapped to lower ids. If we had
> > the
> >   syntax reversed where the physical cores were first, I'd say it would
> >   make more sense, e.g. 130-139@0+
> 
> I 100 % agree on the syntax being backwards.
> A good reason for introducing a new parameter, rather than expanding on "--lcores".
> 

Yes and no.
I agree with not expanding on --lcores, but I also don't think any new
parameter added should attempt to reproduce all of what lcores does. I
would leave --lcores as-is, as the power-tool for lcore config e.g. what
you talk about below for mapping multiple lcores to the same physical cpu.

> We should consider deprecating the old (backwards) syntax, so users don’t get confused about one EAL parameter being "backwards" of the other.
> 

I disagree with this. I would be ok with deprecating the old "-c" coremask
syntax - I think the time is long past when we should be dealing with
masks. However, removing "-l" and "--lcores" flag is, to me anyway, too big
and jarring a change for end users for us to consider.

> > * finally, as I found out last month, there are systems on which the
> > cores
> >   are spread across numa-nodes on odd/even boundaries, so to have an
> > app
> >   running on socket 0, you need to give the core ids as a list i.e.
> >   0,2,4,6, and cannot use ranges. [This also reenforces the point above
> >   too, where again it's the internal ids we need to generalize, not the
> >   physical cpus]
> 
> For this, we could use "/2" (like in subnets), or "+2" as increment parameter.
> 
> > 
> > My thinking on the matter, and I'm happy to be corrected if I'm wrong
> > here,
> > is that end-users are unlikely to significantly care what the actual
> > lcore
> > ids are internally in the program, so long as they work and are unique.
> 
> Generally yes.
> Note that we should allow the same physical CPU core to be assigned to multiple lcores...
> If a CPU core is shared between multiple worker lcores, then it could be problematic, but with a mix of lcore roles, this might be handy for some applications. E.g. a virtual machine with only one CPU core could use that single CPU core as both main and worker lcore, or as main and service lcore.
> Sharing an lcore between threads requires the developer to take special care of e.g. spinlocks, but the EAL parameter parser should not prohibit it. It might log a notice, though.
> 

This is already taken care of via --lcores, so I don't see the need to
reimplement it using a new flag also. Any new flags we add should be kept
deliberately simple.

> > What does matter is the physical cpus on which the code is to run.
> > Therefore, my approach to this is to find the simplest possible
> > solution
> > whereby the user can just provide a list of cores and tell EAL to just
> > map
> > them to reasonable values. For the "reasonable" values, I would imagine
> > that for the vast majority of cases starting "0" is what is wanted. For
> > anything beyond that, we already have the existing --lcores syntax to
> > be
> > used.
> 
> Agree with all of the above. :-)
> 

Great. That still leaves us with the problem of what exactly to do as the
best solution. Here are some alternatives that I see:

1. Add a modifier flag for -l and -c parameters to auto-remap the lcore ids
   to zero, so user is just specifying physical CPU id's.
2. Add a new flag for specifying physical cpu ids, which auto-remaps the
   cores specified to zero.
   2a. To simplify our code and user experience we could at the same time:
       * deprecate "-c" flag for coremasks
       * make "-l" and "--lcores" the same flag just in long and short
         versions. This should not break anything as "-l" parameter is
         just as subset of what "--lcores" provides.
       * that would leave us with effectively two core flag paths:
          - -l/--lcores, behaviour as now, full explicit control
          - -L/--lcores-remapped, takes simplified core list (only "," and
            "-" supported as with "-l" now), and maps them to zero-based.

Third options? Any other feedback?

/Bruce

  reply	other threads:[~2025-04-07 12:41 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-13 11:38 [RFC PATCH " Bruce Richardson
2025-03-13 11:38 ` [RFC PATCH 1/3] eal: centralize core parameter parsing Bruce Richardson
2025-03-13 11:38 ` [RFC PATCH 2/3] eal: convert core masks and lists to core sets Bruce Richardson
2025-03-13 11:38 ` [RFC PATCH 3/3] eal: allow automatic mapping of high lcore ids Bruce Richardson
2025-03-24 17:30 ` [PATCH v2 0/3] allow easier use of high lcore-ids Bruce Richardson
2025-03-24 17:30   ` [PATCH v2 1/3] eal: centralize core parameter parsing Bruce Richardson
2025-04-07  6:58     ` David Marchand
2025-03-24 17:30   ` [PATCH v2 2/3] eal: convert core masks and lists to core sets Bruce Richardson
2025-04-07  6:59     ` David Marchand
2025-03-24 17:30   ` [PATCH v2 3/3] eal: allow automatic mapping of high lcore ids Bruce Richardson
2025-04-01 14:06   ` [PATCH v2 0/3] allow easier use of high lcore-ids Bruce Richardson
2025-04-07  7:04     ` David Marchand
2025-04-07  9:48       ` Bruce Richardson
2025-04-07 10:15         ` Morten Brørup
2025-04-07 10:40           ` Bruce Richardson
2025-04-07 11:32             ` Morten Brørup
2025-04-07 11:56               ` Bruce Richardson
2025-04-07 12:25                 ` Morten Brørup
2025-04-07 12:41                   ` Bruce Richardson [this message]
2025-04-07 13:18                     ` Morten Brørup
2025-04-07 13:24                       ` Bruce Richardson
2025-04-07 15:14           ` Stephen Hemminger
2025-04-07 15:38             ` Bruce Richardson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z_PIBLQIT3eu6Ab3@bricha3-mobl1.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=mb@smartsharesystems.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).