Subject: RE: [RFC v3 1/6] eal: add static per-lcore memory allocation facility
Date: Tue, 20 Feb 2024 14:37:04 +0100
From: Morten Brørup
To: "Bruce Richardson", Mattias Rönnblom
Cc: Mattias Rönnblom, , "Stephen Hemminger"
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F236@smartserver.smartshare.dk>
References: <20240219094036.485727-2-mattias.ronnblom@ericsson.com>
 <20240220084908.488252-1-mattias.ronnblom@ericsson.com>
 <20240220084908.488252-2-mattias.ronnblom@ericsson.com>
 <68c8f01c-d404-4b63-adca-13b560c95df8@lysator.liu.se>
List-Id: DPDK patches and discussions

> From: Bruce Richardson
> [mailto:bruce.richardson@intel.com]
> Sent: Tuesday, 20 February 2024 12.39
>
> On Tue, Feb 20, 2024 at 11:47:14AM +0100, Mattias Rönnblom wrote:
> > On 2024-02-20 10:11, Bruce Richardson wrote:
> > > On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
> > > > Introduce DPDK per-lcore id variables, or lcore variables for short.
> > > >
> > > > An lcore variable has one value for every current and future
> > > > lcore id-equipped thread.
> > > >
> > > > The primary use case is for statically allocating small chunks of
> > > > often-used data, which is related logically, but where there are
> > > > performance benefits to reap from having updates being local to an
> > > > lcore.
> > > >
> > > > Lcore variables are similar to thread-local storage (TLS, e.g., C11
> > > > _Thread_local), but decouple the values' lifetime from that of the
> > > > threads.
>
> >
> > > > +/*
> > > > + * Avoid using offset zero, since it would result in a NULL-value
> > > > + * "handle" (offset) pointer, which in principle and per the API
> > > > + * definition shouldn't be an issue, but may confuse some tools and
> > > > + * users.
> > > > + */
> > > > +#define INITIAL_OFFSET 1
> > > > +
> > > > +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR] __rte_cache_aligned;
> > > > +
> > >
> > > While I like the idea of improved handling for per-core variables, my
> > > main concern with this set is this definition here, which adds yet
> > > another dependency on the compile-time defined RTE_MAX_LCORE value.
> >
> > Lcore variables replace one RTE_MAX_LCORE-dependent pattern with another.
> >
> > You could even argue the dependency on RTE_MAX_LCORE is reduced with
> > lcore variables, if you look at where/in how many places in the code
> > base this macro is being used.
> > Centralizing per-lcore data management may also provide some
> > opportunity in the future for extending the API to cope with some more
> > dynamic RTE_MAX_LCORE variant. Not without ABI breakage of course, but
> > we are never going to change anything related to RTE_MAX_LCORE without
> > breaking the ABI, since this constant is everywhere, including compiled
> > into the application itself.
>
> Yep, that is true if it's widely used.
>
> > > I believe we already have an issue with this #define where it's
> > > impossible to come up with a single value that works for all, or
> > > nearly all, cases. The current default is still 128, yet DPDK needs
> > > to support systems where the number of cores is well into the
> > > hundreds, requiring workarounds of core mappings or customized builds
> > > of DPDK. Upping the value fixes those issues at the cost of a memory
> > > footprint explosion for smaller systems.
> >
> > I agree this is an issue.
> >
> > RTE_MAX_LCORE also needs to be sized to accommodate not only all cores
> > used, but the sum of all EAL threads and registered non-EAL threads.
> >
> > So, there is no reliable way to discover what RTE_MAX_LCORE should be
> > on a particular piece of hardware, since the actual number of lcore ids
> > needed is up to the application.
> >
> > Why is the default set so low? Linux has NR_CPUS, which serves the same
> > purpose, and which is set to 4096 by default, if I recall correctly.
> > Shouldn't we at least be able to increase it to 256?

I recall a recent techboard meeting where the default was discussed. It was agreed so low because it suffices for the vast majority of hardware out there, and applications for bigger platforms can be expected to build DPDK with a different configuration themselves. And as Bruce also mentions, it's a tradeoff against memory consumption.

>
> The default is so low because of the mempool caches.
> These are an array of
> buffer pointers with 512 (IIRC) entries per core, up to RTE_MAX_LCORE.

The decision was made under time pressure, based on narrow guesstimates rather than a broader memory consumption analysis.

If we really cared about default memory consumption, we should reduce the default RTE_MAX_QUEUES_PER_PORT from 1024 too. It has quite an effect.

Hard data about which build-time configuration parameters have the biggest effect on memory consumption would be extremely useful for tweaking the parameters for resource-limited hardware.

It's a mix of static and dynamic allocation, so it's not obvious which scalable data structures consume the most memory.

> > > I'm therefore nervous about putting more dependencies on this value,
> > > when I feel we should be moving away from its use, to allow more
> > > runtime configurability of cores.
> >
> > What more specifically do you have in mind?
>
> I don't think having a dynamically scaling RTE_MAX_LCORE is feasible,
> but what I would like to see is a runtime specified value. For example,
> you could run DPDK with the EAL parameter "--max-lcores=1024" for large
> systems or "--max-lcores=32" for small ones. That would then be used at
> init time to scale all internal data structures appropriately.

I agree 100% that a better long-term solution should be on the general road map. Memory is a precious resource, but few seem to care about it.

A mix could provide an easy migration path: keep RTE_MAX_LCORE as the hard upper limit (and default value) for a runtime-specified maximum ("rte_max_lcores").

With this, the goal would be for modules with very small data sets to continue using RTE_MAX_LCORE fixed-size arrays, and for modules with larger data sets to migrate to rte_max_lcores dynamically sized arrays.

I am opposed to blocking a new patch series only because it adds another RTE_MAX_LCORE sized array.
We already have plenty of those. The new array can be migrated to a dynamically sized one at a later time, just like the other modules with RTE_MAX_LCORE sized arrays.

Perhaps "fixing" an existing module would free up more memory than fixing this one. Let's spend development resources where they have the biggest impact.