DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH 0/3] Defer lcore variables allocation
@ 2024-12-05 17:57 David Marchand
  2024-12-05 17:57 ` [PATCH 1/3] random: defer seeding to EAL init David Marchand
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: David Marchand @ 2024-12-05 17:57 UTC (permalink / raw)
  To: dev; +Cc: thomas, frode.nordahl, mattias.ronnblom

As I had reported in rc2, the lcore variables allocation have a
noticeable impact on applications consuming DPDK, even when such
applications does not use DPDK, or use features associated to
some lcore variables.

While the amount has been reduced in a rush before rc2,
there are still cases when the increased memory footprint is noticed
like in scaling tests.
See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931


lcore variable allocations in constructor is a bad idea, as the
application consuming DPDK has no control over such allocation:
linking some code does not mean that all of it will be used at runtime.

The general question on whether lcore variables in constructor should
be forbidden, is left to a later discussion.

For now, this series only focus on fixing subsystems using lcore
variables so that those allocations are deferred either in rte_eal_init()
or in the path that does require such lcore variables.


-- 
David Marchand

David Marchand (3):
  random: defer seeding to EAL init
  power: defer lcore variable allocation
  eal/x86: defer power intrinsics variable allocation

 lib/eal/common/eal_private.h       |  6 ++++++
 lib/eal/common/rte_random.c        |  7 +++++--
 lib/eal/freebsd/eal.c              |  2 ++
 lib/eal/linux/eal.c                |  2 ++
 lib/eal/windows/eal.c              |  2 ++
 lib/eal/x86/rte_power_intrinsics.c | 15 ++++++++++++---
 lib/power/rte_power_pmd_mgmt.c     | 27 +++++++++++++++++++--------
 7 files changed, 48 insertions(+), 13 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/3] random: defer seeding to EAL init
  2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
@ 2024-12-05 17:57 ` David Marchand
  2024-12-06 11:09   ` Mattias Rönnblom
  2024-12-16  9:38   ` Burakov, Anatoly
  2024-12-05 17:57 ` [PATCH 2/3] power: defer lcore variable allocation David Marchand
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 36+ messages in thread
From: David Marchand @ 2024-12-05 17:57 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable, Tyler Retzlaff,
	Bruce Richardson, Dmitry Kozlyuk, Morten Brørup,
	Konstantin Ananyev, Chengwen Feng, Stephen Hemminger

The RNG is documented as being seeded as part of EAL init.

/**
 * Seed the pseudo-random generator.
 *
 * The generator is automatically seeded by the EAL init with a timer
 * value. It may need to be re-seeded by the user with a real random
 * value.
 *
...

Move the initialisation (seeding) helper out of a constructor and
call it explicitly from rte_eal_init() as it was done before commit
3f002f069612 ("eal: replace libc-based random generation with LFSR").

This also moves the unconditional lcore variable allocation out of a
constructor.

While at it, mark local symbol rand_state as static.

Fixes: 29c39cd3d54d ("random: keep PRNG state in lcore variable")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/common/eal_private.h | 6 ++++++
 lib/eal/common/rte_random.c  | 7 +++++--
 lib/eal/freebsd/eal.c        | 2 ++
 lib/eal/linux/eal.c          | 2 ++
 lib/eal/windows/eal.c        | 2 ++
 5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index bb315dab04..a10db6a399 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -702,6 +702,12 @@ eal_get_internal_configuration(void);
 rte_usage_hook_t
 eal_get_application_usage_hook(void);
 
+/**
+ * Initialise random subsystem.
+ */
+void
+eal_rand_init(void);
+
 /**
  * Instruct primary process that a secondary process wants to attach.
  */
diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index cf0756f26a..307c26bb7c 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -14,6 +14,8 @@
 #include <rte_lcore_var.h>
 #include <rte_random.h>
 
+#include "eal_private.h"
+
 struct __rte_cache_aligned rte_rand_state {
 	uint64_t z1;
 	uint64_t z2;
@@ -22,7 +24,7 @@ struct __rte_cache_aligned rte_rand_state {
 	uint64_t z5;
 };
 
-RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
+static RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
 
 /* instance to be shared by all unregistered non-EAL threads */
 static struct rte_rand_state unregistered_rand_state;
@@ -228,7 +230,8 @@ __rte_random_initial_seed(void)
 	return rte_get_tsc_cycles();
 }
 
-RTE_INIT(rte_rand_init)
+void
+eal_rand_init(void)
 {
 	uint64_t seed;
 
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a96bbf5836..d07cff8651 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -777,6 +777,8 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	eal_rand_init();
+
 	eal_check_mem_on_local_socket();
 
 	if (rte_thread_set_affinity_by_id(rte_thread_self(),
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index a6220524a4..b1e63e37fc 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1173,6 +1173,8 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	eal_rand_init();
+
 	eal_check_mem_on_local_socket();
 
 	if (rte_thread_set_affinity_by_id(rte_thread_self(),
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 5cdc053a02..5c7526f922 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -410,6 +410,8 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	eal_rand_init();
+
 	if (rte_thread_set_affinity_by_id(rte_thread_self(),
 			&lcore_config[config->main_lcore].cpuset) != 0) {
 		rte_eal_init_alert("Cannot set affinity");
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 2/3] power: defer lcore variable allocation
  2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
  2024-12-05 17:57 ` [PATCH 1/3] random: defer seeding to EAL init David Marchand
@ 2024-12-05 17:57 ` David Marchand
  2024-12-06 11:29   ` Mattias Rönnblom
  2024-12-05 17:57 ` [PATCH 3/3] eal/x86: defer power intrinsics " David Marchand
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-05 17:57 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable, Anatoly Burakov,
	David Hunt, Sivaprasad Tummala, Stephen Hemminger, Chengwen Feng,
	Konstantin Ananyev, Morten Brørup

The lcore variable in this code unit is only used through
rte_power_ethdev_pmgmt_queue_*() public symbols.

Defer the unconditional lcore variable allocation in those symbols.

Fixes: 130643319579 ("power: keep per-lcore state in lcore variable")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/power/rte_power_pmd_mgmt.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index a2fff3b765..29e2d438a3 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -72,6 +72,19 @@ struct __rte_cache_aligned pmd_core_cfg {
 };
 static RTE_LCORE_VAR_HANDLE(struct pmd_core_cfg, lcore_cfgs);
 
+static void
+alloc_lcore_cfgs(void)
+{
+	struct pmd_core_cfg *lcore_cfg;
+	unsigned int lcore_id;
+
+	RTE_LCORE_VAR_ALLOC(lcore_cfgs);
+
+	/* initialize all tailqs */
+	RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
+		TAILQ_INIT(&lcore_cfg->head);
+}
+
 static inline bool
 queue_equal(const union queue *l, const union queue *r)
 {
@@ -517,6 +530,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	if (lcore_cfgs == NULL)
+		alloc_lcore_cfgs();
+
 	lcore_cfg = RTE_LCORE_VAR_LCORE(lcore_id, lcore_cfgs);
 
 	/* check if other queues are stopped as well */
@@ -617,6 +633,9 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		return ret < 0 ? -EINVAL : -EBUSY;
 	}
 
+	if (lcore_cfgs == NULL)
+		alloc_lcore_cfgs();
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	lcore_cfg = RTE_LCORE_VAR_LCORE(lcore_id, lcore_cfgs);
 
@@ -768,16 +787,8 @@ rte_power_pmd_mgmt_get_scaling_freq_max(unsigned int lcore)
 }
 
 RTE_INIT(rte_power_ethdev_pmgmt_init) {
-	unsigned int lcore_id;
-	struct pmd_core_cfg *lcore_cfg;
 	int i;
 
-	RTE_LCORE_VAR_ALLOC(lcore_cfgs);
-
-	/* initialize all tailqs */
-	RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
-		TAILQ_INIT(&lcore_cfg->head);
-
 	/* initialize config defaults */
 	emptypoll_max = 512;
 	pause_duration = 1;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 3/3] eal/x86: defer power intrinsics variable allocation
  2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
  2024-12-05 17:57 ` [PATCH 1/3] random: defer seeding to EAL init David Marchand
  2024-12-05 17:57 ` [PATCH 2/3] power: defer lcore variable allocation David Marchand
@ 2024-12-05 17:57 ` David Marchand
  2024-12-06 11:32   ` Mattias Rönnblom
  2024-12-06 11:01 ` [PATCH 0/3] Defer lcore variables allocation Mattias Rönnblom
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-05 17:57 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable,
	Bruce Richardson, Konstantin Ananyev, Morten Brørup,
	Chengwen Feng, Stephen Hemminger

The lcore variable in this code unit is only used through
rte_power_monitor*() public symbols.

Defer the unconditional lcore variable allocation in those symbols.

Fixes: 18b5049ab4fe ("eal/x86: keep power intrinsics state in lcore variable")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/x86/rte_power_intrinsics.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index e4cb913590..6fc3b2b1cb 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -22,8 +22,6 @@ struct power_wait_status {
 
 RTE_LCORE_VAR_HANDLE(struct power_wait_status, wait_status);
 
-RTE_LCORE_VAR_INIT(wait_status);
-
 /*
  * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
  * For more information about usage of these instructions, please refer to
@@ -177,6 +175,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (pmc->fn == NULL)
 		return -EINVAL;
 
+	if (wait_status == NULL)
+		RTE_LCORE_VAR_ALLOC(wait_status);
+
 	s = RTE_LCORE_VAR_LCORE(lcore_id, wait_status);
 
 	/* update sleep address */
@@ -269,6 +270,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 	if (lcore_id >= RTE_MAX_LCORE)
 		return -EINVAL;
 
+	if (wait_status == NULL)
+		RTE_LCORE_VAR_ALLOC(wait_status);
+
 	s = RTE_LCORE_VAR_LCORE(lcore_id, wait_status);
 
 	/*
@@ -308,7 +312,7 @@ int
 rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 		const uint32_t num, const uint64_t tsc_timestamp)
 {
-	struct power_wait_status *s = RTE_LCORE_VAR(wait_status);
+	struct power_wait_status *s;
 	uint32_t i, rc;
 
 	/* check if supported */
@@ -318,6 +322,11 @@ rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 	if (pmc == NULL || num == 0)
 		return -EINVAL;
 
+	if (wait_status == NULL)
+		RTE_LCORE_VAR_ALLOC(wait_status);
+
+	s = RTE_LCORE_VAR(wait_status);
+
 	/* we are already inside transaction region, return */
 	if (rte_xtest() != 0)
 		return 0;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
                   ` (2 preceding siblings ...)
  2024-12-05 17:57 ` [PATCH 3/3] eal/x86: defer power intrinsics " David Marchand
@ 2024-12-06 11:01 ` Mattias Rönnblom
  2024-12-06 15:55   ` Thomas Monjalon
  2024-12-09 11:03   ` David Marchand
  2024-12-16  9:42 ` Burakov, Anatoly
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
  5 siblings, 2 replies; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-06 11:01 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: thomas, frode.nordahl, mattias.ronnblom

On 2024-12-05 18:57, David Marchand wrote:
> As I had reported in rc2, the lcore variables allocation have a
> noticeable impact on applications consuming DPDK, even when such
> applications does not use DPDK, or use features associated to
> some lcore variables.
> 
> While the amount has been reduced in a rush before rc2,
> there are still cases when the increased memory footprint is noticed
> like in scaling tests.
> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> 

What this bug report fails to mention is that it only affects 
applications using locked memory.

> 
> lcore variable allocations in constructor is a bad idea, as the
> application consuming DPDK has no control over such allocation:
> linking some code does not mean that all of it will be used at runtime.
> 

The core issue isn't really the allocations themselves, or the lack of 
app-level control. Lcore variables are no different from the static 
arrays they are replacing, and existing ones are small enough to be 
pretty harmless.

Such innocent allocations trigger a larger (bulk) allocation, which is 
also harmless, unless you use locked memory.

I was clear on the impact on RSS for locked-memory type apps, but nobody 
pointed this out as an issue worth fixing, so I made no attempt in doing so.

In retrospect, maybe the offset between lcore variable instances could 
have been encoded into the handle, and thus one could use 
different-sized offset for different variables.

> The general question on whether lcore variables in constructor should
> be forbidden, is left to a later discussion.
> 

That discussion could be extended to cover the question if RTE_INIT() 
type constructors should be used at all. Intuitively, it seems better if 
all DPDK initialization, or at least all EAL init, happens at the time 
of rte_eal_init(), in some ordered/organized fashion.

> For now, this series only focus on fixing subsystems using lcore
> variables so that those allocations are deferred either in rte_eal_init()
> or in the path that does require such lcore variables.
> 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] random: defer seeding to EAL init
  2024-12-05 17:57 ` [PATCH 1/3] random: defer seeding to EAL init David Marchand
@ 2024-12-06 11:09   ` Mattias Rönnblom
  2024-12-16  9:38   ` Burakov, Anatoly
  1 sibling, 0 replies; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-06 11:09 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable, Tyler Retzlaff,
	Bruce Richardson, Dmitry Kozlyuk, Morten Brørup,
	Konstantin Ananyev, Chengwen Feng, Stephen Hemminger

On 2024-12-05 18:57, David Marchand wrote:
> The RNG is documented as being seeded as part of EAL init.
> 
> /**
>   * Seed the pseudo-random generator.
>   *
>   * The generator is automatically seeded by the EAL init with a timer
>   * value. It may need to be re-seeded by the user with a real random
>   * value.
>   *
> ...
> 

Remove the superfluous quote.

> Move the initialisation (seeding) helper out of a constructor and
> call it explicitly from rte_eal_init() as it was done before commit
> 3f002f069612 ("eal: replace libc-based random generation with LFSR").
> 
> This also moves the unconditional lcore variable allocation out of a
> constructor.
> 
> While at it, mark local symbol rand_state as static.
> 

Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

> Fixes: 29c39cd3d54d ("random: keep PRNG state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>   lib/eal/common/eal_private.h | 6 ++++++
>   lib/eal/common/rte_random.c  | 7 +++++--
>   lib/eal/freebsd/eal.c        | 2 ++
>   lib/eal/linux/eal.c          | 2 ++
>   lib/eal/windows/eal.c        | 2 ++
>   5 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
> index bb315dab04..a10db6a399 100644
> --- a/lib/eal/common/eal_private.h
> +++ b/lib/eal/common/eal_private.h
> @@ -702,6 +702,12 @@ eal_get_internal_configuration(void);
>   rte_usage_hook_t
>   eal_get_application_usage_hook(void);
>   
> +/**
> + * Initialise random subsystem.
> + */
> +void
> +eal_rand_init(void);
> +
>   /**
>    * Instruct primary process that a secondary process wants to attach.
>    */
> diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
> index cf0756f26a..307c26bb7c 100644
> --- a/lib/eal/common/rte_random.c
> +++ b/lib/eal/common/rte_random.c
> @@ -14,6 +14,8 @@
>   #include <rte_lcore_var.h>
>   #include <rte_random.h>
>   
> +#include "eal_private.h"
> +
>   struct __rte_cache_aligned rte_rand_state {
>   	uint64_t z1;
>   	uint64_t z2;
> @@ -22,7 +24,7 @@ struct __rte_cache_aligned rte_rand_state {
>   	uint64_t z5;
>   };
>   
> -RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
> +static RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
>   
>   /* instance to be shared by all unregistered non-EAL threads */
>   static struct rte_rand_state unregistered_rand_state;
> @@ -228,7 +230,8 @@ __rte_random_initial_seed(void)
>   	return rte_get_tsc_cycles();
>   }
>   
> -RTE_INIT(rte_rand_init)
> +void
> +eal_rand_init(void)
>   {
>   	uint64_t seed;
>   
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index a96bbf5836..d07cff8651 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -777,6 +777,8 @@ rte_eal_init(int argc, char **argv)
>   		return -1;
>   	}
>   
> +	eal_rand_init();
> +
>   	eal_check_mem_on_local_socket();
>   
>   	if (rte_thread_set_affinity_by_id(rte_thread_self(),
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index a6220524a4..b1e63e37fc 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1173,6 +1173,8 @@ rte_eal_init(int argc, char **argv)
>   		return -1;
>   	}
>   
> +	eal_rand_init();
> +
>   	eal_check_mem_on_local_socket();
>   
>   	if (rte_thread_set_affinity_by_id(rte_thread_self(),
> diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
> index 5cdc053a02..5c7526f922 100644
> --- a/lib/eal/windows/eal.c
> +++ b/lib/eal/windows/eal.c
> @@ -410,6 +410,8 @@ rte_eal_init(int argc, char **argv)
>   		return -1;
>   	}
>   
> +	eal_rand_init();
> +
>   	if (rte_thread_set_affinity_by_id(rte_thread_self(),
>   			&lcore_config[config->main_lcore].cpuset) != 0) {
>   		rte_eal_init_alert("Cannot set affinity");


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] power: defer lcore variable allocation
  2024-12-05 17:57 ` [PATCH 2/3] power: defer lcore variable allocation David Marchand
@ 2024-12-06 11:29   ` Mattias Rönnblom
  2024-12-12  7:57     ` David Marchand
  0 siblings, 1 reply; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-06 11:29 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable, Anatoly Burakov,
	David Hunt, Sivaprasad Tummala, Stephen Hemminger, Chengwen Feng,
	Konstantin Ananyev, Morten Brørup

On 2024-12-05 18:57, David Marchand wrote:
> The lcore variable in this code unit is only used through
> rte_power_ethdev_pmgmt_queue_*() public symbols.
> 
> Defer the unconditional lcore variable allocation in those symbols.
> 
> Fixes: 130643319579 ("power: keep per-lcore state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>   lib/power/rte_power_pmd_mgmt.c | 27 +++++++++++++++++++--------
>   1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index a2fff3b765..29e2d438a3 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -72,6 +72,19 @@ struct __rte_cache_aligned pmd_core_cfg {
>   };
>   static RTE_LCORE_VAR_HANDLE(struct pmd_core_cfg, lcore_cfgs);
>   
> +static void
> +alloc_lcore_cfgs(void)
> +{
> +	struct pmd_core_cfg *lcore_cfg;
> +	unsigned int lcore_id;
> +
> +	RTE_LCORE_VAR_ALLOC(lcore_cfgs);
> +
> +	/* initialize all tailqs */
> +	RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
> +		TAILQ_INIT(&lcore_cfg->head);
> +}
> +
>   static inline bool
>   queue_equal(const union queue *l, const union queue *r)
>   {
> @@ -517,6 +530,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>   		goto end;
>   	}
>   
> +	if (lcore_cfgs == NULL)
> +		alloc_lcore_cfgs();
> +


I would wrap all RTE_LCORE_VAR_LCORE() and RTE_LCORE_VAR().

static struct pmd_core_cfg *
get_cfg_lcore(unsigned int lcore_id)
{
	assure_lcore_cfgs_alloced();
	return RTE_LCORE_VAR_LCORE(lcore_cfgs, lcore_id);
}

static struct pmd_core_cfg *
get_cfg(void)
{
	get_cfg_lcore(rte_lcore_id());
}

Add

static void
assure_lcore_cfgs_alloced(unsigned int lcore_id)
{
	if (lcore_cfgs != NULL)
		lcore_cfgs_alloc();
}

..or maybe better merge assure_lcore_cfgs_alloced() and lcore_cfgs_alloc().

Makes it a little harder to make mistakes.

A somewhat unrelated question: why is pmd_core_cfg cache-line aligned? I 
don't think it should be.

>   	lcore_cfg = RTE_LCORE_VAR_LCORE(lcore_id, lcore_cfgs);
>   
>   	/* check if other queues are stopped as well */
> @@ -617,6 +633,9 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>   		return ret < 0 ? -EINVAL : -EBUSY;
>   	}
>   
> +	if (lcore_cfgs == NULL)
> +		alloc_lcore_cfgs();
> +
>   	/* no need to check queue id as wrong queue id would not be enabled */
>   	lcore_cfg = RTE_LCORE_VAR_LCORE(lcore_id, lcore_cfgs);
>   
> @@ -768,16 +787,8 @@ rte_power_pmd_mgmt_get_scaling_freq_max(unsigned int lcore)
>   }
>   
>   RTE_INIT(rte_power_ethdev_pmgmt_init) {
> -	unsigned int lcore_id;
> -	struct pmd_core_cfg *lcore_cfg;
>   	int i;
>   
> -	RTE_LCORE_VAR_ALLOC(lcore_cfgs);
> -
> -	/* initialize all tailqs */
> -	RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
> -		TAILQ_INIT(&lcore_cfg->head);
> -
>   	/* initialize config defaults */
>   	emptypoll_max = 512;
>   	pause_duration = 1;


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/3] eal/x86: defer power intrinsics variable allocation
  2024-12-05 17:57 ` [PATCH 3/3] eal/x86: defer power intrinsics " David Marchand
@ 2024-12-06 11:32   ` Mattias Rönnblom
  0 siblings, 0 replies; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-06 11:32 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable,
	Bruce Richardson, Konstantin Ananyev, Morten Brørup,
	Chengwen Feng, Stephen Hemminger

On 2024-12-05 18:57, David Marchand wrote:
> The lcore variable in this code unit is only used through
> rte_power_monitor*() public symbols.
> 
> Defer the unconditional lcore variable allocation in those symbols.
> 
> Fixes: 18b5049ab4fe ("eal/x86: keep power intrinsics state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>   lib/eal/x86/rte_power_intrinsics.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
> index e4cb913590..6fc3b2b1cb 100644
> --- a/lib/eal/x86/rte_power_intrinsics.c
> +++ b/lib/eal/x86/rte_power_intrinsics.c
> @@ -22,8 +22,6 @@ struct power_wait_status {
>   
>   RTE_LCORE_VAR_HANDLE(struct power_wait_status, wait_status);
>   
> -RTE_LCORE_VAR_INIT(wait_status);
> -
>   /*
>    * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
>    * For more information about usage of these instructions, please refer to
> @@ -177,6 +175,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
>   	if (pmc->fn == NULL)
>   		return -EINVAL;
>   
> +	if (wait_status == NULL)
> +		RTE_LCORE_VAR_ALLOC(wait_status);
> +
>   	s = RTE_LCORE_VAR_LCORE(lcore_id, wait_status);
>   
>   	/* update sleep address */
> @@ -269,6 +270,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>   	if (lcore_id >= RTE_MAX_LCORE)
>   		return -EINVAL;
>   
> +	if (wait_status == NULL)
> +		RTE_LCORE_VAR_ALLOC(wait_status);
> +
>   	s = RTE_LCORE_VAR_LCORE(lcore_id, wait_status);
>   
>   	/*
> @@ -308,7 +312,7 @@ int
>   rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
>   		const uint32_t num, const uint64_t tsc_timestamp)
>   {
> -	struct power_wait_status *s = RTE_LCORE_VAR(wait_status);
> +	struct power_wait_status *s;
>   	uint32_t i, rc;
>   
>   	/* check if supported */
> @@ -318,6 +322,11 @@ rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
>   	if (pmc == NULL || num == 0)
>   		return -EINVAL;
>   
> +	if (wait_status == NULL)
> +		RTE_LCORE_VAR_ALLOC(wait_status);
> +
> +	s = RTE_LCORE_VAR(wait_status);
> +
>   	/* we are already inside transaction region, return */
>   	if (rte_xtest() != 0)
>   		return 0;

The same kind of structure I suggested for the lib/power patch should be 
employed here.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-06 11:01 ` [PATCH 0/3] Defer lcore variables allocation Mattias Rönnblom
@ 2024-12-06 15:55   ` Thomas Monjalon
  2024-12-10 17:09     ` Stephen Hemminger
  2024-12-09 11:03   ` David Marchand
  1 sibling, 1 reply; 36+ messages in thread
From: Thomas Monjalon @ 2024-12-06 15:55 UTC (permalink / raw)
  To: David Marchand, Mattias Rönnblom
  Cc: dev, frode.nordahl, mattias.ronnblom

06/12/2024 12:01, Mattias Rönnblom:
> On 2024-12-05 18:57, David Marchand wrote:
> In retrospect, maybe the offset between lcore variable instances could 
> have been encoded into the handle, and thus one could use 
> different-sized offset for different variables.

Yes it would allow to allocate a minimum size,
instead of having a default which is also a maximum limit size of an object.

It is not too late to change the behavior as the API is experimental.

> > The general question on whether lcore variables in constructor should
> > be forbidden, is left to a later discussion.
> 
> That discussion could be extended to cover the question if RTE_INIT() 
> type constructors should be used at all. Intuitively, it seems better if 
> all DPDK initialization, or at least all EAL init, happens at the time 
> of rte_eal_init(), in some ordered/organized fashion.

Yes we may avoid constructors and instead have callbacks called in rte_eal_init().
In order to not break the RTE_INIT API, we could define some new macros
for registering such rte_eal_init callbacks.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-06 11:01 ` [PATCH 0/3] Defer lcore variables allocation Mattias Rönnblom
  2024-12-06 15:55   ` Thomas Monjalon
@ 2024-12-09 11:03   ` David Marchand
  2024-12-09 15:39     ` Mattias Rönnblom
  1 sibling, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-09 11:03 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, thomas, frode.nordahl, mattias.ronnblom

Hello,

On Fri, Dec 6, 2024 at 12:02 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>
> On 2024-12-05 18:57, David Marchand wrote:
> > As I had reported in rc2, the lcore variables allocation have a
> > noticeable impact on applications consuming DPDK, even when such
> > applications does not use DPDK, or use features associated to
> > some lcore variables.
> >
> > While the amount has been reduced in a rush before rc2,
> > there are still cases when the increased memory footprint is noticed
> > like in scaling tests.
> > See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> >
>
> What this bug report fails to mention is that it only affects
> applications using locked memory.

- By locked memory, are you referring to mlock() and friends?
No ovsdb binary calls them, only the datapath cares about mlocking.


- At a minimum, I understand the lcore var change introduced an
increase in memory of 4kB * 128 (getpagesize() * RTE_MAX_LCORES),
since lcore_var_alloc() calls memset() of the lcore var size, for
every lcore.

In this unit test where 1000 processes are kept alive in parallel,
this means memory consumption increased by 512k * 1000, so ~500M at
least.
This amount of memory is probably significant in a resource-restrained
env like a (Ubuntu) CI.


- I went and traced this unit tests on my laptop by monitoring
kmem:mm_page_alloc, though there may be a better metrics when it comes
to memory consumption.

# dir=build; perf stat -e kmem:mm_page_alloc -- tests/testsuite -C
$dir/tests AUTOTEST_PATH=$dir/utilities:$dir/vswitchd:$dir/ovsdb:$dir/vtep:$dir/tests:$dir/ipsec::
2154

Which gives:
- 1 635 489      kmem:mm_page_alloc for v23.11
- 5 777 043      kmem:mm_page_alloc for v24.11

There is a 4M difference, where I would expect 128k.
So something more happens, than a simple page allocation per lcore,
though I fail to understand what.


Btw, just focusing on lcore var, I did two more tests:
- 1 606 998      kmem:mm_page_alloc for v24.11 + revert all lcore var changes.
- 1 634 606      kmem:mm_page_alloc for v24.11 + current series with
postponed allocations.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-09 11:03   ` David Marchand
@ 2024-12-09 15:39     ` Mattias Rönnblom
  2024-12-09 17:40       ` David Marchand
  0 siblings, 1 reply; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-09 15:39 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On 2024-12-09 12:03, David Marchand wrote:
> Hello,
> 
> On Fri, Dec 6, 2024 at 12:02 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>
>> On 2024-12-05 18:57, David Marchand wrote:
>>> As I had reported in rc2, the lcore variables allocation have a
>>> noticeable impact on applications consuming DPDK, even when such
>>> applications does not use DPDK, or use features associated to
>>> some lcore variables.
>>>
>>> While the amount has been reduced in a rush before rc2,
>>> there are still cases when the increased memory footprint is noticed
>>> like in scaling tests.
>>> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
>>>
>>
>> What this bug report fails to mention is that it only affects
>> applications using locked memory.
> 
> - By locked memory, are you referring to mlock() and friends?
> No ovsdb binary calls them, only the datapath cares about mlocking.
> 
> 
> - At a minimum, I understand the lcore var change introduced an
> increase in memory of 4kB * 128 (getpagesize() * RTE_MAX_LCORES),
> since lcore_var_alloc() calls memset() of the lcore var size, for
> every lcore.
> 

Yes, that is my understanding. It's also consistent with the 
measurements I've posted on this list.

> In this unit test where 1000 processes are kept alive in parallel,
> this means memory consumption increased by 512k * 1000, so ~500M at
> least.
> This amount of memory is probably significant in a resource-restrained
> env like a (Ubuntu) CI.
> 
> 

I wouldn't expect thousands of concurrent processes in a 
resource-constrained system. Sounds wasteful indeed. But sure, there may 
well be scenarios where this make sense.

> - I went and traced this unit tests on my laptop by monitoring
> kmem:mm_page_alloc, though there may be a better metrics when it comes
> to memory consumption.
> 
> # dir=build; perf stat -e kmem:mm_page_alloc -- tests/testsuite -C
> $dir/tests AUTOTEST_PATH=$dir/utilities:$dir/vswitchd:$dir/ovsdb:$dir/vtep:$dir/tests:$dir/ipsec::
> 2154
> 
> Which gives:
> - 1 635 489      kmem:mm_page_alloc for v23.11
> - 5 777 043      kmem:mm_page_alloc for v24.11
> 

Interesting. What is vm.overcommit_memory set to?

How much does process RSS differ between the two version?

This is what I get with a hello world type test program, which links to 
DPDK, but have not yet called rte_eal_init():

DPDK Version  Linking  RSS [kB]
22.11         Static   8448
23.11         Static   8192
9bbd44d638*   Static   8192
18b5049ab4**  Static   8704
24.11         Static   9472

22.11         Dynamic  2304
9bbd44d638*   Dynamic  2304
18b5049ab4**  Dynamic  2816
24.11         Dynamic  2816

No DPDK       -        1024

* 24.11rc commit before lcore variables patch set.
* Last commit in the initial lcore variables patch set.

Ubuntu 24.04 w/ all defaults. --no-huge was used.

In the dynamic case, you see the expected ~0.5 MB increase in foot print 
for yet-to-be-used lcore variables.

In the static case, you also see a ~0.5 MB bump with the lcore 
variables, and then another ~0.5 MB bump (for some other reason) until 
24.11.

(The dynamic hello world program only ends up being dynamically linked 
to very few of the DPDK shared objects, so ymmv).

> There is a 4M difference, where I would expect 128k.
> So something more happens, than a simple page allocation per lcore,
> though I fail to understand what.
> 
> 
> Btw, just focusing on lcore var, I did two more tests:
> - 1 606 998      kmem:mm_page_alloc for v24.11 + revert all lcore var changes.
> - 1 634 606      kmem:mm_page_alloc for v24.11 + current series with
> postponed allocations.
> 
> 

If one move initialization to shared object constructors (from having 
been at some later time), and then end up not running that 
initialization code at all (e.g., DPDK is not used), those code pages 
will increase RSS. That might well hurt more than the lcore variable 
memory itself, depending on how much code is run.

However, such read-only pages can be replaced with something more useful 
if the system is under memory pressure, so they aren't really a big 
issue as far as (real) memory footprint is concerned.

Just linking to DPDK (and its dependencies) already came with a 1-7 MB 
RSS penalty, prior to lcore variables. I wonder how much of that goes 
away if all RTE_INIT() type constructors are removed.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-09 15:39     ` Mattias Rönnblom
@ 2024-12-09 17:40       ` David Marchand
  2024-12-10  9:41         ` Mattias Rönnblom
  0 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-09 17:40 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On Mon, Dec 9, 2024 at 4:39 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> On 2024-12-09 12:03, David Marchand wrote:
> > On Fri, Dec 6, 2024 at 12:02 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> >> On 2024-12-05 18:57, David Marchand wrote:
> >>> As I had reported in rc2, the lcore variables allocation have a
> >>> noticeable impact on applications consuming DPDK, even when such
> >>> applications does not use DPDK, or use features associated to
> >>> some lcore variables.
> >>>
> >>> While the amount has been reduced in a rush before rc2,
> >>> there are still cases when the increased memory footprint is noticed
> >>> like in scaling tests.
> >>> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> >>>
> >>
> >> What this bug report fails to mention is that it only affects
> >> applications using locked memory.
> >
> > - By locked memory, are you referring to mlock() and friends?
> > No ovsdb binary calls them, only the datapath cares about mlocking.
> >
> >
> > - At a minimum, I understand the lcore var change introduced an
> > increase in memory of 4kB * 128 (getpagesize() * RTE_MAX_LCORES),
> > since lcore_var_alloc() calls memset() of the lcore var size, for
> > every lcore.
> >
>
> Yes, that is my understanding. It's also consistent with the
> measurements I've posted on this list.
>
> > In this unit test where 1000 processes are kept alive in parallel,
> > this means memory consumption increased by 512k * 1000, so ~500M at
> > least.
> > This amount of memory is probably significant in a resource-restrained
> > env like a (Ubuntu) CI.
> >
> >
>
> I wouldn't expect thousands of concurrent processes in a
> resource-constrained system. Sounds wasteful indeed. But sure, there may
> well be scenarios where this make sense.
>
> > - I went and traced this unit tests on my laptop by monitoring
> > kmem:mm_page_alloc, though there may be a better metrics when it comes
> > to memory consumption.
> >
> > # dir=build; perf stat -e kmem:mm_page_alloc -- tests/testsuite -C
> > $dir/tests AUTOTEST_PATH=$dir/utilities:$dir/vswitchd:$dir/ovsdb:$dir/vtep:$dir/tests:$dir/ipsec::
> > 2154
> >
> > Which gives:
> > - 1 635 489      kmem:mm_page_alloc for v23.11
> > - 5 777 043      kmem:mm_page_alloc for v24.11
> >
>
> Interesting. What is vm.overcommit_memory set to?

# cat /proc/sys/vm/overcommit_memory
0

And I am not sure what is being used in Ubuntu CI.

But the problem is, in the end, simpler.

[snip]

>
> > There is a 4M difference, where I would expect 128k.
> > So something more happens, than a simple page allocation per lcore,
> > though I fail to understand what.

Isolating the perf events for one process of this huge test, I counted
4878 page alloc calls.
From them, 4108 had rte_lcore_var_alloc in their calling stack which
is unexpected.

After spending some time reading glibc, I noticed alloc_perturb().
*bingo*, I remembered that OVS unit tests are run with MALLOC_PERTURB_
(=165 after double checking OVS sources).

"""
Tunable: glibc.malloc.perturb

This tunable supersedes the MALLOC_PERTURB_ environment variable and
is identical in features.

If set to a non-zero value, memory blocks are initialized with values
depending on some low order bits of this tunable when they are
allocated (except when allocated by calloc) and freed. This can be
used to debug the use of uninitialized or freed heap memory. Note that
this option does not guarantee that the freed block will have any
specific values. It only guarantees that the content the block had
before it was freed will be overwritten.

The default value of this tunable is ‘0’.
"""

Now, reproducing this out of the test:

$ perf stat -e kmem:mm_page_alloc -- ./build/ovsdb/ovsdb-client --help
>/dev/null
 Performance counter stats for './build/ovsdb/ovsdb-client --help':
               810      kmem:mm_page_alloc
       0,003277941 seconds time elapsed
       0,003260000 seconds user
       0,000000000 seconds sys

$ MALLOC_PERTURB_=165 perf stat -e kmem:mm_page_alloc --
./build/ovsdb/ovsdb-client --help >/dev/null
 Performance counter stats for './build/ovsdb/ovsdb-client --help':
             4 789      kmem:mm_page_alloc
       0,008766171 seconds time elapsed
       0,000976000 seconds user
       0,007794000 seconds sys

So the issue is not triggered by mlock'd memory, but by the whole
buffer of 16M for lcore variables being touched by a glibc debugging
feature.

And in Ubuntu CI, it translated to requesting 16G.

> >
> >
> > Btw, just focusing on lcore var, I did two more tests:
> > - 1 606 998      kmem:mm_page_alloc for v24.11 + revert all lcore var changes.
> > - 1 634 606      kmem:mm_page_alloc for v24.11 + current series with
> > postponed allocations.
> >
> >
>
> If one move initialization to shared object constructors (from having
> been at some later time), and then end up not running that
> initialization code at all (e.g., DPDK is not used), those code pages
> will increase RSS. That might well hurt more than the lcore variable
> memory itself, depending on how much code is run.
>
> However, such read-only pages can be replaced with something more useful
> if the system is under memory pressure, so they aren't really a big
> issue as far as (real) memory footprint is concerned.
>
> Just linking to DPDK (and its dependencies) already came with a 1-7 MB
> RSS penalty, prior to lcore variables. I wonder how much of that goes
> away if all RTE_INIT() type constructors are removed.

Regardless of the RSS change, removing completely constructors is not simple.
Postponing *all* existing constructors from DPDK code would be an ABI
breakage, as RTE_INIT have a priority notion and an application
callbacks using RTE_INIT may rely on this.
Just deferring "unprioritised" constructors would be doable on paper,
but the location in rte_eal_init where those are is deferred would
have to be carefully evaluated (with -d plugins in mind).


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-09 17:40       ` David Marchand
@ 2024-12-10  9:41         ` Mattias Rönnblom
  2024-12-16 10:01           ` Burakov, Anatoly
  0 siblings, 1 reply; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-10  9:41 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On 2024-12-09 18:40, David Marchand wrote:
> On Mon, Dec 9, 2024 at 4:39 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>> On 2024-12-09 12:03, David Marchand wrote:
>>> On Fri, Dec 6, 2024 at 12:02 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>>> On 2024-12-05 18:57, David Marchand wrote:
>>>>> As I had reported in rc2, the lcore variables allocation have a
>>>>> noticeable impact on applications consuming DPDK, even when such
>>>>> applications does not use DPDK, or use features associated to
>>>>> some lcore variables.
>>>>>
>>>>> While the amount has been reduced in a rush before rc2,
>>>>> there are still cases when the increased memory footprint is noticed
>>>>> like in scaling tests.
>>>>> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
>>>>>
>>>>
>>>> What this bug report fails to mention is that it only affects
>>>> applications using locked memory.
>>>
>>> - By locked memory, are you referring to mlock() and friends?
>>> No ovsdb binary calls them, only the datapath cares about mlocking.
>>>
>>>
>>> - At a minimum, I understand the lcore var change introduced an
>>> increase in memory of 4kB * 128 (getpagesize() * RTE_MAX_LCORES),
>>> since lcore_var_alloc() calls memset() of the lcore var size, for
>>> every lcore.
>>>
>>
>> Yes, that is my understanding. It's also consistent with the
>> measurements I've posted on this list.
>>
>>> In this unit test where 1000 processes are kept alive in parallel,
>>> this means memory consumption increased by 512k * 1000, so ~500M at
>>> least.
>>> This amount of memory is probably significant in a resource-restrained
>>> env like a (Ubuntu) CI.
>>>
>>>
>>
>> I wouldn't expect thousands of concurrent processes in a
>> resource-constrained system. Sounds wasteful indeed. But sure, there may
>> well be scenarios where this make sense.
>>
>>> - I went and traced this unit tests on my laptop by monitoring
>>> kmem:mm_page_alloc, though there may be a better metrics when it comes
>>> to memory consumption.
>>>
>>> # dir=build; perf stat -e kmem:mm_page_alloc -- tests/testsuite -C
>>> $dir/tests AUTOTEST_PATH=$dir/utilities:$dir/vswitchd:$dir/ovsdb:$dir/vtep:$dir/tests:$dir/ipsec::
>>> 2154
>>>
>>> Which gives:
>>> - 1 635 489      kmem:mm_page_alloc for v23.11
>>> - 5 777 043      kmem:mm_page_alloc for v24.11
>>>
>>
>> Interesting. What is vm.overcommit_memory set to?
> 
> # cat /proc/sys/vm/overcommit_memory
> 0
> 
> And I am not sure what is being used in Ubuntu CI.
> 
> But the problem is, in the end, simpler.
> 
> [snip]
> 
>>
>>> There is a 4M difference, where I would expect 128k.
>>> So something more happens, than a simple page allocation per lcore,
>>> though I fail to understand what.
> 
> Isolating the perf events for one process of this huge test, I counted
> 4878 page alloc calls.
>  From them, 4108 had rte_lcore_var_alloc in their calling stack which
> is unexpected.
> 
> After spending some time reading glibc, I noticed alloc_perturb().
> *bingo*, I remembered that OVS unit tests are run with MALLOC_PERTURB_
> (=165 after double checking OVS sources).
> 
> """
> Tunable: glibc.malloc.perturb
> 
> This tunable supersedes the MALLOC_PERTURB_ environment variable and
> is identical in features.
> 
> If set to a non-zero value, memory blocks are initialized with values
> depending on some low order bits of this tunable when they are
> allocated (except when allocated by calloc) and freed. This can be
> used to debug the use of uninitialized or freed heap memory. Note that
> this option does not guarantee that the freed block will have any
> specific values. It only guarantees that the content the block had
> before it was freed will be overwritten.
> 
> The default value of this tunable is ‘0’.
> """
> 

OK, excellent work, detective. :)

Do have a workaround for this issue, so that this test suite will work 
with vanilla DPDK 24.11? I guess OVS wants to keep the PERTURB settings.

The fix you've suggested will solve this issue for the no-DPDK-usage 
case. I'm guessing allocating the first lcore var block off of the BSS 
(e.g., via a static variable) would as well, in addition to solving 
similar cases but where there is "light" DPDK usage (i.e., 
rte_eal_init() is called, but with no real app).

> Now, reproducing this out of the test:
> 
> $ perf stat -e kmem:mm_page_alloc -- ./build/ovsdb/ovsdb-client --help
>> /dev/null
>   Performance counter stats for './build/ovsdb/ovsdb-client --help':
>                 810      kmem:mm_page_alloc
>         0,003277941 seconds time elapsed
>         0,003260000 seconds user
>         0,000000000 seconds sys
> 
> $ MALLOC_PERTURB_=165 perf stat -e kmem:mm_page_alloc --
> ./build/ovsdb/ovsdb-client --help >/dev/null
>   Performance counter stats for './build/ovsdb/ovsdb-client --help':
>               4 789      kmem:mm_page_alloc
>         0,008766171 seconds time elapsed
>         0,000976000 seconds user
>         0,007794000 seconds sys
> 
> So the issue is not triggered by mlock'd memory, but by the whole
> buffer of 16M for lcore variables being touched by a glibc debugging
> feature.
> > And in Ubuntu CI, it translated to requesting 16G.
> 
>>>
>>>
>>> Btw, just focusing on lcore var, I did two more tests:
>>> - 1 606 998      kmem:mm_page_alloc for v24.11 + revert all lcore var changes.
>>> - 1 634 606      kmem:mm_page_alloc for v24.11 + current series with
>>> postponed allocations.
>>>
>>>
>>
>> If one move initialization to shared object constructors (from having
>> been at some later time), and then end up not running that
>> initialization code at all (e.g., DPDK is not used), those code pages
>> will increase RSS. That might well hurt more than the lcore variable
>> memory itself, depending on how much code is run.
>>
>> However, such read-only pages can be replaced with something more useful
>> if the system is under memory pressure, so they aren't really a big
>> issue as far as (real) memory footprint is concerned.
>>
>> Just linking to DPDK (and its dependencies) already came with a 1-7 MB
>> RSS penalty, prior to lcore variables. I wonder how much of that goes
>> away if all RTE_INIT() type constructors are removed.
> 
> Regardless of the RSS change, removing completely constructors is not simple.
> Postponing *all* existing constructors from DPDK code would be an ABI
> breakage, as RTE_INIT have a priority notion and an application
> callbacks using RTE_INIT may rely on this.

Agreed.

> Just deferring "unprioritised" constructors would be doable on paper,
> but the location in rte_eal_init where those are is deferred would
> have to be carefully evaluated (with -d plugins in mind).
> 
> 

It seems to me that a reworking of this area should have a bigger scope 
than just addressing this issue.

RTE_INIT() should probably be deprecated, and DPDK shouldn't encourage 
the use of shared-object level constructors.

For dynamically loaded modules (-d), there needs to be some kind of 
replacement, serving the same function.

There should probably be some way to hook into the initialization 
process (available also for apps), which should all happen at 
rte_eal_init() (or later).

Does the priority concept make sense? At least conceptually, the 
initialization should be based off a dependency graph (DAG).

You could reduce the priorities to a number of named stages (just like 
in FreeBSD or Linux). A minor tweak to the current model. However, in 
DPDK, it would be useful if a generic facility could be used by apps, 
and thus the number and names of the stages are open ended (unlike the 
UNIX kernels').

You could rely on explicit initialization alone, where each module 
initializes it's dependencies. That would lead to repeated init function 
calls on the same module, unless there's some init framework help from 
EAL to prevent that. Overall, that would lead to more code, where 
various higher-level modules needs to initialize many dependencies.

Maybe the DAG is available on the build (meson) level, and thus the code 
can be generated out of that?

Some random thoughts.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-06 15:55   ` Thomas Monjalon
@ 2024-12-10 17:09     ` Stephen Hemminger
  0 siblings, 0 replies; 36+ messages in thread
From: Stephen Hemminger @ 2024-12-10 17:09 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: David Marchand, Mattias Rönnblom, dev, frode.nordahl,
	mattias.ronnblom

On Fri, 06 Dec 2024 16:55:30 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> 06/12/2024 12:01, Mattias Rönnblom:
> > On 2024-12-05 18:57, David Marchand wrote:
> > In retrospect, maybe the offset between lcore variable instances could 
> > have been encoded into the handle, and thus one could use 
> > different-sized offset for different variables.  
> 
> Yes it would allow to allocate a minimum size,
> instead of having a default which is also a maximum limit size of an object.
> 
> It is not too late to change the behavior as the API is experimental.
> 
> > > The general question on whether lcore variables in constructor should
> > > be forbidden, is left to a later discussion.  
> > 
> > That discussion could be extended to cover the question if RTE_INIT() 
> > type constructors should be used at all. Intuitively, it seems better if 
> > all DPDK initialization, or at least all EAL init, happens at the time 
> > of rte_eal_init(), in some ordered/organized fashion.  
> 
> Yes we may avoid constructors and instead have callbacks called in rte_eal_init().
> In order to not break the RTE_INIT API, we could define some new macros
> for registering such rte_eal_init callbacks.
> 
> 

My intuition is that the OVS problem with using mlockall() is caused because when
malloc is used, the malloc code will pre-allocate a new arena (memory area) for use.
If the malloc takes before the mlockall() it will then be pinned even if not used.
If the malloc takes place later, perhaps that arena is coming from unpinned area.
Many more details on glibc malloc here:
   https://sourceware.org/glibc/wiki/MallocInternals

Using blunt tool like mlockall() will have unintended side effects.

The issue with constructors, is they look good when they are simple, statless,
and only a few of them. But they get to be a undebuggable mess when the constructor
does complex stuff; has dependencies; and there are lots of them.

As a refinement, maybe having a way to register callback to be called in parallel
after EAL has started threads.  But some things like random() need to be available
early in startup.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] power: defer lcore variable allocation
  2024-12-06 11:29   ` Mattias Rönnblom
@ 2024-12-12  7:57     ` David Marchand
  2024-12-13  6:58       ` Mattias Rönnblom
  0 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-12  7:57 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom, stable,
	Anatoly Burakov, David Hunt, Sivaprasad Tummala,
	Stephen Hemminger, Chengwen Feng, Konstantin Ananyev,
	Morten Brørup

On Fri, Dec 6, 2024 at 12:29 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> I would wrap all RTE_LCORE_VAR_LCORE() and RTE_LCORE_VAR().
>
> static struct pmd_core_cfg *
> get_cfg_lcore(unsigned int lcore_id)
> {
>         assure_lcore_cfgs_alloced();
>         return RTE_LCORE_VAR_LCORE(lcore_cfgs, lcore_id);
> }
>
> static struct pmd_core_cfg *
> get_cfg(void)
> {
>         get_cfg_lcore(rte_lcore_id());
> }
>
> Add
>
> static void
> assure_lcore_cfgs_alloced(unsigned int lcore_id)
> {
>         if (lcore_cfgs != NULL)

==

>                 lcore_cfgs_alloc();
> }
>
> ..or maybe better merge assure_lcore_cfgs_alloced() and lcore_cfgs_alloc().
>
> Makes it a little harder to make mistakes.

clb_multiwait, clb_pause and clb_scale_freq callbacks can only be
reached after a successful call to
rte_power_ethdev_pmgmt_queue_enable.
Triggering an allocation in them means we are hiding a (internal)
programatic error as allocation and use of a lcore variable are
clearly separated atm.
If we keep the lcore var api as is, I would add an assert() (maybe
under a debug build option) in RTE_LCORE_VAR macros themselves, as
calling with a NULL handle means the initialisation path in some
code/RTE_LCORE_VAR API use was incorrect.


Or because you propose to add the same type of helpers in both this
patch and the next, I am considering the other way: hide the
allocation in the RTE_LCORE_VAR* macros.
Checking for already allocated var in RTE_LCORE_VAR_ALLOC seems fine.
But the "fast path" RTE_LCORE_VAR would have an unwanted branch in most cases.


> A somewhat unrelated question: why is pmd_core_cfg cache-line aligned? I
> don't think it should be.

Before the conversion to per lcore variable, it was probably useful
(avoiding false sharing).
With the conversion, indeed, it looks like a waste of space.
It seems worth a separate fix.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] power: defer lcore variable allocation
  2024-12-12  7:57     ` David Marchand
@ 2024-12-13  6:58       ` Mattias Rönnblom
  2024-12-16 10:02         ` David Marchand
  0 siblings, 1 reply; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-13  6:58 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom, stable,
	Anatoly Burakov, David Hunt, Sivaprasad Tummala,
	Stephen Hemminger, Chengwen Feng, Konstantin Ananyev,
	Morten Brørup

On 2024-12-12 08:57, David Marchand wrote:
> On Fri, Dec 6, 2024 at 12:29 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>> I would wrap all RTE_LCORE_VAR_LCORE() and RTE_LCORE_VAR().
>>
>> static struct pmd_core_cfg *
>> get_cfg_lcore(unsigned int lcore_id)
>> {
>>          assure_lcore_cfgs_alloced();
>>          return RTE_LCORE_VAR_LCORE(lcore_cfgs, lcore_id);
>> }
>>
>> static struct pmd_core_cfg *
>> get_cfg(void)
>> {
>>          get_cfg_lcore(rte_lcore_id());
>> }
>>
>> Add
>>
>> static void
>> assure_lcore_cfgs_alloced(unsigned int lcore_id)
>> {
>>          if (lcore_cfgs != NULL)
> 
> ==
> 

Oops.

>>                  lcore_cfgs_alloc();
>> }
>>
>> ..or maybe better merge assure_lcore_cfgs_alloced() and lcore_cfgs_alloc().
>>
>> Makes it a little harder to make mistakes.
> 
> clb_multiwait, clb_pause and clb_scale_freq callbacks can only be
> reached after a successful call to
> rte_power_ethdev_pmgmt_queue_enable.
> Triggering an allocation in them means we are hiding a (internal)
> programatic error as allocation and use of a lcore variable are
> clearly separated atm.
> If we keep the lcore var api as is, I would add an assert() (maybe
> under a debug build option) in RTE_LCORE_VAR macros themselves, as
> calling with a NULL handle means the initialisation path in some
> code/RTE_LCORE_VAR API use was incorrect.
> 

Sure, that would make sense. RTE_ASSERT(), that is. RTE_VERIFY() would 
be too expensive.

> 
> Or because you propose to add the same type of helpers in both this
> patch and the next, I am considering the other way: hide the
> allocation in the RTE_LCORE_VAR* macros.
> Checking for already allocated var in RTE_LCORE_VAR_ALLOC seems fine.
> But the "fast path" RTE_LCORE_VAR would have an unwanted branch in most cases.
> 

I would prefer to have the ALLOC() macro with semantics most people 
expect from a macro (or function) with that name, which is, I would 
argue, an unconditional allocation.

It would make sense to have another macro, which performs an allocation 
only if the handle is NULL.

RTE_LCORE_VAR_ASSURE_ALLOCATED(), or just RTE_LCORE_VAR_ASSURE() 
(although the latter sounds a little like an assertion, and not an 
allocation).

RTE_LCORE_VAR_LAZY_ALLOC()

I don't know. Something like that.

> 
>> A somewhat unrelated question: why is pmd_core_cfg cache-line aligned? I
>> don't think it should be.
> 
> Before the conversion to per lcore variable, it was probably useful
> (avoiding false sharing).
> With the conversion, indeed, it looks like a waste of space.
> It seems worth a separate fix.
> 
> 

You will include it, or should I submit a separate patch?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] random: defer seeding to EAL init
  2024-12-05 17:57 ` [PATCH 1/3] random: defer seeding to EAL init David Marchand
  2024-12-06 11:09   ` Mattias Rönnblom
@ 2024-12-16  9:38   ` Burakov, Anatoly
  1 sibling, 0 replies; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-16  9:38 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable, Tyler Retzlaff,
	Bruce Richardson, Dmitry Kozlyuk, Morten Brørup,
	Konstantin Ananyev, Chengwen Feng, Stephen Hemminger

On 12/5/2024 6:57 PM, David Marchand wrote:
> The RNG is documented as being seeded as part of EAL init.
> 
> /**
>   * Seed the pseudo-random generator.
>   *
>   * The generator is automatically seeded by the EAL init with a timer
>   * value. It may need to be re-seeded by the user with a real random
>   * value.
>   *
> ...
> 
> Move the initialisation (seeding) helper out of a constructor and
> call it explicitly from rte_eal_init() as it was done before commit
> 3f002f069612 ("eal: replace libc-based random generation with LFSR").
> 
> This also moves the unconditional lcore variable allocation out of a
> constructor.
> 
> While at it, mark local symbol rand_state as static.
> 
> Fixes: 29c39cd3d54d ("random: keep PRNG state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
                   ` (3 preceding siblings ...)
  2024-12-06 11:01 ` [PATCH 0/3] Defer lcore variables allocation Mattias Rönnblom
@ 2024-12-16  9:42 ` Burakov, Anatoly
  2024-12-16  9:49   ` David Marchand
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
  5 siblings, 1 reply; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-16  9:42 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: thomas, frode.nordahl, mattias.ronnblom

On 12/5/2024 6:57 PM, David Marchand wrote:
> As I had reported in rc2, the lcore variables allocation have a
> noticeable impact on applications consuming DPDK, even when such
> applications does not use DPDK, or use features associated to
> some lcore variables.
> 
> While the amount has been reduced in a rush before rc2,
> there are still cases when the increased memory footprint is noticed
> like in scaling tests.
> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> 
> 
> lcore variable allocations in constructor is a bad idea, as the
> application consuming DPDK has no control over such allocation:
> linking some code does not mean that all of it will be used at runtime.
> 
> The general question on whether lcore variables in constructor should
> be forbidden, is left to a later discussion.
> 
> For now, this series only focus on fixing subsystems using lcore
> variables so that those allocations are deferred either in rte_eal_init()
> or in the path that does require such lcore variables.
> 
> 

An idle question: would this have any consequences in use case of eal 
init -> eal cleanup -> eal init with different arguments?

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-16  9:42 ` Burakov, Anatoly
@ 2024-12-16  9:49   ` David Marchand
  2024-12-17  9:06     ` David Marchand
  2024-12-18 20:10     ` Mattias Rönnblom
  0 siblings, 2 replies; 36+ messages in thread
From: David Marchand @ 2024-12-16  9:49 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On Mon, Dec 16, 2024 at 10:42 AM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 12/5/2024 6:57 PM, David Marchand wrote:
> > As I had reported in rc2, the lcore variables allocation have a
> > noticeable impact on applications consuming DPDK, even when such
> > applications does not use DPDK, or use features associated to
> > some lcore variables.
> >
> > While the amount has been reduced in a rush before rc2,
> > there are still cases when the increased memory footprint is noticed
> > like in scaling tests.
> > See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> >
> >
> > lcore variable allocations in constructor is a bad idea, as the
> > application consuming DPDK has no control over such allocation:
> > linking some code does not mean that all of it will be used at runtime.
> >
> > The general question on whether lcore variables in constructor should
> > be forbidden, is left to a later discussion.
> >
> > For now, this series only focus on fixing subsystems using lcore
> > variables so that those allocations are deferred either in rte_eal_init()
> > or in the path that does require such lcore variables.
> >
> >
>
> An idle question: would this have any consequences in use case of eal
> init -> eal cleanup -> eal init with different arguments?

Hum, interesting question.

I would say that initialising lcore variables in constructors means
that this usecase is broken, since lcore variables are freed in
eal_lcore_var_cleanup().


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-10  9:41         ` Mattias Rönnblom
@ 2024-12-16 10:01           ` Burakov, Anatoly
  0 siblings, 0 replies; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-16 10:01 UTC (permalink / raw)
  To: Mattias Rönnblom, David Marchand
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On 12/10/2024 10:41 AM, Mattias Rönnblom wrote:

<snip>

> 
> Maybe the DAG is available on the build (meson) level, and thus the code 
> can be generated out of that?

There is in fact a patchset that produces just that kind of graph:

https://patches.dpdk.org/project/dpdk/list/?series=34055

It's currently not very consumable by DPDK and it'd be a difficult task 
to integrate it into DPDK in such a way as to allow arbitrary libraries 
to use it, but building such a graph from build system is doable.

However, IMO it would be easier to build such a graph at init time, 
where a component would register its dependencies with EAL, and EAL 
would handle the rest. That way, the bulk of the work lands in EAL, 
while component developers only have to specify the things they depend 
on. It could be a little tricky in that some components may depend on 
e.g. IPC which isn't a "component" as much as it's an init stage, but 
depending on IPC could just as well be depending on EAL having done 
init, while for internal components we can come up with some sort of 
internal init order mechanism.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] power: defer lcore variable allocation
  2024-12-13  6:58       ` Mattias Rönnblom
@ 2024-12-16 10:02         ` David Marchand
  0 siblings, 0 replies; 36+ messages in thread
From: David Marchand @ 2024-12-16 10:02 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom, stable,
	Anatoly Burakov, David Hunt, Sivaprasad Tummala,
	Stephen Hemminger, Chengwen Feng, Konstantin Ananyev,
	Morten Brørup

On Fri, Dec 13, 2024 at 7:58 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> On 2024-12-12 08:57, David Marchand wrote:
> > clb_multiwait, clb_pause and clb_scale_freq callbacks can only be
> > reached after a successful call to
> > rte_power_ethdev_pmgmt_queue_enable.
> > Triggering an allocation in them means we are hiding a (internal)
> > programatic error as allocation and use of a lcore variable are
> > clearly separated atm.
> > If we keep the lcore var api as is, I would add an assert() (maybe
> > under a debug build option) in RTE_LCORE_VAR macros themselves, as
> > calling with a NULL handle means the initialisation path in some
> > code/RTE_LCORE_VAR API use was incorrect.
> >
>
> Sure, that would make sense. RTE_ASSERT(), that is. RTE_VERIFY() would
> be too expensive.

Yes, I'll send in next revision.


>
> >
> > Or because you propose to add the same type of helpers in both this
> > patch and the next, I am considering the other way: hide the
> > allocation in the RTE_LCORE_VAR* macros.
> > Checking for already allocated var in RTE_LCORE_VAR_ALLOC seems fine.
> > But the "fast path" RTE_LCORE_VAR would have an unwanted branch in most cases.
> >
>
> I would prefer to have the ALLOC() macro with semantics most people
> expect from a macro (or function) with that name, which is, I would
> argue, an unconditional allocation.
> It would make sense to have another macro, which performs an allocation
> only if the handle is NULL.
>
> RTE_LCORE_VAR_ASSURE_ALLOCATED(), or just RTE_LCORE_VAR_ASSURE()
> (although the latter sounds a little like an assertion, and not an
> allocation).
>
> RTE_LCORE_VAR_LAZY_ALLOC()
>
> I don't know. Something like that.

- In the power libary case, allocating the lcore variable is followed
by the initialisation of the lcore variable internals (per lcore
tailqs).
For this patch, I will rename the alloc_lcore_cfgs helper I had in v1 as:

static void
+init_lcore_cfgs(void)
+{
+    struct pmd_core_cfg *lcore_cfg;
+    unsigned int lcore_id;
+
+    if (lcore_cfgs != NULL)
+        return;
+
+    RTE_LCORE_VAR_ALLOC(lcore_cfgs);
+
+    /* initialize all tailqs */
+    RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
+        TAILQ_INIT(&lcore_cfg->head);
+}

and only keep those checks in the public symbols.


- About more macros, I am wondering if this is needed in the end.
Adding assertions in the lcore var accessor should catch incorrect
initialisation path.


> >
> >> A somewhat unrelated question: why is pmd_core_cfg cache-line aligned? I
> >> don't think it should be.
> >
> > Before the conversion to per lcore variable, it was probably useful
> > (avoiding false sharing).
> > With the conversion, indeed, it looks like a waste of space.
> > It seems worth a separate fix.
> >
> >
>
> You will include it, or should I submit a separate patch?

I'll send it in next revision.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/5] Defer lcore variables allocation
  2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
                   ` (4 preceding siblings ...)
  2024-12-16  9:42 ` Burakov, Anatoly
@ 2024-12-17  8:59 ` David Marchand
  2024-12-17  8:59   ` [PATCH v2 1/5] eal: check lcore variable handle David Marchand
                     ` (5 more replies)
  5 siblings, 6 replies; 36+ messages in thread
From: David Marchand @ 2024-12-17  8:59 UTC (permalink / raw)
  To: dev; +Cc: thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov

As I had reported in rc2, the lcore variables allocation have a
noticeable impact on applications consuming DPDK, even when such
applications does not use DPDK, or use features associated to
some lcore variables.

While the amount has been reduced in a rush before rc2,
there are still cases when the increased memory footprint is noticed
like in scaling tests.
See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931


lcore variable allocations in constructor is a bad idea, as the
application consuming DPDK has no control over such allocation:
linking some code does not mean that all of it will be used at runtime.

The general question on whether lcore variables in constructor should
be forbidden, is left to a later discussion.

For now, this series only focus on fixing subsystems using lcore
variables so that those allocations are deferred either in rte_eal_init()
or in the path that does require such lcore variables.


-- 
David Marchand

Changes since v1:
- added a check on lcore handle,
- fixed lcore variable size in lib/power,
- renamed/introduced allocation helpers,


David Marchand (5):
  eal: check lcore variable handle
  random: defer seeding to EAL init
  power: defer lcore variable allocation
  power: reduce memory footprint of per-lcore state
  eal/x86: defer power intrinsics variable allocation

 lib/eal/common/eal_private.h       |  6 ++++++
 lib/eal/common/rte_random.c        |  7 +++++--
 lib/eal/freebsd/eal.c              |  2 ++
 lib/eal/include/rte_lcore_var.h    |  2 ++
 lib/eal/linux/eal.c                |  2 ++
 lib/eal/windows/eal.c              |  2 ++
 lib/eal/x86/rte_power_intrinsics.c | 15 +++++++++++++--
 lib/power/rte_power_pmd_mgmt.c     | 29 ++++++++++++++++++++---------
 8 files changed, 52 insertions(+), 13 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/5] eal: check lcore variable handle
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
@ 2024-12-17  8:59   ` David Marchand
  2024-12-18 11:18     ` Burakov, Anatoly
  2024-12-17  8:59   ` [PATCH v2 2/5] random: defer seeding to EAL init David Marchand
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-17  8:59 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov, Tyler Retzlaff

Add an assert to double check the passed handle is not NULL, as it
points at an initialisation/allocation issue prior to accessing this
lcore variable.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/eal/include/rte_lcore_var.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/include/rte_lcore_var.h b/lib/eal/include/rte_lcore_var.h
index 0216a67cab..ca31dff6fd 100644
--- a/lib/eal/include/rte_lcore_var.h
+++ b/lib/eal/include/rte_lcore_var.h
@@ -24,6 +24,7 @@
 
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_debug.h>
 #include <rte_lcore.h>
 
 #ifdef __cplusplus
@@ -125,6 +126,7 @@ extern "C" {
 static inline void *
 rte_lcore_var_lcore(unsigned int lcore_id, void *handle)
 {
+	RTE_ASSERT(handle != NULL);
 	return RTE_PTR_ADD(handle, lcore_id * RTE_MAX_LCORE_VAR);
 }
 /* >8 end of access function */
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 2/5] random: defer seeding to EAL init
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
  2024-12-17  8:59   ` [PATCH v2 1/5] eal: check lcore variable handle David Marchand
@ 2024-12-17  8:59   ` David Marchand
  2024-12-18 16:35     ` Stephen Hemminger
  2024-12-17  8:59   ` [PATCH v2 3/5] power: defer lcore variable allocation David Marchand
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-17  8:59 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov, stable,
	Tyler Retzlaff, Bruce Richardson, Dmitry Kozlyuk,
	Stephen Hemminger, Chengwen Feng, Morten Brørup,
	Konstantin Ananyev

The RNG is documented as being seeded as part of EAL init.

Move the initialisation (seeding) helper out of a constructor and
call it explicitly from rte_eal_init() as it was done before commit
3f002f069612 ("eal: replace libc-based random generation with LFSR").

This also moves the unconditional lcore variable allocation out of a
constructor.

While at it, mark local symbol rand_state as static.

Fixes: 29c39cd3d54d ("random: keep PRNG state in lcore variable")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Changes since v1:
- updated commitlog,

---
 lib/eal/common/eal_private.h | 6 ++++++
 lib/eal/common/rte_random.c  | 7 +++++--
 lib/eal/freebsd/eal.c        | 2 ++
 lib/eal/linux/eal.c          | 2 ++
 lib/eal/windows/eal.c        | 2 ++
 5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index bb315dab04..a10db6a399 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -702,6 +702,12 @@ eal_get_internal_configuration(void);
 rte_usage_hook_t
 eal_get_application_usage_hook(void);
 
+/**
+ * Initialise random subsystem.
+ */
+void
+eal_rand_init(void);
+
 /**
  * Instruct primary process that a secondary process wants to attach.
  */
diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index cf0756f26a..307c26bb7c 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -14,6 +14,8 @@
 #include <rte_lcore_var.h>
 #include <rte_random.h>
 
+#include "eal_private.h"
+
 struct __rte_cache_aligned rte_rand_state {
 	uint64_t z1;
 	uint64_t z2;
@@ -22,7 +24,7 @@ struct __rte_cache_aligned rte_rand_state {
 	uint64_t z5;
 };
 
-RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
+static RTE_LCORE_VAR_HANDLE(struct rte_rand_state, rand_state);
 
 /* instance to be shared by all unregistered non-EAL threads */
 static struct rte_rand_state unregistered_rand_state;
@@ -228,7 +230,8 @@ __rte_random_initial_seed(void)
 	return rte_get_tsc_cycles();
 }
 
-RTE_INIT(rte_rand_init)
+void
+eal_rand_init(void)
 {
 	uint64_t seed;
 
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index a96bbf5836..d07cff8651 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -777,6 +777,8 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	eal_rand_init();
+
 	eal_check_mem_on_local_socket();
 
 	if (rte_thread_set_affinity_by_id(rte_thread_self(),
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index a6220524a4..b1e63e37fc 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1173,6 +1173,8 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	eal_rand_init();
+
 	eal_check_mem_on_local_socket();
 
 	if (rte_thread_set_affinity_by_id(rte_thread_self(),
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 5cdc053a02..5c7526f922 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -410,6 +410,8 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	eal_rand_init();
+
 	if (rte_thread_set_affinity_by_id(rte_thread_self(),
 			&lcore_config[config->main_lcore].cpuset) != 0) {
 		rte_eal_init_alert("Cannot set affinity");
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 3/5] power: defer lcore variable allocation
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
  2024-12-17  8:59   ` [PATCH v2 1/5] eal: check lcore variable handle David Marchand
  2024-12-17  8:59   ` [PATCH v2 2/5] random: defer seeding to EAL init David Marchand
@ 2024-12-17  8:59   ` David Marchand
  2024-12-18 11:17     ` Burakov, Anatoly
  2024-12-17  8:59   ` [PATCH v2 4/5] power: reduce memory footprint of per-lcore state David Marchand
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-17  8:59 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov, stable,
	David Hunt, Sivaprasad Tummala, Konstantin Ananyev,
	Stephen Hemminger, Morten Brørup, Chengwen Feng

The lcore variable in this code unit is only used through
rte_power_ethdev_pmgmt_queue_*() public symbols.

Defer the unconditional lcore variable allocation in those symbols.

Fixes: 130643319579 ("power: keep per-lcore state in lcore variable")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- moved some unrelated comment on queue_id,
- renamed initialisation helper,

---
 lib/power/rte_power_pmd_mgmt.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index a2fff3b765..369ce3c354 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -72,6 +72,22 @@ struct __rte_cache_aligned pmd_core_cfg {
 };
 static RTE_LCORE_VAR_HANDLE(struct pmd_core_cfg, lcore_cfgs);
 
+static void
+init_lcore_cfgs(void)
+{
+	struct pmd_core_cfg *lcore_cfg;
+	unsigned int lcore_id;
+
+	if (lcore_cfgs != NULL)
+		return;
+
+	RTE_LCORE_VAR_ALLOC(lcore_cfgs);
+
+	/* initialize all tailqs */
+	RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
+		TAILQ_INIT(&lcore_cfg->head);
+}
+
 static inline bool
 queue_equal(const union queue *l, const union queue *r)
 {
@@ -517,6 +533,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	init_lcore_cfgs();
 	lcore_cfg = RTE_LCORE_VAR_LCORE(lcore_id, lcore_cfgs);
 
 	/* check if other queues are stopped as well */
@@ -618,6 +635,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	}
 
 	/* no need to check queue id as wrong queue id would not be enabled */
+
+	init_lcore_cfgs();
 	lcore_cfg = RTE_LCORE_VAR_LCORE(lcore_id, lcore_cfgs);
 
 	/* check if other queues are stopped as well */
@@ -768,16 +787,8 @@ rte_power_pmd_mgmt_get_scaling_freq_max(unsigned int lcore)
 }
 
 RTE_INIT(rte_power_ethdev_pmgmt_init) {
-	unsigned int lcore_id;
-	struct pmd_core_cfg *lcore_cfg;
 	int i;
 
-	RTE_LCORE_VAR_ALLOC(lcore_cfgs);
-
-	/* initialize all tailqs */
-	RTE_LCORE_VAR_FOREACH(lcore_id, lcore_cfg, lcore_cfgs)
-		TAILQ_INIT(&lcore_cfg->head);
-
 	/* initialize config defaults */
 	emptypoll_max = 512;
 	pause_duration = 1;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 4/5] power: reduce memory footprint of per-lcore state
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
                     ` (2 preceding siblings ...)
  2024-12-17  8:59   ` [PATCH v2 3/5] power: defer lcore variable allocation David Marchand
@ 2024-12-17  8:59   ` David Marchand
  2024-12-18 11:17     ` Burakov, Anatoly
  2024-12-17  8:59   ` [PATCH v2 5/5] eal/x86: defer power intrinsics variable allocation David Marchand
  2024-12-19 16:19   ` [PATCH v2 0/5] Defer lcore variables allocation Frode Nordahl
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-17  8:59 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov,
	David Hunt, Sivaprasad Tummala, Chengwen Feng, Stephen Hemminger,
	Konstantin Ananyev, Morten Brørup

Now that the per-lcore state was moved into a lcore variable,
there is no reason to align a per-lcore state on a cache line to avoid
false sharing.
Remove this alignment and save a few bytes.

Fixes: 130643319579 ("power: keep per-lcore state in lcore variable")

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/power/rte_power_pmd_mgmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 369ce3c354..8ec925bd65 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -56,7 +56,7 @@ struct queue_list_entry {
 	const struct rte_eth_rxtx_callback *cb;
 };
 
-struct __rte_cache_aligned pmd_core_cfg {
+struct pmd_core_cfg {
 	TAILQ_HEAD(queue_list_head, queue_list_entry) head;
 	/**< List of queues associated with this lcore */
 	size_t n_queues;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 5/5] eal/x86: defer power intrinsics variable allocation
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
                     ` (3 preceding siblings ...)
  2024-12-17  8:59   ` [PATCH v2 4/5] power: reduce memory footprint of per-lcore state David Marchand
@ 2024-12-17  8:59   ` David Marchand
  2024-12-18 11:17     ` Burakov, Anatoly
  2024-12-19 16:19   ` [PATCH v2 0/5] Defer lcore variables allocation Frode Nordahl
  5 siblings, 1 reply; 36+ messages in thread
From: David Marchand @ 2024-12-17  8:59 UTC (permalink / raw)
  To: dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov, stable,
	Bruce Richardson, Konstantin Ananyev, Chengwen Feng,
	Stephen Hemminger, Morten Brørup

The lcore variable in this code unit is only used through
rte_power_monitor*() public symbols.

Defer the unconditional lcore variable allocation in those symbols.

Fixes: 18b5049ab4fe ("eal/x86: keep power intrinsics state in lcore variable")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- added an initialiser helper,

---
 lib/eal/x86/rte_power_intrinsics.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index e4cb913590..f7d558153e 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -22,7 +22,13 @@ struct power_wait_status {
 
 RTE_LCORE_VAR_HANDLE(struct power_wait_status, wait_status);
 
-RTE_LCORE_VAR_INIT(wait_status);
+static void
+init_wait_status(void)
+{
+	if (wait_status != NULL)
+		return;
+	RTE_LCORE_VAR_ALLOC(wait_status);
+}
 
 /*
  * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
@@ -177,6 +183,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (pmc->fn == NULL)
 		return -EINVAL;
 
+	init_wait_status();
 	s = RTE_LCORE_VAR_LCORE(lcore_id, wait_status);
 
 	/* update sleep address */
@@ -269,6 +276,7 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 	if (lcore_id >= RTE_MAX_LCORE)
 		return -EINVAL;
 
+	init_wait_status();
 	s = RTE_LCORE_VAR_LCORE(lcore_id, wait_status);
 
 	/*
@@ -308,7 +316,7 @@ int
 rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 		const uint32_t num, const uint64_t tsc_timestamp)
 {
-	struct power_wait_status *s = RTE_LCORE_VAR(wait_status);
+	struct power_wait_status *s;
 	uint32_t i, rc;
 
 	/* check if supported */
@@ -318,6 +326,9 @@ rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 	if (pmc == NULL || num == 0)
 		return -EINVAL;
 
+	init_wait_status();
+	s = RTE_LCORE_VAR(wait_status);
+
 	/* we are already inside transaction region, return */
 	if (rte_xtest() != 0)
 		return 0;
-- 
2.47.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-16  9:49   ` David Marchand
@ 2024-12-17  9:06     ` David Marchand
  2024-12-18 20:10     ` Mattias Rönnblom
  1 sibling, 0 replies; 36+ messages in thread
From: David Marchand @ 2024-12-17  9:06 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On Mon, Dec 16, 2024 at 10:49 AM David Marchand
<david.marchand@redhat.com> wrote:
> > > For now, this series only focus on fixing subsystems using lcore
> > > variables so that those allocations are deferred either in rte_eal_init()
> > > or in the path that does require such lcore variables.
> > >
> > >
> >
> > An idle question: would this have any consequences in use case of eal
> > init -> eal cleanup -> eal init with different arguments?
>
> Hum, interesting question.
>
> I would say that initialising lcore variables in constructors means
> that this usecase is broken, since lcore variables are freed in
> eal_lcore_var_cleanup().

I posted a v2, though this issue you noticed is still present.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/5] power: reduce memory footprint of per-lcore state
  2024-12-17  8:59   ` [PATCH v2 4/5] power: reduce memory footprint of per-lcore state David Marchand
@ 2024-12-18 11:17     ` Burakov, Anatoly
  0 siblings, 0 replies; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-18 11:17 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, David Hunt,
	Sivaprasad Tummala, Chengwen Feng, Stephen Hemminger,
	Konstantin Ananyev, Morten Brørup

On 12/17/2024 9:59 AM, David Marchand wrote:
> Now that the per-lcore state was moved into a lcore variable,
> there is no reason to align a per-lcore state on a cache line to avoid
> false sharing.
> Remove this alignment and save a few bytes.
> 
> Fixes: 130643319579 ("power: keep per-lcore state in lcore variable")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>   lib/power/rte_power_pmd_mgmt.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 369ce3c354..8ec925bd65 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -56,7 +56,7 @@ struct queue_list_entry {
>   	const struct rte_eth_rxtx_callback *cb;
>   };
>   
> -struct __rte_cache_aligned pmd_core_cfg {
> +struct pmd_core_cfg {
>   	TAILQ_HEAD(queue_list_head, queue_list_entry) head;
>   	/**< List of queues associated with this lcore */
>   	size_t n_queues;
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/5] power: defer lcore variable allocation
  2024-12-17  8:59   ` [PATCH v2 3/5] power: defer lcore variable allocation David Marchand
@ 2024-12-18 11:17     ` Burakov, Anatoly
  0 siblings, 0 replies; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-18 11:17 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable, David Hunt,
	Sivaprasad Tummala, Konstantin Ananyev, Stephen Hemminger,
	Morten Brørup, Chengwen Feng

On 12/17/2024 9:59 AM, David Marchand wrote:
> The lcore variable in this code unit is only used through
> rte_power_ethdev_pmgmt_queue_*() public symbols.
> 
> Defer the unconditional lcore variable allocation in those symbols.
> 
> Fixes: 130643319579 ("power: keep per-lcore state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/5] eal/x86: defer power intrinsics variable allocation
  2024-12-17  8:59   ` [PATCH v2 5/5] eal/x86: defer power intrinsics variable allocation David Marchand
@ 2024-12-18 11:17     ` Burakov, Anatoly
  0 siblings, 0 replies; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-18 11:17 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, stable,
	Bruce Richardson, Konstantin Ananyev, Chengwen Feng,
	Stephen Hemminger, Morten Brørup

On 12/17/2024 9:59 AM, David Marchand wrote:
> The lcore variable in this code unit is only used through
> rte_power_monitor*() public symbols.
> 
> Defer the unconditional lcore variable allocation in those symbols.
> 
> Fixes: 18b5049ab4fe ("eal/x86: keep power intrinsics state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/5] eal: check lcore variable handle
  2024-12-17  8:59   ` [PATCH v2 1/5] eal: check lcore variable handle David Marchand
@ 2024-12-18 11:18     ` Burakov, Anatoly
  0 siblings, 0 replies; 36+ messages in thread
From: Burakov, Anatoly @ 2024-12-18 11:18 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, frode.nordahl, mattias.ronnblom, Tyler Retzlaff

On 12/17/2024 9:59 AM, David Marchand wrote:
> Add an assert to double check the passed handle is not NULL, as it
> points at an initialisation/allocation issue prior to accessing this
> lcore variable.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>   lib/eal/include/rte_lcore_var.h | 2 ++
>   1 file changed, 2 insertions(+)
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/5] random: defer seeding to EAL init
  2024-12-17  8:59   ` [PATCH v2 2/5] random: defer seeding to EAL init David Marchand
@ 2024-12-18 16:35     ` Stephen Hemminger
  2024-12-18 17:03       ` Mattias Rönnblom
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen Hemminger @ 2024-12-18 16:35 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov,
	stable, Tyler Retzlaff, Bruce Richardson, Dmitry Kozlyuk,
	Chengwen Feng, Morten Brørup, Konstantin Ananyev

On Tue, 17 Dec 2024 09:59:49 +0100
David Marchand <david.marchand@redhat.com> wrote:

> The RNG is documented as being seeded as part of EAL init.
> 
> Move the initialisation (seeding) helper out of a constructor and
> call it explicitly from rte_eal_init() as it was done before commit
> 3f002f069612 ("eal: replace libc-based random generation with LFSR").
> 
> This also moves the unconditional lcore variable allocation out of a
> constructor.
> 
> While at it, mark local symbol rand_state as static.
> 
> Fixes: 29c39cd3d54d ("random: keep PRNG state in lcore variable")
> Cc: stable@dpdk.org
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Probably need to add a check to rte_random() so it crashes
if called before initialization, rather than returning an un-random
number which could be a hidden long term bug.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/5] random: defer seeding to EAL init
  2024-12-18 16:35     ` Stephen Hemminger
@ 2024-12-18 17:03       ` Mattias Rönnblom
  0 siblings, 0 replies; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-18 17:03 UTC (permalink / raw)
  To: Stephen Hemminger, David Marchand
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom, anatoly.burakov,
	stable, Tyler Retzlaff, Bruce Richardson, Dmitry Kozlyuk,
	Chengwen Feng, Morten Brørup, Konstantin Ananyev

On 2024-12-18 17:35, Stephen Hemminger wrote:
> On Tue, 17 Dec 2024 09:59:49 +0100
> David Marchand <david.marchand@redhat.com> wrote:
> 
>> The RNG is documented as being seeded as part of EAL init.
>>
>> Move the initialisation (seeding) helper out of a constructor and
>> call it explicitly from rte_eal_init() as it was done before commit
>> 3f002f069612 ("eal: replace libc-based random generation with LFSR").
>>
>> This also moves the unconditional lcore variable allocation out of a
>> constructor.
>>
>> While at it, mark local symbol rand_state as static.
>>
>> Fixes: 29c39cd3d54d ("random: keep PRNG state in lcore variable")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>> Reviewed-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> Probably need to add a check to rte_random() so it crashes
> if called before initialization, rather than returning an un-random
> number which could be a hidden long term bug.

If you do, do it either as a RTE_ASSERT() or an RTE_VERIFY() in the 
(lcore_id == LCORE_ID_ANY) path, since that is what will be taken.

Preferably, you should have as little as possible in rte_rand() fast 
path, because this function is used in packet processing.

That said, the "unrandom" number will always be 0, so it shouldn't go 
unnoticed for too long.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Defer lcore variables allocation
  2024-12-16  9:49   ` David Marchand
  2024-12-17  9:06     ` David Marchand
@ 2024-12-18 20:10     ` Mattias Rönnblom
  1 sibling, 0 replies; 36+ messages in thread
From: Mattias Rönnblom @ 2024-12-18 20:10 UTC (permalink / raw)
  To: David Marchand, Burakov, Anatoly
  Cc: dev, thomas, frode.nordahl, mattias.ronnblom

On 2024-12-16 10:49, David Marchand wrote:
> On Mon, Dec 16, 2024 at 10:42 AM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>>
>> On 12/5/2024 6:57 PM, David Marchand wrote:
>>> As I had reported in rc2, the lcore variables allocation have a
>>> noticeable impact on applications consuming DPDK, even when such
>>> applications does not use DPDK, or use features associated to
>>> some lcore variables.
>>>
>>> While the amount has been reduced in a rush before rc2,
>>> there are still cases when the increased memory footprint is noticed
>>> like in scaling tests.
>>> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
>>>
>>>
>>> lcore variable allocations in constructor is a bad idea, as the
>>> application consuming DPDK has no control over such allocation:
>>> linking some code does not mean that all of it will be used at runtime.
>>>
>>> The general question on whether lcore variables in constructor should
>>> be forbidden, is left to a later discussion.
>>>
>>> For now, this series only focus on fixing subsystems using lcore
>>> variables so that those allocations are deferred either in rte_eal_init()
>>> or in the path that does require such lcore variables.
>>>
>>>
>>
>> An idle question: would this have any consequences in use case of eal
>> init -> eal cleanup -> eal init with different arguments?
> 
> Hum, interesting question.
> 
> I would say that initialising lcore variables in constructors means
> that this usecase is broken, since lcore variables are freed in
> eal_lcore_var_cleanup().
> 
> 

After rte_eal_cleanup() is called, no DPDK calls may be made.

So, with the current API, there is no such use case to break.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 0/5] Defer lcore variables allocation
  2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
                     ` (4 preceding siblings ...)
  2024-12-17  8:59   ` [PATCH v2 5/5] eal/x86: defer power intrinsics variable allocation David Marchand
@ 2024-12-19 16:19   ` Frode Nordahl
  5 siblings, 0 replies; 36+ messages in thread
From: Frode Nordahl @ 2024-12-19 16:19 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, mattias.ronnblom, anatoly.burakov

On Tue, Dec 17, 2024 at 10:00 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> As I had reported in rc2, the lcore variables allocation have a
> noticeable impact on applications consuming DPDK, even when such
> applications does not use DPDK, or use features associated to
> some lcore variables.
>
> While the amount has been reduced in a rush before rc2,
> there are still cases when the increased memory footprint is noticed
> like in scaling tests.
> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
>
>
> lcore variable allocations in constructor is a bad idea, as the
> application consuming DPDK has no control over such allocation:
> linking some code does not mean that all of it will be used at runtime.
>
> The general question on whether lcore variables in constructor should
> be forbidden, is left to a later discussion.
>
> For now, this series only focus on fixing subsystems using lcore
> variables so that those allocations are deferred either in rte_eal_init()
> or in the path that does require such lcore variables.

While I have not done code review of this series, I can confirm that
building Open vSwitch against DPDK built with this set of patches
resolves the issue outlined in
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931 as in just
linking against DPDK no longer make a program increase its resident
size.

Acked-by: Frode Nordahl <frode.nordahl@canonical.com>

Thanks alot for addressing the issue!

-- 
Frode Nordahl

> --
> David Marchand
>
> Changes since v1:
> - added a check on lcore handle,
> - fixed lcore variable size in lib/power,
> - renamed/introduced allocation helpers,
>
>
> David Marchand (5):
>   eal: check lcore variable handle
>   random: defer seeding to EAL init
>   power: defer lcore variable allocation
>   power: reduce memory footprint of per-lcore state
>   eal/x86: defer power intrinsics variable allocation
>
>  lib/eal/common/eal_private.h       |  6 ++++++
>  lib/eal/common/rte_random.c        |  7 +++++--
>  lib/eal/freebsd/eal.c              |  2 ++
>  lib/eal/include/rte_lcore_var.h    |  2 ++
>  lib/eal/linux/eal.c                |  2 ++
>  lib/eal/windows/eal.c              |  2 ++
>  lib/eal/x86/rte_power_intrinsics.c | 15 +++++++++++++--
>  lib/power/rte_power_pmd_mgmt.c     | 29 ++++++++++++++++++++---------
>  8 files changed, 52 insertions(+), 13 deletions(-)
>
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-12-23 10:36 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-05 17:57 [PATCH 0/3] Defer lcore variables allocation David Marchand
2024-12-05 17:57 ` [PATCH 1/3] random: defer seeding to EAL init David Marchand
2024-12-06 11:09   ` Mattias Rönnblom
2024-12-16  9:38   ` Burakov, Anatoly
2024-12-05 17:57 ` [PATCH 2/3] power: defer lcore variable allocation David Marchand
2024-12-06 11:29   ` Mattias Rönnblom
2024-12-12  7:57     ` David Marchand
2024-12-13  6:58       ` Mattias Rönnblom
2024-12-16 10:02         ` David Marchand
2024-12-05 17:57 ` [PATCH 3/3] eal/x86: defer power intrinsics " David Marchand
2024-12-06 11:32   ` Mattias Rönnblom
2024-12-06 11:01 ` [PATCH 0/3] Defer lcore variables allocation Mattias Rönnblom
2024-12-06 15:55   ` Thomas Monjalon
2024-12-10 17:09     ` Stephen Hemminger
2024-12-09 11:03   ` David Marchand
2024-12-09 15:39     ` Mattias Rönnblom
2024-12-09 17:40       ` David Marchand
2024-12-10  9:41         ` Mattias Rönnblom
2024-12-16 10:01           ` Burakov, Anatoly
2024-12-16  9:42 ` Burakov, Anatoly
2024-12-16  9:49   ` David Marchand
2024-12-17  9:06     ` David Marchand
2024-12-18 20:10     ` Mattias Rönnblom
2024-12-17  8:59 ` [PATCH v2 0/5] " David Marchand
2024-12-17  8:59   ` [PATCH v2 1/5] eal: check lcore variable handle David Marchand
2024-12-18 11:18     ` Burakov, Anatoly
2024-12-17  8:59   ` [PATCH v2 2/5] random: defer seeding to EAL init David Marchand
2024-12-18 16:35     ` Stephen Hemminger
2024-12-18 17:03       ` Mattias Rönnblom
2024-12-17  8:59   ` [PATCH v2 3/5] power: defer lcore variable allocation David Marchand
2024-12-18 11:17     ` Burakov, Anatoly
2024-12-17  8:59   ` [PATCH v2 4/5] power: reduce memory footprint of per-lcore state David Marchand
2024-12-18 11:17     ` Burakov, Anatoly
2024-12-17  8:59   ` [PATCH v2 5/5] eal/x86: defer power intrinsics variable allocation David Marchand
2024-12-18 11:17     ` Burakov, Anatoly
2024-12-19 16:19   ` [PATCH v2 0/5] Defer lcore variables allocation Frode Nordahl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).