patches for DPDK stable branches
 help / color / Atom feed
* [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
@ 2020-08-05 12:26 Nick Connolly
  2020-08-05 13:42 ` Nicolas Dichtel
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Nick Connolly @ 2020-08-05 12:26 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Nick Connolly, nicolas.dichtel, stable

Running dpdk-helloworld on Linux with lib numa present,
but no kernel support for NUMA (CONFIG_NUMA=n) causes
ret_service_init() to fail with EAL: error allocating
rte services array.

alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.

The allocated socket should only be verified if check_numa() is true.

Fixes: 2a96c88be83e ("mem: ease init in a docker container")
Cc: nicolas.dichtel@6wind.com
Cc: stable@dpdk.org
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
---
 lib/librte_eal/linux/eal_memalloc.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
index db60e7997..179757809 100644
--- a/lib/librte_eal/linux/eal_memalloc.c
+++ b/lib/librte_eal/linux/eal_memalloc.c
@@ -610,17 +610,23 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	}
 
 #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
-	ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
-			    MPOL_F_NODE | MPOL_F_ADDR);
-	if (ret < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
-			__func__, strerror(errno));
-		goto mapped;
-	} else if (cur_socket_id != socket_id) {
-		RTE_LOG(DEBUG, EAL,
-				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
-			__func__, socket_id, cur_socket_id);
-		goto mapped;
+	if (check_numa()) {
+		ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
+					MPOL_F_NODE | MPOL_F_ADDR);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
+				__func__, strerror(errno));
+			goto mapped;
+		} else if (cur_socket_id != socket_id) {
+			RTE_LOG(DEBUG, EAL,
+					"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+				__func__, socket_id, cur_socket_id);
+			goto mapped;
+		}
+	} else {
+		if (rte_socket_count() > 1)
+			RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
+					__func__, socket_id);
 	}
 #else
 	if (rte_socket_count() > 1)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 12:26 [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel Nick Connolly
@ 2020-08-05 13:42 ` Nicolas Dichtel
  2020-08-05 14:20   ` Nick Connolly
  2020-09-17 11:31 ` Burakov, Anatoly
  2020-10-12 19:28 ` [dpdk-stable] [PATCH v2] " Nick Connolly
  2 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dichtel @ 2020-08-05 13:42 UTC (permalink / raw)
  To: Nick Connolly, Anatoly Burakov; +Cc: dev, stable

Le 05/08/2020 à 14:26, Nick Connolly a écrit :
> Running dpdk-helloworld on Linux with lib numa present,
> but no kernel support for NUMA (CONFIG_NUMA=n) causes
> ret_service_init() to fail with EAL: error allocating
> rte services array.
> 
> alloc_seg() calls get_mempolicy to verify that the allocation
> has happened on the correct socket, but receives ENOSYS from
> the kernel and fails the allocation.
> 
> The allocated socket should only be verified if check_numa() is true.
> 
> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
I'm wondering if the bug existed before this commit.

Before this commit, it was:
       move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
       if (cur_socket_id != socket_id) {
               /* error */

Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?

[snip]
> +	if (check_numa()) {
> +		ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
> +					MPOL_F_NODE | MPOL_F_ADDR);
> +		if (ret < 0) {
> +			RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
> +				__func__, strerror(errno));
> +			goto mapped;
> +		} else if (cur_socket_id != socket_id) {
> +			RTE_LOG(DEBUG, EAL,
> +					"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
> +				__func__, socket_id, cur_socket_id);
> +			goto mapped;
> +		}
> +	} else {
> +		if (rte_socket_count() > 1)
> +			RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
> +					__func__, socket_id);
nit: maybe an higher log level like WARNING?


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 13:42 ` Nicolas Dichtel
@ 2020-08-05 14:20   ` Nick Connolly
  2020-08-05 14:36     ` Nicolas Dichtel
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-08-05 14:20 UTC (permalink / raw)
  To: nicolas.dichtel, Anatoly Burakov; +Cc: dev, stable

Hi Nicolas,

Thanks for the quick response.

On 05/08/2020 14:42, Nicolas Dichtel wrote:
> Le 05/08/2020 à 14:26, Nick Connolly a écrit :
>> Running dpdk-helloworld on Linux with lib numa present,
>> but no kernel support for NUMA (CONFIG_NUMA=n) causes
>> ret_service_init() to fail with EAL: error allocating
>> rte services array.
>>
>> alloc_seg() calls get_mempolicy to verify that the allocation
>> has happened on the correct socket, but receives ENOSYS from
>> the kernel and fails the allocation.
>>
>> The allocated socket should only be verified if check_numa() is true.
>>
>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> I'm wondering if the bug existed before this commit.
>
> Before this commit, it was:
>         move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
>         if (cur_socket_id != socket_id) {
>                 /* error */
>
> Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
I've just run the previous code to test this out and you are right that 
move_pages does indeed return -1 with errno set to ENOSYS, but nothing 
checks this so execution carries on and compares cur_socket_id (which 
will be unchanged from the zero initialization) with socket_id (which is 
presumably also zero), thus allowing the allocation to succeed!

> [snip]
>> +	if (check_numa()) {
>> +		ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>> +					MPOL_F_NODE | MPOL_F_ADDR);
>> +		if (ret < 0) {
>> +			RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>> +				__func__, strerror(errno));
>> +			goto mapped;
>> +		} else if (cur_socket_id != socket_id) {
>> +			RTE_LOG(DEBUG, EAL,
>> +					"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
>> +				__func__, socket_id, cur_socket_id);
>> +			goto mapped;
>> +		}
>> +	} else {
>> +		if (rte_socket_count() > 1)
>> +			RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
>> +					__func__, socket_id);
> nit: maybe an higher log level like WARNING?
Open to guidance here - my concern was that this is going to be 
generated for every call to alloc_seg() and I'm not sure what the 
frequency will be - I'm cautious about flooding the log with warnings 
under 'normal running'.  Are the implications of running on a multi 
socket system with NUMA support disabled in the kernel purely 
performance related for the DPDK or is there a functional correctness 
issue as well?
>
> Regards,
> Nicolas

Regards,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 14:20   ` Nick Connolly
@ 2020-08-05 14:36     ` Nicolas Dichtel
  2020-08-05 14:53       ` Nick Connolly
  0 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dichtel @ 2020-08-05 14:36 UTC (permalink / raw)
  To: Nick Connolly, Anatoly Burakov; +Cc: dev, stable

Le 05/08/2020 à 16:20, Nick Connolly a écrit :
[snip]
>>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
>> I'm wondering if the bug existed before this commit.
>>
>> Before this commit, it was:
>>         move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
>>         if (cur_socket_id != socket_id) {
>>                 /* error */
>>
>> Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
> I've just run the previous code to test this out and you are right that
> move_pages does indeed return -1 with errno set to ENOSYS, but nothing checks
> this so execution carries on and compares cur_socket_id (which will be unchanged
> from the zero initialization) with socket_id (which is presumably also zero),
> thus allowing the allocation to succeed!
I came to this conclusion, but I didn't check if socket_id could be != from 0.

>> [snip]
>>> +    if (check_numa()) {
>>> +        ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>>> +        if (ret < 0) {
>>> +            RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>> +                __func__, strerror(errno));
>>> +            goto mapped;
>>> +        } else if (cur_socket_id != socket_id) {
>>> +            RTE_LOG(DEBUG, EAL,
>>> +                    "%s(): allocation happened on wrong socket (wanted %d,
>>> got %d)\n",
>>> +                __func__, socket_id, cur_socket_id);
>>> +            goto mapped;
>>> +        }
>>> +    } else {
>>> +        if (rte_socket_count() > 1)
>>> +            RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation
>>> (wanted %d)\n",
>>> +                    __func__, socket_id);
>> nit: maybe an higher log level like WARNING?
> Open to guidance here - my concern was that this is going to be generated for
> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
> cautious about flooding the log with warnings under 'normal running'.  Are the
> implications of running on a multi socket system with NUMA support disabled in
> the kernel purely performance related for the DPDK or is there a functional
> correctness issue as well?
Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
dpdk and not CONFIG_NUMA in the kernel?


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 14:36     ` Nicolas Dichtel
@ 2020-08-05 14:53       ` Nick Connolly
  2020-08-05 15:13         ` Nicolas Dichtel
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-08-05 14:53 UTC (permalink / raw)
  To: nicolas.dichtel, Anatoly Burakov; +Cc: dev, stable



On 05/08/2020 15:36, Nicolas Dichtel wrote:
> Le 05/08/2020 à 16:20, Nick Connolly a écrit :
> [snip]
>>>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
>>> I'm wondering if the bug existed before this commit.
>>>
>>> Before this commit, it was:
>>>          move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
>>>          if (cur_socket_id != socket_id) {
>>>                  /* error */
>>>
>>> Isn't it possible to hit this error case if CONFIG_NUMA is unset in the kernel?
>> I've just run the previous code to test this out and you are right that
>> move_pages does indeed return -1 with errno set to ENOSYS, but nothing checks
>> this so execution carries on and compares cur_socket_id (which will be unchanged
>> from the zero initialization) with socket_id (which is presumably also zero),
>> thus allowing the allocation to succeed!
> I came to this conclusion, but I didn't check if socket_id could be != from 0.
>
>>> [snip]
>>>> +    if (check_numa()) {
>>>> +        ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>>>> +        if (ret < 0) {
>>>> +            RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>> +                __func__, strerror(errno));
>>>> +            goto mapped;
>>>> +        } else if (cur_socket_id != socket_id) {
>>>> +            RTE_LOG(DEBUG, EAL,
>>>> +                    "%s(): allocation happened on wrong socket (wanted %d,
>>>> got %d)\n",
>>>> +                __func__, socket_id, cur_socket_id);
>>>> +            goto mapped;
>>>> +        }
>>>> +    } else {
>>>> +        if (rte_socket_count() > 1)
>>>> +            RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation
>>>> (wanted %d)\n",
>>>> +                    __func__, socket_id);
>>> nit: maybe an higher log level like WARNING?
>> Open to guidance here - my concern was that this is going to be generated for
>> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
>> cautious about flooding the log with warnings under 'normal running'.  Are the
>> implications of running on a multi socket system with NUMA support disabled in
>> the kernel purely performance related for the DPDK or is there a functional
>> correctness issue as well?
> Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
> dpdk and not CONFIG_NUMA in the kernel?

I'm not an expert of DPDK, but I think it needs to be treated as 'normal 
running', for the following reasons:

 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
    work even if check_numa() indicates that NUMA support is not enabled:

    #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
    if (check_numa()) {
             oldmask = numa_allocate_nodemask();
             prepare_numa(&oldpolicy, oldmask, socket);
             have_numa = true;
         }
    #endif
 2. The DPDK application could be built with
    CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
    different systems with and without NUMA support.

Regards,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 14:53       ` Nick Connolly
@ 2020-08-05 15:13         ` Nicolas Dichtel
  2020-08-05 15:21           ` Nick Connolly
  0 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dichtel @ 2020-08-05 15:13 UTC (permalink / raw)
  To: Nick Connolly, Anatoly Burakov; +Cc: dev, stable

Le 05/08/2020 à 16:53, Nick Connolly a écrit :
[snip]
>>>>> +    if (check_numa()) {
>>>>> +        ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>>>>> +        if (ret < 0) {
>>>>> +            RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>>> +                __func__, strerror(errno));
>>>>> +            goto mapped;
>>>>> +        } else if (cur_socket_id != socket_id) {
>>>>> +            RTE_LOG(DEBUG, EAL,
>>>>> +                    "%s(): allocation happened on wrong socket (wanted %d,
>>>>> got %d)\n",
>>>>> +                __func__, socket_id, cur_socket_id);
>>>>> +            goto mapped;
>>>>> +        }
>>>>> +    } else {
>>>>> +        if (rte_socket_count() > 1)
>>>>> +            RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation
>>>>> (wanted %d)\n",
>>>>> +                    __func__, socket_id);
>>>> nit: maybe an higher log level like WARNING?
>>> Open to guidance here - my concern was that this is going to be generated for
>>> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
>>> cautious about flooding the log with warnings under 'normal running'.  Are the
>>> implications of running on a multi socket system with NUMA support disabled in
>>> the kernel purely performance related for the DPDK or is there a functional
>>> correctness issue as well?
>> Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
>> dpdk and not CONFIG_NUMA in the kernel?
> 
> I'm not an expert of DPDK, but I think it needs to be treated as 'normal
> running', for the following reasons:
> 
> 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
>    work even if check_numa() indicates that NUMA support is not enabled:
> 
>    #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>    if (check_numa()) {
>             oldmask = numa_allocate_nodemask();
>             prepare_numa(&oldpolicy, oldmask, socket);
>             have_numa = true;
>         }
>    #endif
The question was not to return an error, but to display a warning. So the code
will work (after your patch), no problem.

> 2. The DPDK application could be built with
>    CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
>    different systems with and without NUMA support.
In a production environment, it seems odd to have a custom kernel and a generic
dpdk app, it's why I propose the log level WARNING (or NOTICE maybe?).
I let other comment about this, I don't have a strong opinion.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 15:13         ` Nicolas Dichtel
@ 2020-08-05 15:21           ` Nick Connolly
  2020-09-17 11:28             ` Burakov, Anatoly
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-08-05 15:21 UTC (permalink / raw)
  To: nicolas.dichtel, Anatoly Burakov; +Cc: dev, stable



On 05/08/2020 16:13, Nicolas Dichtel wrote:
> Le 05/08/2020 à 16:53, Nick Connolly a écrit :
> [snip]
>>>>>> +    if (check_numa()) {
>>>>>> +        ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>>>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>>>>>> +        if (ret < 0) {
>>>>>> +            RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>>>> +                __func__, strerror(errno));
>>>>>> +            goto mapped;
>>>>>> +        } else if (cur_socket_id != socket_id) {
>>>>>> +            RTE_LOG(DEBUG, EAL,
>>>>>> +                    "%s(): allocation happened on wrong socket (wanted %d,
>>>>>> got %d)\n",
>>>>>> +                __func__, socket_id, cur_socket_id);
>>>>>> +            goto mapped;
>>>>>> +        }
>>>>>> +    } else {
>>>>>> +        if (rte_socket_count() > 1)
>>>>>> +            RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation
>>>>>> (wanted %d)\n",
>>>>>> +                    __func__, socket_id);
>>>>> nit: maybe an higher log level like WARNING?
>>>> Open to guidance here - my concern was that this is going to be generated for
>>>> every call to alloc_seg() and I'm not sure what the frequency will be - I'm
>>>> cautious about flooding the log with warnings under 'normal running'.  Are the
>>>> implications of running on a multi socket system with NUMA support disabled in
>>>> the kernel purely performance related for the DPDK or is there a functional
>>>> correctness issue as well?
>>> Is it really a 'normal running' to have CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
>>> dpdk and not CONFIG_NUMA in the kernel?
>> I'm not an expert of DPDK, but I think it needs to be treated as 'normal
>> running', for the following reasons:
>>
>> 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
>>     work even if check_numa() indicates that NUMA support is not enabled:
>>
>>     #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>>     if (check_numa()) {
>>              oldmask = numa_allocate_nodemask();
>>              prepare_numa(&oldpolicy, oldmask, socket);
>>              have_numa = true;
>>          }
>>     #endif
> The question was not to return an error, but to display a warning. So the code
> will work (after your patch), no problem.
>
>> 2. The DPDK application could be built with
>>     CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
>>     different systems with and without NUMA support.
> In a production environment, it seems odd to have a custom kernel and a generic
> dpdk app, it's why I propose the log level WARNING (or NOTICE maybe?).
> I let other comment about this, I don't have a strong opinion.
Thanks - appreciate the input - I also have no strong opinions here and 
am happy to be guided.

Regards,
Nick

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 15:21           ` Nick Connolly
@ 2020-09-17 11:28             ` Burakov, Anatoly
  0 siblings, 0 replies; 21+ messages in thread
From: Burakov, Anatoly @ 2020-09-17 11:28 UTC (permalink / raw)
  To: Nick Connolly, nicolas.dichtel; +Cc: dev, stable

On 05-Aug-20 4:21 PM, Nick Connolly wrote:
> 
> 
> On 05/08/2020 16:13, Nicolas Dichtel wrote:
>> Le 05/08/2020 à 16:53, Nick Connolly a écrit :
>> [snip]
>>>>>>> +    if (check_numa()) {
>>>>>>> +        ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>>>>>>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>>>>>>> +                __func__, strerror(errno));
>>>>>>> +            goto mapped;
>>>>>>> +        } else if (cur_socket_id != socket_id) {
>>>>>>> +            RTE_LOG(DEBUG, EAL,
>>>>>>> +                    "%s(): allocation happened on wrong socket 
>>>>>>> (wanted %d,
>>>>>>> got %d)\n",
>>>>>>> +                __func__, socket_id, cur_socket_id);
>>>>>>> +            goto mapped;
>>>>>>> +        }
>>>>>>> +    } else {
>>>>>>> +        if (rte_socket_count() > 1)
>>>>>>> +            RTE_LOG(DEBUG, EAL, "%s(): not checking socket for 
>>>>>>> allocation
>>>>>>> (wanted %d)\n",
>>>>>>> +                    __func__, socket_id);
>>>>>> nit: maybe an higher log level like WARNING?
>>>>> Open to guidance here - my concern was that this is going to be 
>>>>> generated for
>>>>> every call to alloc_seg() and I'm not sure what the frequency will 
>>>>> be - I'm
>>>>> cautious about flooding the log with warnings under 'normal 
>>>>> running'.  Are the
>>>>> implications of running on a multi socket system with NUMA support 
>>>>> disabled in
>>>>> the kernel purely performance related for the DPDK or is there a 
>>>>> functional
>>>>> correctness issue as well?
>>>> Is it really a 'normal running' to have 
>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES in
>>>> dpdk and not CONFIG_NUMA in the kernel?
>>> I'm not an expert of DPDK, but I think it needs to be treated as 'normal
>>> running', for the following reasons:
>>>
>>> 1. The existing code in eal_memalloc_alloc_seg_bulk() is designed to
>>>     work even if check_numa() indicates that NUMA support is not 
>>> enabled:
>>>
>>>     #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>>>     if (check_numa()) {
>>>              oldmask = numa_allocate_nodemask();
>>>              prepare_numa(&oldpolicy, oldmask, socket);
>>>              have_numa = true;
>>>          }
>>>     #endif
>> The question was not to return an error, but to display a warning. So 
>> the code
>> will work (after your patch), no problem.
>>
>>> 2. The DPDK application could be built with
>>>     CONFIG_RTE_EAL_NUMA_AWARE_HUGE_PAGES and then the binary run on
>>>     different systems with and without NUMA support.
>> In a production environment, it seems odd to have a custom kernel and 
>> a generic
>> dpdk app, it's why I propose the log level WARNING (or NOTICE maybe?).
>> I let other comment about this, I don't have a strong opinion.
> Thanks - appreciate the input - I also have no strong opinions here and 
> am happy to be guided.

If there is a socket mismatch, wouldn't allocation fail anyway, which 
would result in an error message? Usually, when things fail, the first 
obvious thing to do is turn up the debug log. I'm inclined to leave it 
as DEBUG.

> 
> Regards,
> Nick


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 12:26 [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel Nick Connolly
  2020-08-05 13:42 ` Nicolas Dichtel
@ 2020-09-17 11:31 ` Burakov, Anatoly
  2020-09-17 12:29   ` Nick Connolly
  2020-10-12 19:28 ` [dpdk-stable] [PATCH v2] " Nick Connolly
  2 siblings, 1 reply; 21+ messages in thread
From: Burakov, Anatoly @ 2020-09-17 11:31 UTC (permalink / raw)
  To: Nick Connolly; +Cc: dev, nicolas.dichtel, stable

On 05-Aug-20 1:26 PM, Nick Connolly wrote:
> Running dpdk-helloworld on Linux with lib numa present,
> but no kernel support for NUMA (CONFIG_NUMA=n) causes
> ret_service_init() to fail with EAL: error allocating
> rte services array.
> 
> alloc_seg() calls get_mempolicy to verify that the allocation
> has happened on the correct socket, but receives ENOSYS from
> the kernel and fails the allocation.
> 
> The allocated socket should only be verified if check_numa() is true.
> 
> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> Cc: nicolas.dichtel@6wind.com
> Cc: stable@dpdk.org
> Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
> ---
>   lib/librte_eal/linux/eal_memalloc.c | 28 +++++++++++++++++-----------
>   1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
> index db60e7997..179757809 100644
> --- a/lib/librte_eal/linux/eal_memalloc.c
> +++ b/lib/librte_eal/linux/eal_memalloc.c
> @@ -610,17 +610,23 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
>   	}
>   
>   #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
> -	ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
> -			    MPOL_F_NODE | MPOL_F_ADDR);
> -	if (ret < 0) {
> -		RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
> -			__func__, strerror(errno));
> -		goto mapped;
> -	} else if (cur_socket_id != socket_id) {
> -		RTE_LOG(DEBUG, EAL,
> -				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
> -			__func__, socket_id, cur_socket_id);
> -		goto mapped;
> +	if (check_numa()) {
> +		ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
> +					MPOL_F_NODE | MPOL_F_ADDR);
> +		if (ret < 0) {
> +			RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
> +				__func__, strerror(errno));
> +			goto mapped;
> +		} else if (cur_socket_id != socket_id) {
> +			RTE_LOG(DEBUG, EAL,
> +					"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
> +				__func__, socket_id, cur_socket_id);
> +			goto mapped;
> +		}
> +	} else {
> +		if (rte_socket_count() > 1)
> +			RTE_LOG(DEBUG, EAL, "%s(): not checking socket for allocation (wanted %d)\n",
> +					__func__, socket_id);
>   	}

If there is no kernel support for NUMA, how would we end up with >1 
socket count?

>   #else
>   	if (rte_socket_count() > 1)
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 11:31 ` Burakov, Anatoly
@ 2020-09-17 12:29   ` Nick Connolly
  2020-09-17 12:57     ` Burakov, Anatoly
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-09-17 12:29 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, nicolas.dichtel, stable

Hi Anatoly,

Thanks for the response.  You are asking a good question - here's what I 
know:

The issue arose on a single socket system, running WSL2 (full Linux 
kernel running as a lightweight VM under Windows).
The default kernel in this environment is built with CONFIG_NUMA=n which 
means get_mempolicy() returns an error.
This causes the check to ensure that the allocated memory is associated 
with the correct socket to fail.

The change is to skip the allocation check if check_numa() indicates 
that NUMA-aware memory is not supported.

Researching the meaning of CONFIG_NUMA, I found 
https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
> Enable NUMA (Non-Uniform Memory Access) support.
> The kernel will try to allocate memory used by a CPU on the local 
> memory controller of the CPU and add some more NUMA awareness to the 
> kernel.

Clearly CONFIG_NUMA enables memory awareness, but there's no indication 
in the description whether information about the NUMA physical 
architecture is 'hidden', or whether it is still exposed through 
/sys/devices/system/node* (which is used by the rte initialisation code 
to determine how many sockets there are). Unfortunately, I don't have 
ready access to a multi-socket Linux system that I can test this out on, 
so I took the conservative approach that it may be possible to have 
CONFIG_NUMA disabled, but the kernel still report more than one node, 
and coded the change to generate a debug message if this occurs.

Do you know whether CONFIG_NUMA turns off all knowledge about the 
hardware architecture?  If it does, then I agree that the test for 
rte_socket_count() serves no purpose and should be removed.

Many thanks,
Nick


On 17/09/2020 12:31, Burakov, Anatoly wrote:
> On 05-Aug-20 1:26 PM, Nick Connolly wrote:
>> Running dpdk-helloworld on Linux with lib numa present,
>> but no kernel support for NUMA (CONFIG_NUMA=n) causes
>> ret_service_init() to fail with EAL: error allocating
>> rte services array.
>>
>> alloc_seg() calls get_mempolicy to verify that the allocation
>> has happened on the correct socket, but receives ENOSYS from
>> the kernel and fails the allocation.
>>
>> The allocated socket should only be verified if check_numa() is true.
>>
>> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
>> Cc: nicolas.dichtel@6wind.com
>> Cc: stable@dpdk.org
>> Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
>> ---
>>   lib/librte_eal/linux/eal_memalloc.c | 28 +++++++++++++++++-----------
>>   1 file changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/lib/librte_eal/linux/eal_memalloc.c 
>> b/lib/librte_eal/linux/eal_memalloc.c
>> index db60e7997..179757809 100644
>> --- a/lib/librte_eal/linux/eal_memalloc.c
>> +++ b/lib/librte_eal/linux/eal_memalloc.c
>> @@ -610,17 +610,23 @@ alloc_seg(struct rte_memseg *ms, void *addr, 
>> int socket_id,
>>       }
>>     #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>> -    ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>> -                MPOL_F_NODE | MPOL_F_ADDR);
>> -    if (ret < 0) {
>> -        RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>> -            __func__, strerror(errno));
>> -        goto mapped;
>> -    } else if (cur_socket_id != socket_id) {
>> -        RTE_LOG(DEBUG, EAL,
>> -                "%s(): allocation happened on wrong socket (wanted 
>> %d, got %d)\n",
>> -            __func__, socket_id, cur_socket_id);
>> -        goto mapped;
>> +    if (check_numa()) {
>> +        ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>> +        if (ret < 0) {
>> +            RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
>> +                __func__, strerror(errno));
>> +            goto mapped;
>> +        } else if (cur_socket_id != socket_id) {
>> +            RTE_LOG(DEBUG, EAL,
>> +                    "%s(): allocation happened on wrong socket 
>> (wanted %d, got %d)\n",
>> +                __func__, socket_id, cur_socket_id);
>> +            goto mapped;
>> +        }
>> +    } else {
>> +        if (rte_socket_count() > 1)
>> +            RTE_LOG(DEBUG, EAL, "%s(): not checking socket for 
>> allocation (wanted %d)\n",
>> +                    __func__, socket_id);
>>       }
>
> If there is no kernel support for NUMA, how would we end up with >1 
> socket count?
>
>>   #else
>>       if (rte_socket_count() > 1)
>>
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 12:29   ` Nick Connolly
@ 2020-09-17 12:57     ` Burakov, Anatoly
  2020-09-17 13:05       ` Nick Connolly
  0 siblings, 1 reply; 21+ messages in thread
From: Burakov, Anatoly @ 2020-09-17 12:57 UTC (permalink / raw)
  To: Nick Connolly; +Cc: dev, nicolas.dichtel, stable

On 17-Sep-20 1:29 PM, Nick Connolly wrote:
> Hi Anatoly,
> 
> Thanks for the response.  You are asking a good question - here's what I 
> know:
> 
> The issue arose on a single socket system, running WSL2 (full Linux 
> kernel running as a lightweight VM under Windows).
> The default kernel in this environment is built with CONFIG_NUMA=n which 
> means get_mempolicy() returns an error.
> This causes the check to ensure that the allocated memory is associated 
> with the correct socket to fail.
> 
> The change is to skip the allocation check if check_numa() indicates 
> that NUMA-aware memory is not supported.
> 
> Researching the meaning of CONFIG_NUMA, I found 
> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>> Enable NUMA (Non-Uniform Memory Access) support.
>> The kernel will try to allocate memory used by a CPU on the local 
>> memory controller of the CPU and add some more NUMA awareness to the 
>> kernel.
> 
> Clearly CONFIG_NUMA enables memory awareness, but there's no indication 
> in the description whether information about the NUMA physical 
> architecture is 'hidden', or whether it is still exposed through 
> /sys/devices/system/node* (which is used by the rte initialisation code 
> to determine how many sockets there are). Unfortunately, I don't have 
> ready access to a multi-socket Linux system that I can test this out on, 
> so I took the conservative approach that it may be possible to have 
> CONFIG_NUMA disabled, but the kernel still report more than one node, 
> and coded the change to generate a debug message if this occurs.
> 
> Do you know whether CONFIG_NUMA turns off all knowledge about the 
> hardware architecture?  If it does, then I agree that the test for 
> rte_socket_count() serves no purpose and should be removed.
> 

I have a system with a custom compiled kernel, i can recompile it 
without this flag and test this. I'll report back with results :)

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 12:57     ` Burakov, Anatoly
@ 2020-09-17 13:05       ` Nick Connolly
  2020-09-17 14:07         ` Burakov, Anatoly
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-09-17 13:05 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, nicolas.dichtel, stable

Hi Anatoly,

Thanks.  My recollection is that all of the NUMA configuration flags 
were set to 'n'.

Regards,
Nick

On 17/09/2020 13:57, Burakov, Anatoly wrote:
> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>> Hi Anatoly,
>>
>> Thanks for the response.  You are asking a good question - here's 
>> what I know:
>>
>> The issue arose on a single socket system, running WSL2 (full Linux 
>> kernel running as a lightweight VM under Windows).
>> The default kernel in this environment is built with CONFIG_NUMA=n 
>> which means get_mempolicy() returns an error.
>> This causes the check to ensure that the allocated memory is 
>> associated with the correct socket to fail.
>>
>> The change is to skip the allocation check if check_numa() indicates 
>> that NUMA-aware memory is not supported.
>>
>> Researching the meaning of CONFIG_NUMA, I found 
>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>> Enable NUMA (Non-Uniform Memory Access) support.
>>> The kernel will try to allocate memory used by a CPU on the local 
>>> memory controller of the CPU and add some more NUMA awareness to the 
>>> kernel.
>>
>> Clearly CONFIG_NUMA enables memory awareness, but there's no 
>> indication in the description whether information about the NUMA 
>> physical architecture is 'hidden', or whether it is still exposed 
>> through /sys/devices/system/node* (which is used by the rte 
>> initialisation code to determine how many sockets there are). 
>> Unfortunately, I don't have ready access to a multi-socket Linux 
>> system that I can test this out on, so I took the conservative 
>> approach that it may be possible to have CONFIG_NUMA disabled, but 
>> the kernel still report more than one node, and coded the change to 
>> generate a debug message if this occurs.
>>
>> Do you know whether CONFIG_NUMA turns off all knowledge about the 
>> hardware architecture?  If it does, then I agree that the test for 
>> rte_socket_count() serves no purpose and should be removed.
>>
>
> I have a system with a custom compiled kernel, i can recompile it 
> without this flag and test this. I'll report back with results :)
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 13:05       ` Nick Connolly
@ 2020-09-17 14:07         ` Burakov, Anatoly
  2020-09-17 14:08           ` Nick Connolly
  0 siblings, 1 reply; 21+ messages in thread
From: Burakov, Anatoly @ 2020-09-17 14:07 UTC (permalink / raw)
  To: Nick Connolly; +Cc: dev, nicolas.dichtel, stable

On 17-Sep-20 2:05 PM, Nick Connolly wrote:
> Hi Anatoly,
> 
> Thanks.  My recollection is that all of the NUMA configuration flags 
> were set to 'n'.
> 
> Regards,
> Nick
> 
> On 17/09/2020 13:57, Burakov, Anatoly wrote:
>> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>>> Hi Anatoly,
>>>
>>> Thanks for the response.  You are asking a good question - here's 
>>> what I know:
>>>
>>> The issue arose on a single socket system, running WSL2 (full Linux 
>>> kernel running as a lightweight VM under Windows).
>>> The default kernel in this environment is built with CONFIG_NUMA=n 
>>> which means get_mempolicy() returns an error.
>>> This causes the check to ensure that the allocated memory is 
>>> associated with the correct socket to fail.
>>>
>>> The change is to skip the allocation check if check_numa() indicates 
>>> that NUMA-aware memory is not supported.
>>>
>>> Researching the meaning of CONFIG_NUMA, I found 
>>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>>> Enable NUMA (Non-Uniform Memory Access) support.
>>>> The kernel will try to allocate memory used by a CPU on the local 
>>>> memory controller of the CPU and add some more NUMA awareness to the 
>>>> kernel.
>>>
>>> Clearly CONFIG_NUMA enables memory awareness, but there's no 
>>> indication in the description whether information about the NUMA 
>>> physical architecture is 'hidden', or whether it is still exposed 
>>> through /sys/devices/system/node* (which is used by the rte 
>>> initialisation code to determine how many sockets there are). 
>>> Unfortunately, I don't have ready access to a multi-socket Linux 
>>> system that I can test this out on, so I took the conservative 
>>> approach that it may be possible to have CONFIG_NUMA disabled, but 
>>> the kernel still report more than one node, and coded the change to 
>>> generate a debug message if this occurs.
>>>
>>> Do you know whether CONFIG_NUMA turns off all knowledge about the 
>>> hardware architecture?  If it does, then I agree that the test for 
>>> rte_socket_count() serves no purpose and should be removed.
>>>
>>
>> I have a system with a custom compiled kernel, i can recompile it 
>> without this flag and test this. I'll report back with results :)
>>
> 

With CONFIG_NUMA set to 'n':

[root@xxx ~]# find /sys -name "node*"
/sys/kernel/software_nodes/node0
[root@xxx ~]#

This is confirmed by running DPDK on that machine - i can see all cores 
from all sockets, but they're all appearing on socket 0. So, yes, that 
check isn't necessary :)

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 14:07         ` Burakov, Anatoly
@ 2020-09-17 14:08           ` Nick Connolly
  2020-09-17 14:18             ` Burakov, Anatoly
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-09-17 14:08 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, nicolas.dichtel, stable

Excellent - thanks - I'll amend the patch.

On 17/09/2020 15:07, Burakov, Anatoly wrote:
> On 17-Sep-20 2:05 PM, Nick Connolly wrote:
>> Hi Anatoly,
>>
>> Thanks.  My recollection is that all of the NUMA configuration flags 
>> were set to 'n'.
>>
>> Regards,
>> Nick
>>
>> On 17/09/2020 13:57, Burakov, Anatoly wrote:
>>> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>>>> Hi Anatoly,
>>>>
>>>> Thanks for the response.  You are asking a good question - here's 
>>>> what I know:
>>>>
>>>> The issue arose on a single socket system, running WSL2 (full Linux 
>>>> kernel running as a lightweight VM under Windows).
>>>> The default kernel in this environment is built with CONFIG_NUMA=n 
>>>> which means get_mempolicy() returns an error.
>>>> This causes the check to ensure that the allocated memory is 
>>>> associated with the correct socket to fail.
>>>>
>>>> The change is to skip the allocation check if check_numa() 
>>>> indicates that NUMA-aware memory is not supported.
>>>>
>>>> Researching the meaning of CONFIG_NUMA, I found 
>>>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>>>> Enable NUMA (Non-Uniform Memory Access) support.
>>>>> The kernel will try to allocate memory used by a CPU on the local 
>>>>> memory controller of the CPU and add some more NUMA awareness to 
>>>>> the kernel.
>>>>
>>>> Clearly CONFIG_NUMA enables memory awareness, but there's no 
>>>> indication in the description whether information about the NUMA 
>>>> physical architecture is 'hidden', or whether it is still exposed 
>>>> through /sys/devices/system/node* (which is used by the rte 
>>>> initialisation code to determine how many sockets there are). 
>>>> Unfortunately, I don't have ready access to a multi-socket Linux 
>>>> system that I can test this out on, so I took the conservative 
>>>> approach that it may be possible to have CONFIG_NUMA disabled, but 
>>>> the kernel still report more than one node, and coded the change to 
>>>> generate a debug message if this occurs.
>>>>
>>>> Do you know whether CONFIG_NUMA turns off all knowledge about the 
>>>> hardware architecture?  If it does, then I agree that the test for 
>>>> rte_socket_count() serves no purpose and should be removed.
>>>>
>>>
>>> I have a system with a custom compiled kernel, i can recompile it 
>>> without this flag and test this. I'll report back with results :)
>>>
>>
>
> With CONFIG_NUMA set to 'n':
>
> [root@xxx ~]# find /sys -name "node*"
> /sys/kernel/software_nodes/node0
> [root@xxx ~]#
>
> This is confirmed by running DPDK on that machine - i can see all 
> cores from all sockets, but they're all appearing on socket 0. So, 
> yes, that check isn't necessary :)
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 14:08           ` Nick Connolly
@ 2020-09-17 14:18             ` Burakov, Anatoly
  2020-09-17 14:19               ` Nick Connolly
  0 siblings, 1 reply; 21+ messages in thread
From: Burakov, Anatoly @ 2020-09-17 14:18 UTC (permalink / raw)
  To: Nick Connolly; +Cc: dev, nicolas.dichtel, stable

On 17-Sep-20 3:08 PM, Nick Connolly wrote:
> Excellent - thanks - I'll amend the patch.
> 
> On 17/09/2020 15:07, Burakov, Anatoly wrote:
>> On 17-Sep-20 2:05 PM, Nick Connolly wrote:
>>> Hi Anatoly,
>>>
>>> Thanks.  My recollection is that all of the NUMA configuration flags 
>>> were set to 'n'.
>>>
>>> Regards,
>>> Nick
>>>
>>> On 17/09/2020 13:57, Burakov, Anatoly wrote:
>>>> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>>>>> Hi Anatoly,
>>>>>
>>>>> Thanks for the response.  You are asking a good question - here's 
>>>>> what I know:
>>>>>
>>>>> The issue arose on a single socket system, running WSL2 (full Linux 
>>>>> kernel running as a lightweight VM under Windows).
>>>>> The default kernel in this environment is built with CONFIG_NUMA=n 
>>>>> which means get_mempolicy() returns an error.
>>>>> This causes the check to ensure that the allocated memory is 
>>>>> associated with the correct socket to fail.
>>>>>
>>>>> The change is to skip the allocation check if check_numa() 
>>>>> indicates that NUMA-aware memory is not supported.
>>>>>
>>>>> Researching the meaning of CONFIG_NUMA, I found 
>>>>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>>>>> Enable NUMA (Non-Uniform Memory Access) support.
>>>>>> The kernel will try to allocate memory used by a CPU on the local 
>>>>>> memory controller of the CPU and add some more NUMA awareness to 
>>>>>> the kernel.
>>>>>
>>>>> Clearly CONFIG_NUMA enables memory awareness, but there's no 
>>>>> indication in the description whether information about the NUMA 
>>>>> physical architecture is 'hidden', or whether it is still exposed 
>>>>> through /sys/devices/system/node* (which is used by the rte 
>>>>> initialisation code to determine how many sockets there are). 
>>>>> Unfortunately, I don't have ready access to a multi-socket Linux 
>>>>> system that I can test this out on, so I took the conservative 
>>>>> approach that it may be possible to have CONFIG_NUMA disabled, but 
>>>>> the kernel still report more than one node, and coded the change to 
>>>>> generate a debug message if this occurs.
>>>>>
>>>>> Do you know whether CONFIG_NUMA turns off all knowledge about the 
>>>>> hardware architecture?  If it does, then I agree that the test for 
>>>>> rte_socket_count() serves no purpose and should be removed.
>>>>>
>>>>
>>>> I have a system with a custom compiled kernel, i can recompile it 
>>>> without this flag and test this. I'll report back with results :)
>>>>
>>>
>>
>> With CONFIG_NUMA set to 'n':
>>
>> [root@xxx ~]# find /sys -name "node*"
>> /sys/kernel/software_nodes/node0
>> [root@xxx ~]#
>>
>> This is confirmed by running DPDK on that machine - i can see all 
>> cores from all sockets, but they're all appearing on socket 0. So, 
>> yes, that check isn't necessary :)
>>
> 

I would also add a comment explaining why we're checking for NUMA 
support when NUMA support is defined at compiled time.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel
  2020-09-17 14:18             ` Burakov, Anatoly
@ 2020-09-17 14:19               ` Nick Connolly
  0 siblings, 0 replies; 21+ messages in thread
From: Nick Connolly @ 2020-09-17 14:19 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, nicolas.dichtel, stable

Sure.

On 17/09/2020 15:18, Burakov, Anatoly wrote:
> On 17-Sep-20 3:08 PM, Nick Connolly wrote:
>> Excellent - thanks - I'll amend the patch.
>>
>> On 17/09/2020 15:07, Burakov, Anatoly wrote:
>>> On 17-Sep-20 2:05 PM, Nick Connolly wrote:
>>>> Hi Anatoly,
>>>>
>>>> Thanks.  My recollection is that all of the NUMA configuration 
>>>> flags were set to 'n'.
>>>>
>>>> Regards,
>>>> Nick
>>>>
>>>> On 17/09/2020 13:57, Burakov, Anatoly wrote:
>>>>> On 17-Sep-20 1:29 PM, Nick Connolly wrote:
>>>>>> Hi Anatoly,
>>>>>>
>>>>>> Thanks for the response.  You are asking a good question - here's 
>>>>>> what I know:
>>>>>>
>>>>>> The issue arose on a single socket system, running WSL2 (full 
>>>>>> Linux kernel running as a lightweight VM under Windows).
>>>>>> The default kernel in this environment is built with 
>>>>>> CONFIG_NUMA=n which means get_mempolicy() returns an error.
>>>>>> This causes the check to ensure that the allocated memory is 
>>>>>> associated with the correct socket to fail.
>>>>>>
>>>>>> The change is to skip the allocation check if check_numa() 
>>>>>> indicates that NUMA-aware memory is not supported.
>>>>>>
>>>>>> Researching the meaning of CONFIG_NUMA, I found 
>>>>>> https://cateee.net/lkddb/web-lkddb/NUMA.html which says:
>>>>>>> Enable NUMA (Non-Uniform Memory Access) support.
>>>>>>> The kernel will try to allocate memory used by a CPU on the 
>>>>>>> local memory controller of the CPU and add some more NUMA 
>>>>>>> awareness to the kernel.
>>>>>>
>>>>>> Clearly CONFIG_NUMA enables memory awareness, but there's no 
>>>>>> indication in the description whether information about the NUMA 
>>>>>> physical architecture is 'hidden', or whether it is still exposed 
>>>>>> through /sys/devices/system/node* (which is used by the rte 
>>>>>> initialisation code to determine how many sockets there are). 
>>>>>> Unfortunately, I don't have ready access to a multi-socket Linux 
>>>>>> system that I can test this out on, so I took the conservative 
>>>>>> approach that it may be possible to have CONFIG_NUMA disabled, 
>>>>>> but the kernel still report more than one node, and coded the 
>>>>>> change to generate a debug message if this occurs.
>>>>>>
>>>>>> Do you know whether CONFIG_NUMA turns off all knowledge about the 
>>>>>> hardware architecture?  If it does, then I agree that the test 
>>>>>> for rte_socket_count() serves no purpose and should be removed.
>>>>>>
>>>>>
>>>>> I have a system with a custom compiled kernel, i can recompile it 
>>>>> without this flag and test this. I'll report back with results :)
>>>>>
>>>>
>>>
>>> With CONFIG_NUMA set to 'n':
>>>
>>> [root@xxx ~]# find /sys -name "node*"
>>> /sys/kernel/software_nodes/node0
>>> [root@xxx ~]#
>>>
>>> This is confirmed by running DPDK on that machine - i can see all 
>>> cores from all sockets, but they're all appearing on socket 0. So, 
>>> yes, that check isn't necessary :)
>>>
>>
>
> I would also add a comment explaining why we're checking for NUMA 
> support when NUMA support is defined at compiled time.
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [dpdk-stable] [PATCH v2] mem: fix allocation failure on non-NUMA kernel
  2020-08-05 12:26 [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel Nick Connolly
  2020-08-05 13:42 ` Nicolas Dichtel
  2020-09-17 11:31 ` Burakov, Anatoly
@ 2020-10-12 19:28 ` " Nick Connolly
  2020-10-13  7:59   ` Nicolas Dichtel
  2 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-10-12 19:28 UTC (permalink / raw)
  To: Anatoly Burakov, Didier Pallard, Olivier Matz, Nicolas Dichtel,
	David Marchand
  Cc: dev, Nick Connolly, stable

Running dpdk-helloworld on Linux with lib numa present,
but no kernel support for NUMA (CONFIG_NUMA=n) causes
ret_service_init() to fail with EAL: error allocating
rte services array.

alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.

The allocated socket should only be verified if check_numa() is true.

Fixes: 2a96c88be83e ("mem: ease init in a docker container")
Cc: nicolas.dichtel@6wind.com
Cc: stable@dpdk.org

Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
---
v2:
* Remove unnecessary debug message and add explanatory comment

 lib/librte_eal/linux/eal_memalloc.c | 30 ++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
index 40a5c4aa1..6dc1b2bae 100644
--- a/lib/librte_eal/linux/eal_memalloc.c
+++ b/lib/librte_eal/linux/eal_memalloc.c
@@ -625,17 +625,25 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	}
 
 #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
-	ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
-			    MPOL_F_NODE | MPOL_F_ADDR);
-	if (ret < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
-			__func__, strerror(errno));
-		goto mapped;
-	} else if (cur_socket_id != socket_id) {
-		RTE_LOG(DEBUG, EAL,
-				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
-			__func__, socket_id, cur_socket_id);
-		goto mapped;
+	/*
+	 * If the kernel has been built without NUMA support, get_mempolicy()
+	 * will return an error. If check_numa() returns false, memory
+	 * allocation is not NUMA aware and the socket_id should not be
+	 * checked.
+	 */
+	if (check_numa()) {
+		ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
+					MPOL_F_NODE | MPOL_F_ADDR);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
+				__func__, strerror(errno));
+			goto mapped;
+		} else if (cur_socket_id != socket_id) {
+			RTE_LOG(DEBUG, EAL,
+					"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+				__func__, socket_id, cur_socket_id);
+			goto mapped;
+		}
 	}
 #else
 	if (rte_socket_count() > 1)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH v2] mem: fix allocation failure on non-NUMA kernel
  2020-10-12 19:28 ` [dpdk-stable] [PATCH v2] " Nick Connolly
@ 2020-10-13  7:59   ` Nicolas Dichtel
  2020-10-13 12:01     ` David Marchand
  0 siblings, 1 reply; 21+ messages in thread
From: Nicolas Dichtel @ 2020-10-13  7:59 UTC (permalink / raw)
  To: Nick Connolly, Anatoly Burakov, Didier Pallard, Olivier Matz,
	David Marchand
  Cc: dev, stable

Le 12/10/2020 à 21:28, Nick Connolly a écrit :
> Running dpdk-helloworld on Linux with lib numa present,
> but no kernel support for NUMA (CONFIG_NUMA=n) causes
> ret_service_init() to fail with EAL: error allocating
> rte services array.
> 
> alloc_seg() calls get_mempolicy to verify that the allocation
> has happened on the correct socket, but receives ENOSYS from
> the kernel and fails the allocation.
> 
> The allocated socket should only be verified if check_numa() is true.
> 
> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> Cc: nicolas.dichtel@6wind.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH v2] mem: fix allocation failure on non-NUMA kernel
  2020-10-13  7:59   ` Nicolas Dichtel
@ 2020-10-13 12:01     ` David Marchand
  0 siblings, 0 replies; 21+ messages in thread
From: David Marchand @ 2020-10-13 12:01 UTC (permalink / raw)
  To: Nick Connolly
  Cc: Nicolas Dichtel, Anatoly Burakov, Didier Pallard, Olivier Matz,
	dev, dpdk stable

On Tue, Oct 13, 2020 at 10:00 AM Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
>
> Le 12/10/2020 à 21:28, Nick Connolly a écrit :
> > Running dpdk-helloworld on Linux with lib numa present,
> > but no kernel support for NUMA (CONFIG_NUMA=n) causes
> > ret_service_init() to fail with EAL: error allocating
> > rte services array.
> >
> > alloc_seg() calls get_mempolicy to verify that the allocation
> > has happened on the correct socket, but receives ENOSYS from
> > the kernel and fails the allocation.
> >
> > The allocated socket should only be verified if check_numa() is true.
> >
> > Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> > Cc: nicolas.dichtel@6wind.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks Nick.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-stable] [PATCH v2] mem: fix allocation failure on non-NUMA kernel
  2020-10-12 19:22 Nick Connolly
@ 2020-10-13  9:42 ` Burakov, Anatoly
  0 siblings, 0 replies; 21+ messages in thread
From: Burakov, Anatoly @ 2020-10-13  9:42 UTC (permalink / raw)
  To: Nick Connolly, David Marchand, Olivier Matz, Didier Pallard,
	Nicolas Dichtel
  Cc: dev, stable

On 12-Oct-20 8:22 PM, Nick Connolly wrote:
> Running dpdk-helloworld on Linux with lib numa present,
> but no kernel support for NUMA (CONFIG_NUMA=n) causes
> ret_service_init() to fail with EAL: error allocating
> rte services array.
> 
> alloc_seg() calls get_mempolicy to verify that the allocation
> has happened on the correct socket, but receives ENOSYS from
> the kernel and fails the allocation.
> 
> The allocated socket should only be verified if check_numa() is true.
> 
> Fixes: 2a96c88be83e ("mem: ease init in a docker container")
> Cc: nicolas.dichtel@6wind.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [dpdk-stable] [PATCH v2] mem: fix allocation failure on non-NUMA kernel
@ 2020-10-12 19:22 Nick Connolly
  2020-10-13  9:42 ` Burakov, Anatoly
  0 siblings, 1 reply; 21+ messages in thread
From: Nick Connolly @ 2020-10-12 19:22 UTC (permalink / raw)
  To: Anatoly Burakov, David Marchand, Olivier Matz, Didier Pallard,
	Nicolas Dichtel
  Cc: dev, Nick Connolly, stable

Running dpdk-helloworld on Linux with lib numa present,
but no kernel support for NUMA (CONFIG_NUMA=n) causes
ret_service_init() to fail with EAL: error allocating
rte services array.

alloc_seg() calls get_mempolicy to verify that the allocation
has happened on the correct socket, but receives ENOSYS from
the kernel and fails the allocation.

The allocated socket should only be verified if check_numa() is true.

Fixes: 2a96c88be83e ("mem: ease init in a docker container")
Cc: nicolas.dichtel@6wind.com
Cc: stable@dpdk.org

Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
---
v2:
* Remove unnecessary debug message and add explanatory comment

 lib/librte_eal/linux/eal_memalloc.c | 30 ++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
index 40a5c4aa1..6dc1b2bae 100644
--- a/lib/librte_eal/linux/eal_memalloc.c
+++ b/lib/librte_eal/linux/eal_memalloc.c
@@ -625,17 +625,25 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	}
 
 #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
-	ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
-			    MPOL_F_NODE | MPOL_F_ADDR);
-	if (ret < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
-			__func__, strerror(errno));
-		goto mapped;
-	} else if (cur_socket_id != socket_id) {
-		RTE_LOG(DEBUG, EAL,
-				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
-			__func__, socket_id, cur_socket_id);
-		goto mapped;
+	/*
+	 * If the kernel has been built without NUMA support, get_mempolicy()
+	 * will return an error. If check_numa() returns false, memory
+	 * allocation is not NUMA aware and the socket_id should not be
+	 * checked.
+	 */
+	if (check_numa()) {
+		ret = get_mempolicy(&cur_socket_id, NULL, 0, addr,
+					MPOL_F_NODE | MPOL_F_ADDR);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): get_mempolicy: %s\n",
+				__func__, strerror(errno));
+			goto mapped;
+		} else if (cur_socket_id != socket_id) {
+			RTE_LOG(DEBUG, EAL,
+					"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+				__func__, socket_id, cur_socket_id);
+			goto mapped;
+		}
 	}
 #else
 	if (rte_socket_count() > 1)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, back to index

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-05 12:26 [dpdk-stable] [PATCH] mem: fix allocation failure on non-NUMA kernel Nick Connolly
2020-08-05 13:42 ` Nicolas Dichtel
2020-08-05 14:20   ` Nick Connolly
2020-08-05 14:36     ` Nicolas Dichtel
2020-08-05 14:53       ` Nick Connolly
2020-08-05 15:13         ` Nicolas Dichtel
2020-08-05 15:21           ` Nick Connolly
2020-09-17 11:28             ` Burakov, Anatoly
2020-09-17 11:31 ` Burakov, Anatoly
2020-09-17 12:29   ` Nick Connolly
2020-09-17 12:57     ` Burakov, Anatoly
2020-09-17 13:05       ` Nick Connolly
2020-09-17 14:07         ` Burakov, Anatoly
2020-09-17 14:08           ` Nick Connolly
2020-09-17 14:18             ` Burakov, Anatoly
2020-09-17 14:19               ` Nick Connolly
2020-10-12 19:28 ` [dpdk-stable] [PATCH v2] " Nick Connolly
2020-10-13  7:59   ` Nicolas Dichtel
2020-10-13 12:01     ` David Marchand
2020-10-12 19:22 Nick Connolly
2020-10-13  9:42 ` Burakov, Anatoly

patches for DPDK stable branches

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/stable/0 stable/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 stable stable/ http://inbox.dpdk.org/stable \
		stable@dpdk.org
	public-inbox-index stable


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.stable


AGPL code for this site: git clone https://public-inbox.org/ public-inbox