DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: gowrishankar muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Cc: dev@dpdk.org, Bruce Richardson <bruce.richardson@intel.com>,
	Chao Zhu <chaozhu@linux.vnet.ibm.com>
Subject: Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
Date: Wed, 21 Mar 2018 10:24:44 +0000
Message-ID: <db9677b5-63f5-0d92-6c2c-4cefe4b7b801@intel.com>
In-Reply-To: <18deafea-5662-88ef-2ddc-3a1970d67405@linux.vnet.ibm.com>

On 21-Mar-18 4:59 AM, gowrishankar muthukrishnan wrote:
> On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      v3:
>>      - Added ABI compatibility
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
>> index dd455e6..ed1d17b 100644
>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>>
>>   EXPORT_MAP := ../../rte_eal_version.map
>>
>> -LIBABIVER := 6
>> +LIBABIVER := 7
>>
>>   # specific to bsdapp exec-env
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
>> diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
>> index 7724fa4..827ddeb 100644
>> --- a/lib/librte_eal/common/eal_common_lcore.c
>> +++ b/lib/librte_eal/common/eal_common_lcore.c
>> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>>       struct rte_config *config = rte_eal_get_configuration();
>>       unsigned lcore_id;
>>       unsigned count = 0;
>> +    unsigned int socket_id, max_socket_id = 0;
>>
>>       /*
>>        * Parse the maximum set of logical cores, detect the subset of running
>> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>>           /* init cpuset for per lcore config */
>>           CPU_ZERO(&lcore_config[lcore_id].cpuset);
>>
>> +        /* find socket first */
>> +        socket_id = eal_cpu_socket_id(lcore_id);
>> +        if (socket_id >= RTE_MAX_NUMA_NODES) {
>> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> +            socket_id = 0;
>> +#else
>> +            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
>> +                    socket_id, RTE_MAX_NUMA_NODES);
>> +            return -1;
>> +#endif
>> +        }
>> +        max_socket_id = RTE_MAX(max_socket_id, socket_id);
>> +
>>           /* in 1:1 mapping, record related cpu detected state */
>>           lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>>           if (lcore_config[lcore_id].detected == 0) {
>> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>>           config->lcore_role[lcore_id] = ROLE_RTE;
>>           lcore_config[lcore_id].core_role = ROLE_RTE;
>>           lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
>> -        lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
>> -        if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
>> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> -            lcore_config[lcore_id].socket_id = 0;
>> -#else
>> -            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
>> -                "RTE_MAX_NUMA_NODES (%d)\n",
>> -                lcore_config[lcore_id].socket_id,
>> -                RTE_MAX_NUMA_NODES);
>> -            return -1;
>> -#endif
>> -        }
>> +        lcore_config[lcore_id].socket_id = socket_id;
>>           RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>>                   "core %u on socket %u\n",
>>                   lcore_id, lcore_config[lcore_id].core_id,
>> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>>           RTE_MAX_LCORE);
>>       RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>>
>> +    config->numa_node_count = max_socket_id + 1;
> 
> In some IBM servers, socket IDs are not sequential: for instance, 
> a 2-node server may use IDs 0 and 8.
> 
> In this case, numa_node_count could mislead users who take the 
> variable name literally, IMO (see below).
>> +    RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
> 
> For instance, the above message would report 'EAL detected 8 nodes' 
> on my server, but there are actually only two nodes.
> 
> Could it be better named 'numa_node_id_max'? We could also store the 
> actual count of NUMA nodes in the _count variable.
> 
> Also, there could be a case where a NUMA node has no local memory 
> available.
> 
> Thanks,
> Gowrishankar

The point of this patchset is to (pre)allocate memory only on existing 
sockets.

If we don't know how many sockets there are, we are forced to 
preallocate VA space for each *possible* NUMA node - that is, reserve 
e.g. 8x128G of VA space, six regions of which will go unused on a 
2-socket system. We can't know in advance whether a socket has no 
memory, but we can at least avoid preallocating VA space for sockets 
that don't exist in the first place.
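
To put numbers on that, here is a minimal sketch of the arithmetic. The names and constants are made up for illustration (8 possible nodes and 128G per node, as in the example above); none of this is actual EAL code.

```c
#include <stdint.h>

/* Both values are assumptions taken from the example in the text. */
#define MAX_NUMA_NODES 8U      /* build-time maximum of possible nodes */
#define VA_PER_NODE_GB 128ULL  /* VA space reserved per node, in GB */

/* VA space (in GB) reserved when one region is set aside per node. */
static uint64_t va_reserved_gb(unsigned int nodes)
{
	return (uint64_t)nodes * VA_PER_NODE_GB;
}

/*
 * VA space (in GB) that is reserved but can never be used when we
 * reserve for every possible node instead of only the detected ones.
 */
static uint64_t va_wasted_gb(unsigned int detected_nodes)
{
	if (detected_nodes > MAX_NUMA_NODES)
		return 0;
	return va_reserved_gb(MAX_NUMA_NODES) - va_reserved_gb(detected_nodes);
}
```

With two detected sockets, va_reserved_gb(MAX_NUMA_NODES) is 1024 and va_wasted_gb(2) is 768, i.e. the six unused 128G regions.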

How about we store all detected socket IDs instead? E.g. something like:

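static int numa_node_ids[RTE_MAX_NUMA_NODES];
<...>
int rte_eal_cpu_init(void) {
	int sockets[RTE_MAX_LCORE];
	<...>
	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
		/* record the socket each lcore belongs to */
		sockets[lcore_id] = eal_cpu_socket_id(lcore_id);
	}
	<...>
	/* cmp_int is a hypothetical int comparator */
	qsort(sockets, RTE_MAX_LCORE, sizeof(sockets[0]), cmp_int);
	<...>
	/* store all unique sockets in numa_node_ids in ascending order */
}
<...>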

On a 2-socket system we then get:

rte_num_sockets() => returns 2
rte_get_socket_id(int idx) => returns numa_node_ids[idx]
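
A rough standalone sketch of that idea, to make the sort-and-deduplicate step concrete. This is not the actual EAL code: MAX_NUMA_NODES stands in for RTE_MAX_NUMA_NODES, and collect_socket_ids(), num_sockets(), get_socket_id() and cmp_int() are hypothetical names that only mirror the proposal above.

```c
#include <stdlib.h>

#define MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES (assumption) */

static int numa_node_ids[MAX_NUMA_NODES];
static unsigned int numa_node_count;

/* Three-way comparison of two ints, for qsort(). */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Build the ascending list of unique socket IDs from a per-lcore
 * socket table. Sorts sockets[] in place. Returns 0 on success, or
 * -1 if there are more than MAX_NUMA_NODES distinct IDs.
 */
static int collect_socket_ids(int *sockets, size_t n)
{
	size_t i;

	numa_node_count = 0;
	qsort(sockets, n, sizeof(sockets[0]), cmp_int);
	for (i = 0; i < n; i++) {
		/* after sorting, duplicates are adjacent: skip them */
		if (i > 0 && sockets[i] == sockets[i - 1])
			continue;
		if (numa_node_count == MAX_NUMA_NODES)
			return -1;
		numa_node_ids[numa_node_count++] = sockets[i];
	}
	return 0;
}

/* Number of distinct sockets found (hypothetical rte_num_sockets()). */
static unsigned int num_sockets(void)
{
	return numa_node_count;
}

/* Socket ID at index idx, or -1 if idx is out of range. */
static int get_socket_id(unsigned int idx)
{
	return idx < numa_node_count ? numa_node_ids[idx] : -1;
}
```

For a per-lcore table containing only sockets 0 and 8, as on the IBM server mentioned above, num_sockets() gives 2, get_socket_id(0) gives 0 and get_socket_id(1) gives 8.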

Would that be suitable?

-- 
Thanks,
Anatoly

Thread overview: 42+ messages
2017-12-22 11:58 [dpdk-dev] [PATCH] " Anatoly Burakov
2017-12-22 12:41 ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
2018-01-11 22:20   ` Thomas Monjalon
2018-01-12 11:44     ` Burakov, Anatoly
2018-01-12 11:50       ` Thomas Monjalon
2018-01-16 11:56         ` Burakov, Anatoly
2018-01-16 12:20           ` Thomas Monjalon
2018-01-16 15:05             ` Burakov, Anatoly
2018-01-16 17:34               ` Thomas Monjalon
2018-01-16 17:38                 ` Burakov, Anatoly
2018-01-16 18:26                   ` Thomas Monjalon
2018-01-16 17:53   ` [dpdk-dev] [PATCH] doc: add ABI change notice for numa_node_count in eal Anatoly Burakov
2018-01-23 10:39     ` Mcnamara, John
2018-02-07 10:10       ` Jerin Jacob
2018-02-09 14:42         ` Bruce Richardson
2018-02-14  0:04           ` Thomas Monjalon
2018-02-14 14:25             ` Thomas Monjalon
2018-02-12 16:00     ` Jonas Pfefferle
     [not found]   ` <cover.1517848624.git.anatoly.burakov@intel.com>
2018-02-05 16:37     ` [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
2018-02-05 17:39       ` Burakov, Anatoly
2018-02-05 22:45         ` Thomas Monjalon
2018-02-06  9:28           ` Burakov, Anatoly
2018-02-06  9:47             ` Thomas Monjalon
2018-02-07  9:58       ` [dpdk-dev] [PATCH 18.05 v4] Add " Anatoly Burakov
2018-02-07  9:58       ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
2018-03-08 12:12         ` Bruce Richardson
2018-03-08 14:38           ` Burakov, Anatoly
2018-03-09 16:32             ` Bruce Richardson
2018-03-20 22:43             ` Thomas Monjalon
2018-03-21  4:59         ` gowrishankar muthukrishnan
2018-03-21 10:24           ` Burakov, Anatoly [this message]
2018-03-22  5:16             ` gowrishankar muthukrishnan
2018-03-22  9:04               ` Burakov, Anatoly
2018-03-22 10:58       ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
2018-03-22 11:45         ` Burakov, Anatoly
2018-03-22 12:36         ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
2018-03-22 17:07           ` gowrishankar muthukrishnan
2018-03-27 16:24           ` Thomas Monjalon
2018-03-31 13:35             ` Burakov, Anatoly
2018-04-02 15:27               ` Thomas Monjalon
2018-03-31 17:08           ` [dpdk-dev] [PATCH v7] " Anatoly Burakov
2018-04-04 22:31             ` Thomas Monjalon
