* [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
@ 2014-04-30 14:15 David Marchand
2014-05-01 16:06 ` Burakov, Anatoly
0 siblings, 1 reply; 9+ messages in thread
From: David Marchand @ 2014-04-30 14:15 UTC (permalink / raw)
To: dev
From: Didier Pallard <didier.pallard@6wind.com>
Currently, if there is more memory in hugepages than the amount
requested by dpdk application, the memory is allocated by taking as much
memory as possible from each socket, starting from first one.
For example if a system is configured with 8 GB in 2 sockets (4 GB per
socket), and dpdk is requesting only 4GB of memory, all memory will be
taken in socket 0 (that have exactly 4GB of free hugepages) even if some
cores are configured on socket 1, and there are free hugepages on socket
1...
Change this behaviour to allocate memory on all sockets where some cores
are configured, spreading the memory amongst sockets using following
ratio per socket:
N° of cores configured on the socket / Total number of configured cores
* requested memory
This algorithm is used when memory amount is specified globally using
-m option. Per socket memory allocation can always be done using
--socket-mem option.
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
---
lib/librte_eal/bsdapp/eal/eal.c | 24 ++++++++++++++++++++++++
lib/librte_eal/linuxapp/eal/eal.c | 24 ++++++++++++++++++++++++
lib/librte_eal/linuxapp/eal/eal_memory.c | 8 ++++----
3 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 5c181b3..b6b5f20 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -869,6 +869,30 @@ rte_eal_init(int argc, char **argv)
internal_config.memory = eal_get_hugepage_mem_size();
}
+ /* Automatically spread requested memory amongst detected sockets according */
+ /* to number of cores from cpu mask present on each socket */
+ if (internal_config.no_hugetlbfs == 0 &&
+ internal_config.process_type != RTE_PROC_SECONDARY &&
+ internal_config.xen_dom0_support == 0 &&
+ internal_config.force_sockets == 0) {
+ int cpu_per_socket[RTE_MAX_NUMA_NODES];
+ unsigned lcore_id, socket_id;
+
+ /* Compute number of cores per socket */
+ memset(cpu_per_socket, 0, sizeof(cpu_per_socket));
+ RTE_LCORE_FOREACH(lcore_id) {
+ cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++;
+ }
+
+ /* Set memory amount per socket; round up to be sure that sum of all */
+ /* sockets allocation is greater than requested memory size */
+ for (socket_id=0 ; socket_id<RTE_MAX_NUMA_NODES ; socket_id++) {
+ internal_config.socket_mem[socket_id] = (internal_config.memory *
+ cpu_per_socket[socket_id] + rte_lcore_count() - 1) /
+ rte_lcore_count();
+ }
+ }
+
if (internal_config.vmware_tsc_map == 1) {
#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
rte_cycles_vmware_tsc_map = 1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index de082ab..37701ec 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -1035,6 +1035,30 @@ rte_eal_init(int argc, char **argv)
internal_config.memory = eal_get_hugepage_mem_size();
}
+ /* Automatically spread requested memory amongst detected sockets according */
+ /* to number of cores from cpu mask present on each socket */
+ if (internal_config.no_hugetlbfs == 0 &&
+ internal_config.process_type != RTE_PROC_SECONDARY &&
+ internal_config.xen_dom0_support == 0 &&
+ internal_config.force_sockets == 0) {
+ int cpu_per_socket[RTE_MAX_NUMA_NODES];
+ unsigned lcore_id, socket_id;
+
+ /* Compute number of cores per socket */
+ memset(cpu_per_socket, 0, sizeof(cpu_per_socket));
+ RTE_LCORE_FOREACH(lcore_id) {
+ cpu_per_socket[rte_lcore_to_socket_id(lcore_id)]++;
+ }
+
+ /* Set memory amount per socket; round up to be sure that sum of all */
+ /* sockets allocation is greater than requested memory size */
+ for (socket_id=0 ; socket_id<RTE_MAX_NUMA_NODES ; socket_id++) {
+ internal_config.socket_mem[socket_id] = (internal_config.memory *
+ cpu_per_socket[socket_id] + rte_lcore_count() - 1) /
+ rte_lcore_count();
+ }
+ }
+
if (internal_config.vmware_tsc_map == 1) {
#ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
rte_cycles_vmware_tsc_map = 1;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 73a6394..a9e6fa1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -876,16 +876,16 @@ calc_num_pages_per_socket(uint64_t * memory,
unsigned requested, available;
int total_num_pages = 0;
uint64_t remaining_mem, cur_mem;
- uint64_t total_mem = internal_config.memory;
+ int64_t total_mem = (int64_t) internal_config.memory;
if (num_hp_info == 0)
return -1;
- for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem != 0; socket++) {
+ for (socket = 0; socket < RTE_MAX_NUMA_NODES && total_mem > 0; socket++) {
/* if specific memory amounts per socket weren't requested */
- if (internal_config.force_sockets == 0) {
+ if (internal_config.force_sockets == 0 && memory[socket] == 0) {
/* take whatever is available */
- memory[socket] = RTE_MIN(get_socket_mem_size(socket),
+ memory[socket] = RTE_MIN((int64_t) get_socket_mem_size(socket),
total_mem);
}
/* skips if the memory on specific socket wasn't requested */
--
1.7.10.4
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-04-30 14:15 [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation David Marchand
@ 2014-05-01 16:06 ` Burakov, Anatoly
2014-05-02 8:54 ` Burakov, Anatoly
0 siblings, 1 reply; 9+ messages in thread
From: Burakov, Anatoly @ 2014-05-01 16:06 UTC (permalink / raw)
To: David Marchand, dev
Hi David, Didier,
> + /* Set memory amount per socket; round up to be sure that
> sum of all */
> + /* sockets allocation is greater than requested memory size
> */
> + for (socket_id=0 ; socket_id<RTE_MAX_NUMA_NODES ;
> socket_id++) {
> + internal_config.socket_mem[socket_id] =
> (internal_config.memory *
> + cpu_per_socket[socket_id] + rte_lcore_count() - 1)
> /
> + rte_lcore_count();
> + }
> + }
> +
Can I suggest to do an RTE_MAX between (internal_config.memory - total_mem) and (internal_config.memory * cpu_per_socket[socket_id] + rte_lcore_count() - 1) / rte_lcore_count() ? I don't think it's a good idea to go over the requested amount. Let the last core have a chance of reserving slightly less memory than other cores, but don't let it go over the limit. If specific memory constraints are required, let the user use --socket-mem instead.
Also, nitpicking, but multiline commends should go under a single /* ... */ in order to keep consistency with other DPDK code.
Best regards,
Anatoly Burakov
DPDK SW Engineer
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-01 16:06 ` Burakov, Anatoly
@ 2014-05-02 8:54 ` Burakov, Anatoly
2014-05-02 9:05 ` Venkatesan, Venky
0 siblings, 1 reply; 9+ messages in thread
From: Burakov, Anatoly @ 2014-05-02 8:54 UTC (permalink / raw)
To: Burakov, Anatoly, David Marchand, dev
Hi again David/Didier,
> Can I suggest to do an RTE_MAX between (internal_config.memory -
> total_mem) and (internal_config.memory * cpu_per_socket[socket_id] +
> rte_lcore_count() - 1) / rte_lcore_count() ? I don't think it's a good idea to go
> over the requested amount. Let the last core have a chance of reserving
> slightly less memory than other cores, but don't let it go over the limit. If
> specific memory constraints are required, let the user use --socket-mem
> instead.
Sorry for spamming, but now that I think of it, I don't believe this change makes much sense. If the user wants memory on specific sockets, there's already --socket-mem option. If the user doesn't care, there's -m option, which gives the user memory from whatever sockets it is available. With this change applied, DPDK will fail when run with -m switch under certain circumstances (e.g. cores from socket 0 present in the coremask but no memory left on socket 0), which is quite the opposite of a simple "give me n megs, I don't care where it comes from" option -m is providing.
Best regards,
Anatoly Burakov
DPDK SW Engineer
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-02 8:54 ` Burakov, Anatoly
@ 2014-05-02 9:05 ` Venkatesan, Venky
2014-05-05 9:26 ` David Marchand
0 siblings, 1 reply; 9+ messages in thread
From: Venkatesan, Venky @ 2014-05-02 9:05 UTC (permalink / raw)
To: Burakov, Anatoly, Burakov, Anatoly, David Marchand, dev
Agree with Anatoly - I would much rather not change legacy option behaviour that has existed for a while, especially when --socket-mem is available to do exactly what is needed.
-Venky
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Burakov, Anatoly
Sent: Friday, May 02, 2014 1:54 AM
To: Burakov, Anatoly; David Marchand; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
Hi again David/Didier,
> Can I suggest to do an RTE_MAX between (internal_config.memory -
> total_mem) and (internal_config.memory * cpu_per_socket[socket_id] +
> rte_lcore_count() - 1) / rte_lcore_count() ? I don't think it's a good
> idea to go over the requested amount. Let the last core have a chance
> of reserving slightly less memory than other cores, but don't let it
> go over the limit. If specific memory constraints are required, let
> the user use --socket-mem instead.
Sorry for spamming, but now that I think of it, I don't believe this change makes much sense. If the user wants memory on specific sockets, there's already --socket-mem option. If the user doesn't care, there's -m option, which gives the user memory from whatever sockets it is available. With this change applied, DPDK will fail when run with -m switch under certain circumstances (e.g. cores from socket 0 present in the coremask but no memory left on socket 0), which is quite the opposite of a simple "give me n megs, I don't care where it comes from" option -m is providing.
Best regards,
Anatoly Burakov
DPDK SW Engineer
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-02 9:05 ` Venkatesan, Venky
@ 2014-05-05 9:26 ` David Marchand
2014-05-06 10:05 ` Burakov, Anatoly
2014-05-07 14:56 ` Venkatesan, Venky
0 siblings, 2 replies; 9+ messages in thread
From: David Marchand @ 2014-05-05 9:26 UTC (permalink / raw)
To: Venkatesan, Venky; +Cc: dev
Hello Venky, Anatoly,
On Fri, May 2, 2014 at 11:05 AM, Venkatesan, Venky <
venky.venkatesan@intel.com> wrote:
> Agree with Anatoly - I would much rather not change legacy option
> behaviour that has existed for a while, especially when --socket-mem is
> available to do exactly what is needed.
>
> -Venky
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Burakov, Anatoly
> Sent: Friday, May 02, 2014 1:54 AM
> To: Burakov, Anatoly; David Marchand; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory
> allocation
>
> Sorry for spamming, but now that I think of it, I don't believe this
> change makes much sense. If the user wants memory on specific sockets,
> there's already --socket-mem option. If the user doesn't care, there's -m
> option, which gives the user memory from whatever sockets it is available.
> With this change applied, DPDK will fail when run with -m switch under
> certain circumstances (e.g. cores from socket 0 present in the coremask but
> no memory left on socket 0), which is quite the opposite of a simple "give
> me n megs, I don't care where it comes from" option -m is providing.
>
>
Actually, if we don't care where memory is allocated, we can at least try
to have the more common setup work properly (i.e. spread memory allocations
based on used cores).
I can see no usual setup where you want to use cores on a socket while
having all memory on another socket but still expect performance to be good.
So here is another approach for Didier's patch.
We can try to spread memory on numa sockets, if this fails, then we default
to previous behavior but leave a trace with a warning log "Could not spread
memory on numa sockets".
What do you think about this ?
I would also take into account Anatoly's comments (multi line comments +
ensure we won't try to get more memory than asked by user).
--
David Marchand
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-05 9:26 ` David Marchand
@ 2014-05-06 10:05 ` Burakov, Anatoly
2014-05-06 15:18 ` Thomas Monjalon
2014-05-07 14:56 ` Venkatesan, Venky
1 sibling, 1 reply; 9+ messages in thread
From: Burakov, Anatoly @ 2014-05-06 10:05 UTC (permalink / raw)
To: David Marchand, Venkatesan, Venky; +Cc: dev
Hi David,
> Actually, if we don't care where memory is allocated, we can at least try to have the more common setup work properly (i.e. spread memory allocations based on used cores).
> I can see no usual setup where you want to use cores on a socket while having all memory on another socket but still expect performance to be good.
>
> So here is another approach for Didier's patch.
> We can try to spread memory on numa sockets, if this fails, then we default to previous behavior but leave a trace with a warning log "Could not spread memory on numa sockets".
>
> What do you think about this ?
Sounds like an overcomplication to me. There could be cases where performance doesn't matter, for example the -m switch could be used to run various tests (unit tests, functional tests etc.). For anything performance-related, the recommended option is to use --socket-mem, especially if you have NICs on specific sockets. Presumably, when you're setting up a coremask, you already know which sockets your cores are on, so I don't see a problem with specifying which sockets you want memory from.
Best regards,
Anatoly Burakov
DPDK SW Engineer
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-06 10:05 ` Burakov, Anatoly
@ 2014-05-06 15:18 ` Thomas Monjalon
2014-05-06 15:56 ` Burakov, Anatoly
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Monjalon @ 2014-05-06 15:18 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: dev
2014-05-06 10:05, Burakov, Anatoly:
> David Marchand:
> > Actually, if we don't care where memory is allocated, we can at least try
> > to have the more common setup work properly (i.e. spread memory
> > allocations based on used cores).
> > I can see no usual setup where you
> > want to use cores on a socket while having all memory on another socket
> > but still expect performance to be good.
> > So here is another approach for Didier's patch.
> > We can try to spread memory on numa sockets, if this fails, then we
> > default to previous behavior but leave a trace with a warning log "Could
> > not spread memory on numa sockets".
> > What do you think about this ?
>
> Sounds like an overcomplication to me. There could be cases where
> performance doesn't matter, for example the -m switch could be used to run
> various tests (unit tests, functional tests etc.). For anything
> performance-related, the recommended option is to use --socket-mem,
> especially if you have NICs on specific sockets. Presumably, when you're
> setting up a coremask, you already know which sockets your cores are on, so
> I don't see a problem with specifying which sockets you want memory from.
Having --socket-mem option to explicitly configure NUMA is OK.
Having -m option for simple configuration is OK.
Making -m option working for most use cases would be really nice.
So I don't understand why we shouldn't do this enhancement. I don't know if
"overcomplication" is a good argument. Maybe we should wait the patch to
discuss it.
--
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-06 15:18 ` Thomas Monjalon
@ 2014-05-06 15:56 ` Burakov, Anatoly
0 siblings, 0 replies; 9+ messages in thread
From: Burakov, Anatoly @ 2014-05-06 15:56 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev
Hi Thomas,
> Having --socket-mem option to explicitly configure NUMA is OK.
> Having -m option for simple configuration is OK.
Exactly. No explicit requirements - use -m option. Explicit socket requirements - use --socket-mem option.
> So I don't understand why we shouldn't do this enhancement. I don't know if
> "overcomplication" is a good argument.
I don't see any reasons *for* this change. IMO this patch tries to fix a problem that doesn't exist (or, rather, a problem that is already solved with --socket-mem switch). I'm open to persuasion on that one of course, but so far I don't see any compelling reason to change -m switch to be more like --socket-mem switch when we already have the --socket-mem switch for cases when the user cares about where the memory comes from.
Best regards,
Anatoly Burakov
DPDK SW Engineer
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
2014-05-05 9:26 ` David Marchand
2014-05-06 10:05 ` Burakov, Anatoly
@ 2014-05-07 14:56 ` Venkatesan, Venky
1 sibling, 0 replies; 9+ messages in thread
From: Venkatesan, Venky @ 2014-05-07 14:56 UTC (permalink / raw)
To: David Marchand; +Cc: dev
David,
Sorry for the late response. Yes, your suggestion would work. Let’s implement it …
Regards,
-Venky
From: David Marchand [mailto:david.marchand@6wind.com]
Sent: Monday, May 05, 2014 2:26 AM
To: Venkatesan, Venky
Cc: Burakov, Anatoly; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
Hello Venky, Anatoly,
On Fri, May 2, 2014 at 11:05 AM, Venkatesan, Venky <venky.venkatesan@intel.com<mailto:venky.venkatesan@intel.com>> wrote:
Agree with Anatoly - I would much rather not change legacy option behaviour that has existed for a while, especially when --socket-mem is available to do exactly what is needed.
-Venky
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>] On Behalf Of Burakov, Anatoly
Sent: Friday, May 02, 2014 1:54 AM
To: Burakov, Anatoly; David Marchand; dev@dpdk.org<mailto:dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation
Sorry for spamming, but now that I think of it, I don't believe this change makes much sense. If the user wants memory on specific sockets, there's already --socket-mem option. If the user doesn't care, there's -m option, which gives the user memory from whatever sockets it is available. With this change applied, DPDK will fail when run with -m switch under certain circumstances (e.g. cores from socket 0 present in the coremask but no memory left on socket 0), which is quite the opposite of a simple "give me n megs, I don't care where it comes from" option -m is providing.
Actually, if we don't care where memory is allocated, we can at least try to have the more common setup work properly (i.e. spread memory allocations based on used cores).
I can see no usual setup where you want to use cores on a socket while having all memory on another socket but still expect performance to be good.
So here is another approach for Didier's patch.
We can try to spread memory on numa sockets, if this fails, then we default to previous behavior but leave a trace with a warning log "Could not spread memory on numa sockets".
What do you think about this ?
I would also take into account Anatoly's comments (multi line comments + ensure we won't try to get more memory than asked by user).
--
David Marchand
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-05-07 14:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-30 14:15 [dpdk-dev] [PATCH RFC] eal: change default per socket memory allocation David Marchand
2014-05-01 16:06 ` Burakov, Anatoly
2014-05-02 8:54 ` Burakov, Anatoly
2014-05-02 9:05 ` Venkatesan, Venky
2014-05-05 9:26 ` David Marchand
2014-05-06 10:05 ` Burakov, Anatoly
2014-05-06 15:18 ` Thomas Monjalon
2014-05-06 15:56 ` Burakov, Anatoly
2014-05-07 14:56 ` Venkatesan, Venky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).