From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
To: Ilya Maximets <i.maximets@samsung.com>,
"dev@dpdk.org" <dev@dpdk.org>,
David Marchand <david.marchand@6wind.com>,
"Gonzalez Monroy, Sergio" <sergio.gonzalez.monroy@intel.com>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>,
Yuanhan Liu <yuanhan.liu@linux.intel.com>,
Neil Horman <nhorman@tuxdriver.com>,
"Pei, Yulong" <yulong.pei@intel.com>,
"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mem: balanced allocation of hugepages
Date: Thu, 16 Feb 2017 13:26:26 +0000
Message-ID: <ED26CBA2FAD1BF48A8719AEF02201E3651138958@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <1487250070-13973-1-git-send-email-i.maximets@samsung.com>
Hi,
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> Sent: Thursday, February 16, 2017 9:01 PM
> To: dev@dpdk.org; David Marchand; Gonzalez Monroy, Sergio
> Cc: Heetae Ahn; Yuanhan Liu; Tan, Jianfeng; Neil Horman; Pei, Yulong; Ilya
> Maximets; stable@dpdk.org
> Subject: [PATCH] mem: balanced allocation of hugepages
>
> Currently EAL allocates hugepages one by one, without paying
> attention to which NUMA node each allocation comes from.
>
> Such behaviour leads to allocation failure if the number of
> hugepages available to the application is limited by cgroups
> or hugetlbfs and memory is requested from more than just the
> first socket.
>
> Example:
> # 90 x 1GB hugepages available in a system
>
> cgcreate -g hugetlb:/test
> # Limit to 32GB of hugepages
> cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
> # Request 4GB from each of 2 sockets
> cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...
>
> EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
> EAL: 32 not 90 hugepages of size 1024 MB allocated
> EAL: Not enough memory available on socket 1!
> Requested: 4096MB, available: 0MB
> PANIC in rte_eal_init():
> Cannot init memory
>
> This happens because all allocated pages are
> on socket 0.
For such a use case, why not just use "numactl --interleave=0,1 <DPDK app> xxx"?
Do you see a use case like --socket-mem 2048,1024 where only three 1GB hugepages are allowed?
Thanks,
Jianfeng
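To make the second question concrete: with 1GB pages, --socket-mem 2048,1024 means two pages on socket 0 and one on socket 1. The sketch below counts where pages would land under strictly alternating placement starting from node 0. That placement rule is an assumption for illustration only (neither interleaving nor MPOL_PREFERRED guarantees it), and `alternating_fit` and `N_NODES` are hypothetical names that appear nowhere in DPDK.

```c
#include <stddef.h>

/* Hypothetical two-socket system, for illustration only. */
#define N_NODES 2

/*
 * Distribute total_pages across nodes in strict alternation starting at
 * node 0, store the per-node counts in got[], and return 1 if every node
 * received at least want[node] pages, 0 otherwise.
 */
static int
alternating_fit(size_t total_pages, const size_t want[N_NODES],
		size_t got[N_NODES])
{
	size_t i;
	int ok = 1;

	for (i = 0; i < N_NODES; i++)
		got[i] = 0;
	for (i = 0; i < total_pages; i++)
		got[i % N_NODES]++;
	for (i = 0; i < N_NODES; i++)
		if (got[i] < want[i])
			ok = 0;
	return ok;
}
```

Under that assumption, three pages satisfy a request of 2 and 1 pages, but the mirrored request of 1 and 2 pages is not satisfied, because the third page lands on node 0 again.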
>
> Fix this issue by setting mempolicy MPOL_PREFERRED for each
> hugepage to one of the requested nodes in a round-robin fashion.
> In this case all allocated pages will be fairly distributed
> between all requested nodes.
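The round-robin selection described above can be sketched in isolation as follows. This is a minimal standalone illustration, not the patch code: `next_requested_node` is a hypothetical helper, and `MAX_NODES` with its value of 8 is an assumed stand-in for RTE_MAX_NUMA_NODES.

```c
#include <stdint.h>

/* Assumed stand-in for RTE_MAX_NUMA_NODES; the value is illustrative. */
#define MAX_NODES 8

/*
 * Return the next node after prev (wrapping around) whose requested
 * memory is non-zero, or -1 if no node requested any memory.
 * Pass prev = -1 for the first call.
 */
static int
next_requested_node(const uint64_t socket_mem[MAX_NODES], int prev)
{
	int step;

	for (step = 1; step <= MAX_NODES; step++) {
		int node = (prev + step) % MAX_NODES;

		if (socket_mem[node] != 0)
			return node;
	}
	return -1;
}
```

Unlike an unbounded `while` loop over the request table, this variant stops after one full pass, so it cannot spin when every entry is zero.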
>
> A new config option, RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES, is
> introduced and disabled by default because of the external
> dependency on libnuma.
>
> Cc: <stable@dpdk.org>
> Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")
>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
> config/common_base | 1 +
> lib/librte_eal/Makefile | 4 ++
> lib/librte_eal/linuxapp/eal/eal_memory.c | 66 ++++++++++++++++++++++++++++++++
> mk/rte.app.mk | 3 ++
> 4 files changed, 74 insertions(+)
>
> diff --git a/config/common_base b/config/common_base
> index 71a4fcb..fbcebbd 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -97,6 +97,7 @@ CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
> CONFIG_RTE_EAL_IGB_UIO=n
> CONFIG_RTE_EAL_VFIO=n
> CONFIG_RTE_MALLOC_DEBUG=n
> +CONFIG_RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES=n
>
> # Default driver path (or "" to disable)
> CONFIG_RTE_EAL_PMD_PATH=""
> diff --git a/lib/librte_eal/Makefile b/lib/librte_eal/Makefile
> index cf11a09..5ae3846 100644
> --- a/lib/librte_eal/Makefile
> +++ b/lib/librte_eal/Makefile
> @@ -35,4 +35,8 @@ DIRS-y += common
> DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += linuxapp
> DIRS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += bsdapp
>
> +ifeq ($(CONFIG_RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES),y)
> +LDLIBS += -lnuma
> +endif
> +
> include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index a956bb2..8536a36 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -82,6 +82,9 @@
> #include <sys/time.h>
> #include <signal.h>
> #include <setjmp.h>
> +#ifdef RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES
> +#include <numaif.h>
> +#endif
>
> #include <rte_log.h>
> #include <rte_memory.h>
> @@ -359,6 +362,21 @@ static int huge_wrap_sigsetjmp(void)
> return sigsetjmp(huge_jmpenv, 1);
> }
>
> +#ifdef RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES
> +#ifndef ULONG_SIZE
> +#define ULONG_SIZE sizeof(unsigned long)
> +#endif
> +#ifndef ULONG_BITS
> +#define ULONG_BITS (ULONG_SIZE * CHAR_BIT)
> +#endif
> +#ifndef DIV_ROUND_UP
> +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
> +#endif
> +#ifndef BITS_TO_LONGS
> +#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, ULONG_BITS)
> +#endif
> +#endif
> +
> /*
> * Mmap all hugepages of hugepage table: it first open a file in
> * hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
> @@ -375,10 +393,48 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
> void *virtaddr;
> void *vma_addr = NULL;
> size_t vma_len = 0;
> +#ifdef RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES
> + unsigned long nodemask[BITS_TO_LONGS(RTE_MAX_NUMA_NODES)] = {0UL};
> + unsigned long maxnode = 0;
> + int node_id = -1;
> +
> + for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
> + if (internal_config.socket_mem[i])
> + maxnode = i + 1;
> +#endif
>
> for (i = 0; i < hpi->num_pages[0]; i++) {
> uint64_t hugepage_sz = hpi->hugepage_sz;
>
> +#ifdef RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES
> + if (maxnode) {
> + node_id = (node_id + 1) % RTE_MAX_NUMA_NODES;
> + while (!internal_config.socket_mem[node_id])
> + node_id = (node_id + 1) % RTE_MAX_NUMA_NODES;
> +
> + nodemask[node_id / ULONG_BITS] =
> + 1UL << (node_id % ULONG_BITS);
> +
> + RTE_LOG(DEBUG, EAL,
> + "Setting policy MPOL_PREFERRED for socket %d\n",
> + node_id);
> + /*
> + * Due to old linux kernel bug (feature?) we have to
> + * increase maxnode by 1. It will be unconditionally
> + * decreased back to normal value inside the syscall
> + * handler.
> + */
> + if (set_mempolicy(MPOL_PREFERRED,
> + nodemask, maxnode + 1) < 0) {
> + RTE_LOG(ERR, EAL,
> + "Failed to set policy MPOL_PREFERRED: "
> + "%s\n", strerror(errno));
> + return i;
> + }
> +
> + nodemask[node_id / ULONG_BITS] = 0UL;
> + }
> +#endif
> if (orig) {
> hugepg_tbl[i].file_id = i;
> hugepg_tbl[i].size = hugepage_sz;
> @@ -489,6 +545,10 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
> vma_len -= hugepage_sz;
> }
>
> +#ifdef RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES
> + if (maxnode && set_mempolicy(MPOL_DEFAULT, NULL, 0) < 0)
> + RTE_LOG(ERR, EAL, "Failed to set mempolicy MPOL_DEFAULT\n");
> +#endif
> return i;
> }
>
> @@ -573,6 +634,11 @@ find_numasocket(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
> if (hugepg_tbl[i].orig_va == va) {
> hugepg_tbl[i].socket_id = socket_id;
> hp_count++;
> +#ifdef RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES
> + RTE_LOG(DEBUG, EAL,
> + "Hugepage %s is on socket %d\n",
> + hugepg_tbl[i].filepath, socket_id);
> +#endif
> }
> }
> }
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 92f3635..c2153b9 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -159,6 +159,9 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
> # The static libraries do not know their dependencies.
> # So linking with static library requires explicit dependencies.
> _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL) += -lrt
> +ifeq ($(CONFIG_RTE_LIBRTE_EAL_NUMA_AWARE_HUGEPAGES),y)
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_EAL) += -lnuma
> +endif
> _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED) += -lm
> _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED) += -lrt
> _LDLIBS-$(CONFIG_RTE_LIBRTE_METER) += -lm
> --
> 2.7.4
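A node bitmask of the kind passed to set_mempolicy() needs one bit per node, so the number of `unsigned long` words is the node count divided by the bits per word, rounded up. The sketch below shows that sizing arithmetic and the word/bit indexing for a single node. It is only an illustration under those assumptions: `SKETCH_ULONG_BITS`, `longs_for_bits` and `set_single_node` are hypothetical names, not identifiers from the patch or from libnuma.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in one unsigned long on the build platform. */
#define SKETCH_ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Words needed to hold nbits bits, rounding up. */
static size_t
longs_for_bits(size_t nbits)
{
	return (nbits + SKETCH_ULONG_BITS - 1) / SKETCH_ULONG_BITS;
}

/*
 * Clear the mask and set only the bit for node_id.
 * Returns 0 on success, -1 if node_id does not fit in nlongs words.
 */
static int
set_single_node(unsigned long *mask, size_t nlongs, size_t node_id)
{
	size_t i;

	if (node_id >= nlongs * SKETCH_ULONG_BITS)
		return -1;
	for (i = 0; i < nlongs; i++)
		mask[i] = 0UL;
	mask[node_id / SKETCH_ULONG_BITS] =
		1UL << (node_id % SKETCH_ULONG_BITS);
	return 0;
}
```

Dividing by the byte size of a word instead of its bit size would still yield a usable array, only a larger one than necessary.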