* [dpdk-dev] long initialization of rte_eal_hugepage_init
@ 2017-09-06 3:24 王志克
  2017-09-06 4:24 ` Stephen Hemminger
  ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06 3:24 UTC (permalink / raw)
  To: users, dev

Hi All,

I observed that rte_eal_hugepage_init() will take quite a long time if
there are lots of huge pages. For example, I have 500 1G huge pages, and
it takes about 2 minutes. That is too long, especially for the
application restart case.

If the application only needs a limited number of huge pages while the
host has lots of huge pages, the algorithm is not very efficient. For
example, we only need 1G of memory from each socket.

What is the proposal from the DPDK community? Any solution?

Note: I tried DPDK version 16.11.

Br,
Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 3:24 [dpdk-dev] long initialization of rte_eal_hugepage_init 王志克
@ 2017-09-06 4:24 ` Stephen Hemminger
  2017-09-06 6:45   ` 王志克
  2017-09-06 4:35 ` Pavan Nikhilesh Bhagavatula
  2017-09-06 4:36 ` Tan, Jianfeng
  2 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2017-09-06 4:24 UTC (permalink / raw)
  To: 王志克; +Cc: dev, users

Linux zeros huge pages by default. There was a fix for this in later
releases.

On Sep 5, 2017 8:24 PM, "王志克" <wangzhike@jd.com> wrote:

> Hi All,
>
> I observed that rte_eal_hugepage_init() will take quite a long time if
> there are lots of huge pages. For example, I have 500 1G huge pages,
> and it takes about 2 minutes. That is too long, especially for the
> application restart case.
>
> If the application only needs a limited number of huge pages while the
> host has lots of huge pages, the algorithm is not very efficient. For
> example, we only need 1G of memory from each socket.
>
> What is the proposal from the DPDK community? Any solution?
>
> Note: I tried DPDK version 16.11.
>
> Br,
> Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 4:24 ` Stephen Hemminger
@ 2017-09-06 6:45   ` 王志克
  0 siblings, 0 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06 6:45 UTC (permalink / raw)
  To: Stephen Hemminger, zhihong.wang; +Cc: dev, users

Hi Stephen,

Do you mean that disabling huge page zeroing would improve the
performance? How can the memory then be guaranteed to be allocated?
Would it introduce functional issues?

I checked the commit below, and I guess it at least implies that
zero-filled huge pages are needed.

commit 5ce3ace1de458e2ded1b408acfe59c15cf9863f1
Author: Zhihong Wang <zhihong.wang@intel.com>
Date:   Sun Nov 22 14:13:35 2015 -0500

    eal: remove unnecessary hugepage zero-filling

    The kernel fills new allocated (huge) pages with zeros.
    DPDK just has to populate page tables to trigger the allocation.

    Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
    Acked-by: Stephen Hemminger <stephen@networkplumber.org>

From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Wednesday, September 06, 2017 12:24 PM
To: 王志克
Cc: dev@dpdk.org; users@dpdk.org
Subject: Re: [dpdk-dev] long initialization of rte_eal_hugepage_init

Linux zeros huge pages by default. There was a fix for this in later
releases.

On Sep 5, 2017 8:24 PM, "王志克" <wangzhike@jd.com> wrote:

Hi All,

I observed that rte_eal_hugepage_init() will take quite a long time if
there are lots of huge pages. For example, I have 500 1G huge pages, and
it takes about 2 minutes. That is too long, especially for the
application restart case.

If the application only needs a limited number of huge pages while the
host has lots of huge pages, the algorithm is not very efficient. For
example, we only need 1G of memory from each socket.

What is the proposal from the DPDK community? Any solution?

Note: I tried DPDK version 16.11.

Br,
Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 3:24 [dpdk-dev] long initialization of rte_eal_hugepage_init 王志克
  2017-09-06 4:24 ` Stephen Hemminger
@ 2017-09-06 4:35 ` Pavan Nikhilesh Bhagavatula
  2017-09-06 7:37   ` Sergio Gonzalez Monroy
  2017-09-06 4:36 ` Tan, Jianfeng
  2 siblings, 1 reply; 10+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2017-09-06 4:35 UTC (permalink / raw)
  To: 王志克; +Cc: dev

On Wed, Sep 06, 2017 at 03:24:52AM +0000, 王志克 wrote:
> Hi All,
>
> I observed that rte_eal_hugepage_init() will take quite a long time if
> there are lots of huge pages. For example, I have 500 1G huge pages,
> and it takes about 2 minutes. That is too long, especially for the
> application restart case.
>
> If the application only needs a limited number of huge pages while the
> host has lots of huge pages, the algorithm is not very efficient. For
> example, we only need 1G of memory from each socket.
>
There is an EAL option, --socket-mem, which can be used to limit the
memory acquired from each socket.

> What is the proposal from the DPDK community? Any solution?
>
> Note: I tried DPDK version 16.11.
>
> Br,
> Wang Zhike

-Pavan

^ permalink raw reply [flat|nested] 10+ messages in thread
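As an illustration, a minimal sketch of such an invocation; testpmd, the
coremask, and the per-socket amounts are placeholders, not values from
this thread:

    # Request 1024 MB from socket 0 and 1024 MB from socket 1, rather
    # than letting EAL take memory wherever it happens to land.
    ./testpmd -c 0xf -n 4 --socket-mem 1024,1024 -- -i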
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 4:35 ` Pavan Nikhilesh Bhagavatula
@ 2017-09-06 7:37   ` Sergio Gonzalez Monroy
  2017-09-06 8:59     ` 王志克
  0 siblings, 1 reply; 10+ messages in thread
From: Sergio Gonzalez Monroy @ 2017-09-06 7:37 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, 王志克; +Cc: dev

On 06/09/2017 05:35, Pavan Nikhilesh Bhagavatula wrote:
> On Wed, Sep 06, 2017 at 03:24:52AM +0000, 王志克 wrote:
>> Hi All,
>>
>> I observed that rte_eal_hugepage_init() will take quite a long time if
>> there are lots of huge pages. For example, I have 500 1G huge pages,
>> and it takes about 2 minutes. That is too long, especially for the
>> application restart case.
>>
>> If the application only needs a limited number of huge pages while the
>> host has lots of huge pages, the algorithm is not very efficient. For
>> example, we only need 1G of memory from each socket.
>>
> There is an EAL option, --socket-mem, which can be used to limit the
> memory acquired from each socket.
>
>> What is the proposal from the DPDK community? Any solution?
>>
>> Note: I tried DPDK version 16.11.
>>
>> Br,
>> Wang Zhike
> -Pavan

Since DPDK 17.08 we use libnuma to first get the number of pages we need
from each socket, then as many more as we can. So you can set up your
huge page mount point or cgroups to limit the number of pages you can
get.

So basically:
1. set up a mount quota or cgroup limit
2. use the --socket-mem option to limit the amount per socket

Note that pre-17.08 we did not have libnuma support, so if you have a
low quota/limit and need memory from both sockets, the allocation would
likely fail.

Thanks,
Sergio

^ permalink raw reply [flat|nested] 10+ messages in thread
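A sketch of these two steps using the cgroup v1 hugetlb controller; the
controller path, the "dpdk" group name, and the 2 GB cap are assumptions
for illustration, as is the testpmd command line:

    # Step 1: cap this shell, and the DPDK app started from it, at two
    # 1G pages via the hugetlb controller.
    mkdir -p /sys/fs/cgroup/hugetlb/dpdk
    echo $((2*1024*1024*1024)) > /sys/fs/cgroup/hugetlb/dpdk/hugetlb.1GB.limit_in_bytes
    echo $$ > /sys/fs/cgroup/hugetlb/dpdk/tasks
    # Step 2: bound the per-socket split.
    ./testpmd -c 0xf -n 4 --socket-mem 1024,1024 -- -i

One caveat: faulting past a hugetlb cgroup limit typically raises SIGBUS
rather than returning a clean allocation error, so the hugetlbfs mount
quota is often the gentler of the two limits.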
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 7:37   ` Sergio Gonzalez Monroy
@ 2017-09-06 8:59     ` 王志克
  0 siblings, 0 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06 8:59 UTC (permalink / raw)
  To: Sergio Gonzalez Monroy, Pavan Nikhilesh Bhagavatula; +Cc: dev

Thanks, Sergio. It really helps.

Br,
Wang Zhike

-----Original Message-----
From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy@intel.com]
Sent: Wednesday, September 06, 2017 3:37 PM
To: Pavan Nikhilesh Bhagavatula; 王志克
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] long initialization of rte_eal_hugepage_init

On 06/09/2017 05:35, Pavan Nikhilesh Bhagavatula wrote:
> On Wed, Sep 06, 2017 at 03:24:52AM +0000, 王志克 wrote:
>> Hi All,
>>
>> I observed that rte_eal_hugepage_init() will take quite a long time if
>> there are lots of huge pages. For example, I have 500 1G huge pages,
>> and it takes about 2 minutes. That is too long, especially for the
>> application restart case.
>>
>> If the application only needs a limited number of huge pages while the
>> host has lots of huge pages, the algorithm is not very efficient. For
>> example, we only need 1G of memory from each socket.
>>
> There is an EAL option, --socket-mem, which can be used to limit the
> memory acquired from each socket.
>
>> What is the proposal from the DPDK community? Any solution?
>>
>> Note: I tried DPDK version 16.11.
>>
>> Br,
>> Wang Zhike
> -Pavan

Since DPDK 17.08 we use libnuma to first get the number of pages we need
from each socket, then as many more as we can. So you can set up your
huge page mount point or cgroups to limit the number of pages you can
get.

So basically:
1. set up a mount quota or cgroup limit
2. use the --socket-mem option to limit the amount per socket

Note that pre-17.08 we did not have libnuma support, so if you have a
low quota/limit and need memory from both sockets, the allocation would
likely fail.

Thanks,
Sergio

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 3:24 [dpdk-dev] long initialization of rte_eal_hugepage_init 王志克
  2017-09-06 4:24 ` Stephen Hemminger
  2017-09-06 4:35 ` Pavan Nikhilesh Bhagavatula
@ 2017-09-06 4:36 ` Tan, Jianfeng
  2017-09-06 6:02   ` 王志克
  2 siblings, 1 reply; 10+ messages in thread
From: Tan, Jianfeng @ 2017-09-06 4:36 UTC (permalink / raw)
  To: wangzhike, users, dev

> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> Sent: Wednesday, September 6, 2017 11:25 AM
> To: users@dpdk.org; dev@dpdk.org
> Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
>
> Hi All,
>
> I observed that rte_eal_hugepage_init() will take quite a long time if
> there are lots of huge pages. For example, I have 500 1G huge pages,
> and it takes about 2 minutes. That is too long, especially for the
> application restart case.
>
> If the application only needs a limited number of huge pages while the
> host has lots of huge pages, the algorithm is not very efficient. For
> example, we only need 1G of memory from each socket.
>
> What is the proposal from the DPDK community? Any solution?

You can mount hugetlbfs with the "size" option and use the "--socket-mem"
option in DPDK to restrict the memory to be used.

Thanks,
Jianfeng

>
> Note: I tried DPDK version 16.11.
>
> Br,
> Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
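A sketch of such a mount; the mount point and the 2G quota are examples,
assuming 1G pages are configured on the host as in the original report:

    # A dedicated hugetlbfs instance capped at two 1G pages; mappings
    # that would exceed "size" on this mount point fail instead of
    # consuming more of the host's 500 pages.
    mkdir -p /mnt/huge_dpdk
    mount -t hugetlbfs -o pagesize=1G,size=2G nodev /mnt/huge_dpdk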
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 4:36 ` Tan, Jianfeng
@ 2017-09-06 6:02   ` 王志克
  2017-09-06 7:17     ` Tan, Jianfeng
  0 siblings, 1 reply; 10+ messages in thread
From: 王志克 @ 2017-09-06 6:02 UTC (permalink / raw)
  To: Tan, Jianfeng, users, dev

Do you mean "pagesize" when you say the "size" option? I have specified
the pagesize as 1G.

Also, I already use "--socket-mem" to specify that the application only
needs 1G per NUMA node.

The problem is that map_all_hugepages() would map all free huge pages and
then select the proper ones. If I have 500 free huge pages (each 1G) and
the application only needs 1G per NUMA socket, such a mapping is
unreasonable.

My use case is OVS+DPDK. OVS+DPDK would only need 2G, and other
applications (QEMU/VMs) would use the other huge pages.

Br,
Wang Zhike

-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
Sent: Wednesday, September 06, 2017 12:36 PM
To: 王志克; users@dpdk.org; dev@dpdk.org
Subject: RE: long initialization of rte_eal_hugepage_init

> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> Sent: Wednesday, September 6, 2017 11:25 AM
> To: users@dpdk.org; dev@dpdk.org
> Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
>
> Hi All,
>
> I observed that rte_eal_hugepage_init() will take quite a long time if
> there are lots of huge pages. For example, I have 500 1G huge pages,
> and it takes about 2 minutes. That is too long, especially for the
> application restart case.
>
> If the application only needs a limited number of huge pages while the
> host has lots of huge pages, the algorithm is not very efficient. For
> example, we only need 1G of memory from each socket.
>
> What is the proposal from the DPDK community? Any solution?

You can mount hugetlbfs with the "size" option and use the "--socket-mem"
option in DPDK to restrict the memory to be used.

Thanks,
Jianfeng

>
> Note: I tried DPDK version 16.11.
>
> Br,
> Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 6:02   ` 王志克
@ 2017-09-06 7:17     ` Tan, Jianfeng
  2017-09-06 8:58       ` 王志克
  0 siblings, 1 reply; 10+ messages in thread
From: Tan, Jianfeng @ 2017-09-06 7:17 UTC (permalink / raw)
  To: wangzhike, users, dev

> -----Original Message-----
> From: 王志克 [mailto:wangzhike@jd.com]
> Sent: Wednesday, September 6, 2017 2:03 PM
> To: Tan, Jianfeng; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
>
> Do you mean "pagesize" when you say the "size" option? I have specified
> the pagesize as 1G.

No, I mean "size": adding another hugetlbfs mount whose total size is
what you need for your app. With another DPDK option, "--huge-dir", we
can then avoid allocating all free hugepages.

If you want to allocate memory on different sockets, e.g., --socket-mem
1024,1024, you need a newer DPDK with the commit below by Ilya Maximets:
commit 1b72605d241 ("mem: balanced allocation of hugepages").

Thanks,
Jianfeng

> Also, I already use "--socket-mem" to specify that the application only
> needs 1G per NUMA node.
>
> The problem is that map_all_hugepages() would map all free huge pages
> and then select the proper ones. If I have 500 free huge pages (each
> 1G) and the application only needs 1G per NUMA socket, such a mapping
> is unreasonable.
>
> My use case is OVS+DPDK. OVS+DPDK would only need 2G, and other
> applications (QEMU/VMs) would use the other huge pages.
>
> Br,
> Wang Zhike
>
> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: Wednesday, September 06, 2017 12:36 PM
> To: 王志克; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
>
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> > Sent: Wednesday, September 6, 2017 11:25 AM
> > To: users@dpdk.org; dev@dpdk.org
> > Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
> >
> > Hi All,
> >
> > I observed that rte_eal_hugepage_init() will take quite a long time
> > if there are lots of huge pages. For example, I have 500 1G huge
> > pages, and it takes about 2 minutes. That is too long, especially for
> > the application restart case.
> >
> > If the application only needs a limited number of huge pages while
> > the host has lots of huge pages, the algorithm is not very efficient.
> > For example, we only need 1G of memory from each socket.
> >
> > What is the proposal from the DPDK community? Any solution?
>
> You can mount hugetlbfs with the "size" option and use the
> "--socket-mem" option in DPDK to restrict the memory to be used.
>
> Thanks,
> Jianfeng
>
> >
> > Note: I tried DPDK version 16.11.
> >
> > Br,
> > Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
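Putting the two options together, a sketch of the resulting invocation;
the application name, coremask, and mount point are placeholders, and
--socket-mem 1024,1024 assumes a two-socket host running a DPDK that
carries the balanced-allocation commit above:

    # Point EAL at the size-capped mount so initialization never touches
    # the remaining free huge pages, then bound the per-socket amounts.
    ./your_dpdk_app -c 0xf -n 4 --huge-dir /mnt/huge_dpdk --socket-mem 1024,1024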
* Re: [dpdk-dev] long initialization of rte_eal_hugepage_init
  2017-09-06 7:17     ` Tan, Jianfeng
@ 2017-09-06 8:58       ` 王志克
  0 siblings, 0 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06 8:58 UTC (permalink / raw)
  To: Tan, Jianfeng, users, dev

Thanks, Jianfeng, for your suggestion. I get the point.

Br,
Wang Zhike

-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
Sent: Wednesday, September 06, 2017 3:18 PM
To: 王志克; users@dpdk.org; dev@dpdk.org
Subject: RE: long initialization of rte_eal_hugepage_init

> -----Original Message-----
> From: 王志克 [mailto:wangzhike@jd.com]
> Sent: Wednesday, September 6, 2017 2:03 PM
> To: Tan, Jianfeng; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
>
> Do you mean "pagesize" when you say the "size" option? I have specified
> the pagesize as 1G.

No, I mean "size": adding another hugetlbfs mount whose total size is
what you need for your app. With another DPDK option, "--huge-dir", we
can then avoid allocating all free hugepages.

If you want to allocate memory on different sockets, e.g., --socket-mem
1024,1024, you need a newer DPDK with the commit below by Ilya Maximets:
commit 1b72605d241 ("mem: balanced allocation of hugepages").

Thanks,
Jianfeng

> Also, I already use "--socket-mem" to specify that the application only
> needs 1G per NUMA node.
>
> The problem is that map_all_hugepages() would map all free huge pages
> and then select the proper ones. If I have 500 free huge pages (each
> 1G) and the application only needs 1G per NUMA socket, such a mapping
> is unreasonable.
>
> My use case is OVS+DPDK. OVS+DPDK would only need 2G, and other
> applications (QEMU/VMs) would use the other huge pages.
>
> Br,
> Wang Zhike
>
> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: Wednesday, September 06, 2017 12:36 PM
> To: 王志克; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
>
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> > Sent: Wednesday, September 6, 2017 11:25 AM
> > To: users@dpdk.org; dev@dpdk.org
> > Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
> >
> > Hi All,
> >
> > I observed that rte_eal_hugepage_init() will take quite a long time
> > if there are lots of huge pages. For example, I have 500 1G huge
> > pages, and it takes about 2 minutes. That is too long, especially for
> > the application restart case.
> >
> > If the application only needs a limited number of huge pages while
> > the host has lots of huge pages, the algorithm is not very efficient.
> > For example, we only need 1G of memory from each socket.
> >
> > What is the proposal from the DPDK community? Any solution?
>
> You can mount hugetlbfs with the "size" option and use the
> "--socket-mem" option in DPDK to restrict the memory to be used.
>
> Thanks,
> Jianfeng
>
> >
> > Note: I tried DPDK version 16.11.
> >
> > Br,
> > Wang Zhike

^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-09-06 8:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-06 3:24 [dpdk-dev] long initialization of rte_eal_hugepage_init 王志克
2017-09-06 4:24 ` Stephen Hemminger
2017-09-06 6:45   ` 王志克
2017-09-06 4:35 ` Pavan Nikhilesh Bhagavatula
2017-09-06 7:37   ` Sergio Gonzalez Monroy
2017-09-06 8:59     ` 王志克
2017-09-06 4:36 ` Tan, Jianfeng
2017-09-06 6:02   ` 王志克
2017-09-06 7:17     ` Tan, Jianfeng
2017-09-06 8:58       ` 王志克