From: Imre Pinter <imre.pinter@ericsson.com>
To: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>,
"users@dpdk.org" <users@dpdk.org>
Cc: "Gabor Halász" <gabor.halasz@ericsson.com>,
"Péter Suskovics" <peter.suskovics@ericsson.com>
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
Date: Thu, 8 Jun 2017 14:30:42 +0000
Message-ID: <VI1PR07MB1357CB431AC3B979A54D146E80C90@VI1PR07MB1357.eurprd07.prod.outlook.com>
In-Reply-To: <2addf963-8e23-f7fa-038a-da23a9dbcde2@intel.com>
> -----Original Message-----
> From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy@intel.com]
> Sent: 1 June 2017 11:03
> To: Imre Pinter <imre.pinter@ericsson.com>; users@dpdk.org
> Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics
> <peter.suskovics@ericsson.com>
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
>
> On 01/06/2017 08:55, Imre Pinter wrote:
> > Hi,
> >
> > We experience slow startup times in DPDK-OVS when backing memory with
> > 1G hugepages instead of 2M hugepages.
> > Currently we map 2M hugepages as the memory backend for DPDK OVS. In
> > the future we would like to allocate this memory from the 1G hugepage
> > pool. In our current deployments we have a significant amount of 1G
> > hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
> > hugepages.
> >
> > Typical setup for 2M hugepages:
> > GRUB:
> > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> >
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >
> > Typical setup for 1GB hugepages:
> > GRUB:
> > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> >
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >
> > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> >
> > * 2M (2G memory allocated) - startup time ~3 sec:
> >
> > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >
> > * 1G (56G memory allocated) - startup time ~13 sec:
> > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >
> > I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with
> > kernels 3.13.0-117-generic and 4.4.0-78-generic.
> >
> > We had a discussion with Mark Gray (from Intel), and he came up with the
> > following items:
> >
> > * The ~10 sec time difference is there with testpmd as well.
> >
> > * They believe it is kernel overhead (mmap is slow, perhaps because it is
> >   zeroing pages). The following code from eal_memory.c prints the time
> >   taken by each page mapping during EAL startup:
> >     uint64_t start = rte_rdtsc();
> >     /* map the segment, and populate page tables,
> >      * the kernel fills this segment with zeros */
> >     virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> >             MAP_SHARED | MAP_POPULATE, fd, 0);
> >     if (virtaddr == MAP_FAILED) {
> >         RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> >                 strerror(errno));
> >         close(fd);
> >         return i;
> >     }
> >
> >     if (orig) {
> >         hugepg_tbl[i].orig_va = virtaddr;
> >         printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n",
> >                 i, rte_rdtsc() - start,
> >                 (rte_rdtsc() - start) * 1000 /
> >                 rte_get_timer_hz());
> >     }
> >
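The measurement in the quoted snippet can be approximated outside DPDK. The following is a minimal sketch, not EAL code: `time_populated_mmap_ms` is a hypothetical helper, it uses `clock_gettime` in place of `rte_rdtsc`, and it maps anonymous memory instead of a hugetlbfs file so that it runs on any Linux box.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_POPULATE and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

/* Hypothetical helper (not part of DPDK): map `len` bytes with MAP_POPULATE
 * and return the time spent inside mmap() in milliseconds, or -1.0 if the
 * mapping fails. */
static double time_populated_mmap_ms(size_t len)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* MAP_POPULATE asks the kernel to fault in (and therefore zero) the
     * pages up front rather than on first touch. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (p == MAP_FAILED)
        return -1.0;
    munmap(p, len);

    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it with increasing sizes gives a rough feel for how population time scales with the amount of memory. To time real hugepages, the anonymous mapping would have to be replaced by a mapping of a file on the hugetlbfs mount, which this sketch deliberately leaves out.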
> >
> > A solution could be to mount the 1G hugepages in two separate
> > directories: 2G for OVS and the rest for the VMs. However, the NUMA
> > location of these hugepages is non-deterministic, since mount cannot
> > handle NUMA-related parameters when mounting hugetlbfs, and fstab forks
> > the mounts during boot.
> >
> > Do you have a solution for using 1G hugepages for VMs while keeping a
> > reasonable DPDK EAL startup time?
>
> In theory, one solution would be to use cgroups, as described here:
> http://dpdk.org/ml/archives/dev/2017-February/057742.html
> http://dpdk.org/ml/archives/dev/2017-April/063442.html
>
> Then use the 'numactl --interleave' policy.
>
> I say "in theory" because it does not seem to work as one would expect,
> so the patch proposed in the threads above would be a solution: it forces
> allocation from a specific NUMA node for each page.
>
> Thanks,
> Sergio
>
Thanks for the reply, Sergio!
The following patch (v5), at the end of the mentioned mail thread, seems to solve the issue:
http://dpdk.org/dev/patchwork/patch/25069/
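
For readers who have not opened the patch, the general idea of steering each page to a chosen NUMA node can be sketched as follows. This is not the code from the patch, only an illustration under assumptions: `node_to_mask` and `prefer_numa_node` are hypothetical helpers, the policy is set through the raw `set_mempolicy` system call so that libnuma is not needed, and `SKETCH_MPOL_PREFERRED` mirrors the value of `MPOL_PREFERRED` from the Linux UAPI headers.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MPOL_PREFERRED in <linux/mempolicy.h>; defined locally so the
 * sketch does not depend on libnuma's <numaif.h>. */
#define SKETCH_MPOL_PREFERRED 1
/* Highest node id this sketch accepts: a single unsigned long is used as
 * the node mask, and its top bit is deliberately left unused. */
#define SKETCH_MAX_NODE ((int)(8 * sizeof(unsigned long)) - 2)

/* Hypothetical helper: node mask with only `node` set. */
static unsigned long node_to_mask(int node)
{
    return 1UL << node;
}

/* Hypothetical helper: ask the kernel to prefer `node` for pages that the
 * calling thread allocates from now on. Returns 0 on success, or -1 with
 * errno set (for example when the node does not exist or the call is not
 * permitted in the current environment). */
static int prefer_numa_node(int node)
{
    if (node < 0 || node > SKETCH_MAX_NODE) {
        errno = EINVAL;
        return -1;
    }

    unsigned long mask = node_to_mask(node);
    long rc = syscall(SYS_set_mempolicy, SKETCH_MPOL_PREFERRED, &mask,
                      (unsigned long)(8 * sizeof(mask)));
    return rc == 0 ? 0 : -1;
}
```

In such a scheme, `prefer_numa_node(n)` would be called just before a hugepage is mapped and populated, so that the page is taken from node `n` when one is free there, and the default policy would be restored afterwards.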
Thanks,
Imre
> > Thanks,
> > Imre
> >