DPDK usage discussions
From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
To: Imre Pinter <imre.pinter@ericsson.com>,
	"users@dpdk.org" <users@dpdk.org>
Cc: "Gabor Halász" <gabor.halasz@ericsson.com>,
	"Péter Suskovics" <peter.suskovics@ericsson.com>
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
Date: Thu, 1 Jun 2017 08:50:35 +0000
Message-ID: <ED26CBA2FAD1BF48A8719AEF02201E36511F3564@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <VI1PR07MB13578627486437F8339E99FE80F60@VI1PR07MB1357.eurprd07.prod.outlook.com>



> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
> Sent: Thursday, June 1, 2017 3:55 PM
> To: users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> Hi,
> 
> We are experiencing slow startup times with DPDK-OVS when backing its memory
> with 1G hugepages instead of 2M hugepages.
> Currently we map 2M hugepages as the memory backend for DPDK OVS.
> In the future we would like to allocate this memory from the 1G hugepage
> pool. In our current deployments we have a significant amount of 1G
> hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
> hugepages.
> 
> Typical setup for 2M hugepages:
> GRUB:
> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> 
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> 
> Typical setup for 1G hugepages:
> GRUB:
> hugepagesz=1G hugepages=56 default_hugepagesz=1G
> 
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> 
> DPDK OVS startup times, based on ovs-vswitchd.log:
> 
>   *   2M (2G memory allocated) - startup time ~3 sec:
> 
> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> 
> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> Datapath supports recirculation
> 
>   *   1G (56G memory allocated) - startup time ~13 sec:
> 
> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> 
> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> Datapath supports recirculation
> 
> I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with
> kernels 3.13.0-117-generic and 4.4.0-78-generic.


You can shorten the startup time as follows:

(1) Mount the 1G hugepages on two directories, and cap the OVS one with the
size= option so that EAL only maps that many pages instead of every free 1G page:
nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0

(2) Force the use of the interleave memory policy:
$ numactl --interleave=all ovs-vswitchd ...

Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".

> 
> We had a discussion with Mark Gray (from Intel), and he came up with the
> following items:
> 
> ·  The ~10 sec time difference is there with testpmd as well.
> 
> ·  They believe it is kernel overhead (mmap is slow, perhaps because it is
> zeroing the pages). The following code from eal_memory.c produces the
> printout mentioned above during EAL startup:

Yes, correct.

>     uint64_t start = rte_rdtsc();   /* start of the timed region */
>     /* map the segment, and populate page tables,
>      * the kernel fills this segment with zeros */
>     virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
>                     MAP_SHARED | MAP_POPULATE, fd, 0);
>     if (virtaddr == MAP_FAILED) {
>             RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
>                             strerror(errno));
>             close(fd);
>             return i;
>     }
> 
>     if (orig) {
>             /* read the TSC once so the tick and ms figures agree */
>             uint64_t ticks = rte_rdtsc() - start;
> 
>             hugepg_tbl[i].orig_va = virtaddr;
>             printf("Original mapping of page %u took: %" PRIu64
>                    " ticks, %" PRIu64 " ms\n",
>                    i, ticks, ticks * 1000 / rte_get_timer_hz());
>     }
> 
> 
> A solution could be to mount the 1G hugepages on two separate directories:
> 2G for OVS and the rest for the VMs. However, the NUMA location of these
> hugepages is non-deterministic, since mount cannot take NUMA-related
> parameters when mounting hugetlbfs, and fstab forks the mounts during
> boot.

Oh, similar idea :-)

> 
> Do you have a solution for using 1G hugepages for VMs while keeping a
> reasonable DPDK EAL startup time?

No, we don't have such an option yet.

Thanks,
Jianfeng

> 
> Thanks,
> Imre

Thread overview: 11+ messages
2017-06-01  7:55 ` Imre Pinter
2017-06-01  8:50   ` Tan, Jianfeng [this message]
2017-06-01 10:12     ` Marco Varlese
2017-06-02  1:40       ` Tan, Jianfeng
2017-06-06 12:39         ` Imre Pinter
2017-06-06 14:31           ` Tan, Jianfeng
2017-06-06 15:25             ` Imre Pinter
2017-06-07  8:22               ` Tan, Jianfeng
2017-06-08 14:40                 ` Imre Pinter
2017-06-01  9:02   ` Sergio Gonzalez Monroy
2017-06-08 14:30     ` Imre Pinter
