Re: [dpdk-users] Slow DPDK startup with many 1G hugepages

DPDK usage discussions

From: Imre Pinter <imre.pinter@ericsson.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>,
	Marco Varlese <marco.varlese@suse.com>,
	"users@dpdk.org" <users@dpdk.org>
Cc: "Gabor Halász" <gabor.halasz@ericsson.com>,
	"Péter Suskovics" <peter.suskovics@ericsson.com>
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
Date: Tue, 6 Jun 2017 15:25:31 +0000
Message-ID: <VI1PR07MB13577DEE47414D3293CCBDA480CB0@VI1PR07MB1357.eurprd07.prod.outlook.com>
In-Reply-To: <0f24fe8c-9294-9656-7338-1c09e5c83340@intel.com>



> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: 6 June 2017 16:32
> To: Imre Pinter <imre.pinter@ericsson.com>; Marco Varlese
> <marco.varlese@suse.com>; users@dpdk.org
> Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics
> <peter.suskovics@ericsson.com>
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> 
> 
> On 6/6/2017 8:39 PM, Imre Pinter wrote:
> > Hi guys,
> >
> > Thanks for the replies. See my comments inline.
> >
> >
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> > Sent: 2 June 2017 3:40
> > To: Marco Varlese <marco.varlese@suse.com>; Imre Pinter
> > <imre.pinter@ericsson.com>; users@dpdk.org
> > Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics
> > <peter.suskovics@ericsson.com>
> > Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >
> >
> >
> >> -----Original Message-----
> >> From: Marco Varlese [mailto:marco.varlese@suse.com]
> >> Sent: Thursday, June 1, 2017 6:12 PM
> >> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
> >> Cc: Gabor Halász; Péter Suskovics
> >> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >>
> >> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >>>> -----Original Message-----
> >>>> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre
> >>>> Pinter
> >>>> Sent: Thursday, June 1, 2017 3:55 PM
> >>>> To: users@dpdk.org
> >>>> Cc: Gabor Halász; Péter Suskovics
> >>>> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >>>>
> >>>> Hi,
> >>>>
> >>>> We experience slow startup times in DPDK-OVS when backing memory with
> >>>> 1G hugepages instead of 2M hugepages.
> >>>> Currently we're mapping 2M hugepages as the memory backend for DPDK OVS.
> >>>> In the future we would like to allocate this memory from the 1G hugepage
> >>>> pool. Currently, in our deployments we have a significant amount of 1G
> >>>> hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
> >>>> hugepages.
> >>>>
> >>>> Typical setup for 2M hugepages:
> >>>> GRUB:
> >>>> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> >>>>
> >>>> $ grep hugetlbfs /proc/mounts
> >>>> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> >>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >>>>
> >>>> Typical setup for 1GB hugepages:
> >>>> GRUB:
> >>>> hugepagesz=1G hugepages=56 default_hugepagesz=1G
> >>>>
> >>>> $ grep hugetlbfs /proc/mounts
> >>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
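For scripted checks, the `grep hugetlbfs /proc/mounts` step can be turned into a small parser that reports the page size of each hugetlbfs mount point. This is a minimal Python sketch, not part of DPDK or OVS; `hugetlbfs_mounts` is a hypothetical helper name, and it only assumes the six-field `/proc/mounts` line format shown above.

```python
def hugetlbfs_mounts(mounts_text: str) -> dict:
    """Map each hugetlbfs mount point to its pagesize= option (hypothetical helper)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "hugetlbfs":
            options = dict(
                opt.split("=", 1) if "=" in opt else (opt, "")
                for opt in fields[3].split(",")
            )
            result[fields[1]] = options.get("pagesize")
    return result


if __name__ == "__main__":
    # On a live system: print(hugetlbfs_mounts(open("/proc/mounts").read()))
    sample = (
        "nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0\n"
        "nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0\n"
    )
    print(hugetlbfs_mounts(sample))  # {'/mnt/huge_ovs_2M': '2M', '/mnt/huge_qemu_1G': '1G'}
```

If no mount with a given page size shows up, that would correspond to the EAL warning about hugepages being reserved with no mounted hugetlbfs for that size.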
> >>>>
> >>>> DPDK OVS startup times based on the ovs-vswitchd.log logs:
> >>>>
> >>>>    *   2M (2G memory allocated) - startup time ~3 sec:
> >>>>
> >>>> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> >>>>
> >>>> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >>>>
> >>>>    *   1G (56G memory allocated) - startup time ~13 sec:
> >>>>
> >>>> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> >>>>
> >>>> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >>>>
> >>>> I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04
> >>>> with kernel 3.13.0-117-generic and 4.4.0-78-generic.
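The gap between the two quoted log lines can be computed directly from their timestamps. A minimal Python sketch, assuming the `YYYY-MM-DDTHH:MM:SS.mmmZ|...` prefix format shown in the ovs-vswitchd.log excerpts; `log_gap_seconds` is a hypothetical helper. It measures only the window between the `EAL ARGS` line and the recirculation line, so it need not match the overall ~3 s and ~13 s figures.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2017-05-03T08:13:50.177Z


def log_gap_seconds(first_line: str, second_line: str) -> float:
    """Seconds between two ovs-vswitchd.log lines (timestamp = first '|' field)."""
    t0 = datetime.strptime(first_line.split("|", 1)[0], LOG_TS_FORMAT)
    t1 = datetime.strptime(second_line.split("|", 1)[0], LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    runs = {
        "2M": ("2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ...",
               "2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|..."),
        "1G": ("2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ...",
               "2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|..."),
    }
    for size, (start, end) in runs.items():
        print(f"{size}: {log_gap_seconds(start, end):.3f} s")  # 2M: 0.531 s, 1G: 10.592 s
```

On these excerpts the window grows from about 0.5 s with the 2M mount to about 10.6 s with the 1G mount.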
> >>>
> >>> You can shorten the startup time as follows:
> >>>
> >>> (1) Mount the 1 GB hugepages into two directories.
> >>> nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
> >>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
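Step (1) leaves the `size=` value open. One way to pick it is to derive it from the `--socket-mem` value that OVS is started with. A hypothetical Python helper, assuming 1G pages and rounding each socket's request up to whole pages:

```python
def hugetlbfs_size_option(socket_mem: str, page_mb: int = 1024) -> str:
    """Build a hugetlbfs size= mount option that covers a --socket-mem request.

    Hypothetical helper: socket_mem is the DPDK value, e.g. "1024,1024" (MB per socket).
    """
    per_socket_mb = [int(x) for x in socket_mem.split(",")]
    # Round each socket's request up to whole hugepages (ceiling division).
    pages = sum(-(-mb // page_mb) for mb in per_socket_mb)
    return f"size={pages * page_mb}M"


print(hugetlbfs_size_option("1024,1024"))  # size=2048M
```

Note that `size=` only caps the total for the mount; hugetlbfs has no mount option that selects the NUMA node the pages come from, which is the concern raised in the replies.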
> >> I understood (from Imre's mail) that this does not really work because of
> >> the non-deterministic allocation of hugepages in a NUMA architecture,
> >> e.g. we would (potentially) end up using hugepages allocated on
> >> different nodes even when accessing the OVS directory.
> >> Did I understand this correctly?
> > Did you try step 2? Sergio also gives more options in another email in
> > this thread for your reference.
> >
> > Thanks,
> > Jianfeng
> >
> > @Jianfeng: Step (1) will not help in our case, because 'mount' will not allocate
> > hugepages from NUMA1 as long as the system still has free hugepages on NUMA0.
> > I have 56G of hugepages allocated with the 1G page size, which means 28G of
> > hugepages are available on each NUMA node. If the mounts are performed via
> > fstab, we randomly end up in one of the following scenarios.
> > First mount for OVS, then for VMs:
> > +---------------------------------------+---------------------------------------+
> > |                 NUMA0                 |                 NUMA1                 |
> > +---------------------------------------+---------------------------------------+
> > | OVS(2G) |           VMs(26G)          |               VMs (28G)               |
> > +---------------------------------------+---------------------------------------+
> >
> > First mount for VMs, then OVS:
> > +---------------------------------------+---------------------------------------+
> > |                 NUMA0                 |                 NUMA1                 |
> > +---------------------------------------+---------------------------------------+
> > |               VMs (28G)               |           VMs(26G)          | OVS(2G) |
> > +---------------------------------------+---------------------------------------+
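Both layouts follow from a simple model in which each consumer takes 1G pages from NUMA0 first and spills over to NUMA1 only once NUMA0 is exhausted, as described above. A minimal Python sketch of that assumed first-fit behavior; it is a simplification for illustration, not the kernel's actual allocator, and `place` is a hypothetical name.

```python
def place(requests, per_node=28, nodes=2):
    """Assign 1G pages to consumers, filling the lowest-numbered node first.

    requests: list of (name, pages) in mount order.
    Returns {name: [pages on node0, pages on node1, ...]}.
    """
    free = [per_node] * nodes
    layout = {}
    for name, pages in requests:
        got = [0] * nodes
        for node in range(nodes):
            take = min(pages, free[node])
            got[node] += take
            free[node] -= take
            pages -= take
        if pages:
            raise RuntimeError(f"{name}: not enough free hugepages")
        layout[name] = got
    return layout


print(place([("OVS", 2), ("VMs", 54)]))  # {'OVS': [2, 0], 'VMs': [26, 28]}
print(place([("VMs", 54), ("OVS", 2)]))  # {'VMs': [28, 26], 'OVS': [0, 2]}
```

In either order, all of the OVS pages land on a single node, matching the two diagrams.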
> 
> This is why I suggested step 2 to allocate memory in an interleaved way.
> Did you try that?
> 
> Thanks,
> Jianfeng
> 
I've double-checked it: if I combine Step (1) and Step (2), OVS startup fails with the following error:
EAL: Detected 32 lcore(s)
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 1! Requested: 1024MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory

I experienced the same behavior with testpmd.
However, when OVS and testpmd use hugepages from the 'huge_qemu_1G' mount point, both start properly.
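The error above is what a simple per-socket availability check predicts: `--socket-mem 1024,1024` asks for 1024MB (one 1G page) on each socket, so if every page in the OVS mount sits on NUMA0, socket 1 has 0MB available. A simplified Python model of that check, with a hypothetical function name; it is not how the DPDK EAL is implemented.

```python
def socket_mem_shortfalls(pages_per_node, socket_mem, page_mb=1024):
    """Return (socket, requested MB, available MB) for each unsatisfiable socket.

    pages_per_node: hugepages in the --huge-dir mount on each NUMA node (assumed known).
    socket_mem: the --socket-mem string, e.g. "1024,1024".
    """
    requested = [int(x) for x in socket_mem.split(",")]
    shortfalls = []
    for socket, req_mb in enumerate(requested):
        pages = pages_per_node[socket] if socket < len(pages_per_node) else 0
        avail_mb = pages * page_mb
        if avail_mb < req_mb:
            shortfalls.append((socket, req_mb, avail_mb))
    return shortfalls


print(socket_mem_shortfalls([2, 0], "1024,1024"))  # [(1, 1024, 0)]
print(socket_mem_shortfalls([1, 1], "1024,1024"))  # []
```

The first case mirrors the "socket 1! Requested: 1024MB, available: 0MB" line in the log; the second is the layout the interleave policy is meant to produce.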

Br,
Imre

> > @Marco: After the hugepages were allocated, the ones in the OVS directory
> > were either all from NUMA0 or all from NUMA1, but not from both (a different
> > layout could come after a reboot). This caused an error at DPDK startup, because
> > one 1G hugepage was requested from each NUMA node, and no hugepages were
> > allocated to OVS on the other NUMA node.
> >
> >>> (2) Force the memory interleave policy:
> >>> $ numactl --interleave=all ovs-vswitchd ...
> >>>
> >>> Note: keep the huge-dir and socket-mem options, "--huge-dir
> >>> /mnt/huge_ovs_1G --socket-mem 1024,1024".
> >>>
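The intent of Step (2) can be illustrated with a round-robin model of `--interleave=all`: successive pages alternate between nodes, so two 1G pages end up one per node. A simplified Python sketch, assuming every node still has free hugepages to hand out (the reply below explains why that assumption breaks once Step (1) is used); `interleave` is a hypothetical name and the model ignores the kernel's actual node-selection details.

```python
def interleave(num_pages, nodes=2, first_node=0):
    """Count pages per node when they are placed round-robin (simplified interleave model)."""
    counts = [0] * nodes
    for i in range(num_pages):
        counts[(first_node + i) % nodes] += 1
    return counts


print(interleave(2))  # [1, 1]: one 1G page per NUMA node
```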
> > @Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all
> > ovs-vswitchd ...' cannot help, because all the hugepages mounted to the OVS
> > directory will come from one of the NUMA nodes. The DPDK application
> > requires one 1G hugepage from each NUMA node, so DPDK fails with an error.
> > I have also tried without Step (1), and we still have the slower startup.
> > Currently I'm looking into Sergio's mail.
> >
> > Br,
> > Imre
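To see which NUMA node the pages used by OVS actually came from, one option is to compare the per-node free 1G hugepage counts in sysfs before and after starting ovs-vswitchd. A minimal Python sketch, assuming the standard `/sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages` layout; `free_1g_hugepages` is a hypothetical helper name.

```python
import glob
import os


def free_1g_hugepages(sysfs_root: str = "/sys/devices/system/node") -> dict:
    """Return {"node0": n, "node1": m, ...} with free 1G hugepages per NUMA node."""
    pattern = os.path.join(sysfs_root, "node*", "hugepages",
                           "hugepages-1048576kB", "free_hugepages")
    counts = {}
    for path in sorted(glob.glob(pattern)):
        # .../nodeN/hugepages/hugepages-1048576kB/free_hugepages -> "nodeN"
        node = path.split(os.sep)[-4]
        with open(path) as f:
            counts[node] = int(f.read().strip())
    return counts


if __name__ == "__main__":
    print(free_1g_hugepages())
```

If only one node's count drops when OVS starts, all of its pages came from that node.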


Thread overview: 11+ messages
     [not found] <VI1PR07MB1357C989E2F7092D9A31ED9F80F30@VI1PR07MB1357.eurprd07.prod.outlook.com>
2017-06-01  7:55 ` Imre Pinter
2017-06-01  8:50   ` Tan, Jianfeng
2017-06-01 10:12     ` Marco Varlese
2017-06-02  1:40       ` Tan, Jianfeng
2017-06-06 12:39         ` Imre Pinter
2017-06-06 14:31           ` Tan, Jianfeng
2017-06-06 15:25             ` Imre Pinter [this message]
2017-06-07  8:22               ` Tan, Jianfeng
2017-06-08 14:40                 ` Imre Pinter
2017-06-01  9:02   ` Sergio Gonzalez Monroy
2017-06-08 14:30     ` Imre Pinter
