From: Imre Pinter <imre.pinter@ericsson.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>,
	Marco Varlese <marco.varlese@suse.com>,
	"users@dpdk.org" <users@dpdk.org>
Cc: "Gabor Halász" <gabor.halasz@ericsson.com>,
	"Péter Suskovics" <peter.suskovics@ericsson.com>
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
Date: Tue, 6 Jun 2017 12:39:47 +0000
Message-ID: <VI1PR07MB13571DCB1459FD6D2D361C4C80CB0@VI1PR07MB1357.eurprd07.prod.outlook.com>
In-Reply-To: <ED26CBA2FAD1BF48A8719AEF02201E36511F6551@SHSMSX103.ccr.corp.intel.com>

Hi guys,

Thanks for the replies. See my comments inline.


-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com] 
Sent: Friday, June 2, 2017 3:40 AM
To: Marco Varlese <marco.varlese@suse.com>; Imre Pinter <imre.pinter@ericsson.com>; users@dpdk.org
Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics <peter.suskovics@ericsson.com>
Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages



> -----Original Message-----
> From: Marco Varlese [mailto:marco.varlese@suse.com]
> Sent: Thursday, June 1, 2017 6:12 PM
> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >
> > >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre 
> > > Pinter
> > > Sent: Thursday, June 1, 2017 3:55 PM
> > > To: users@dpdk.org
> > > Cc: Gabor Halász; Péter Suskovics
> > > Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> > >
> > > Hi,
> > >
> > > We experience slow startup times in DPDK-OVS when backing memory with
> > > 1G hugepages instead of 2M hugepages.
> > > Currently we're mapping 2M hugepages as the memory backend for DPDK OVS.
> > > In the future we would like to allocate this memory from the 1G hugepage
> > > pool. Currently in our deployments we have a significant amount of
> > > 1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
> > > hugepages.
> > >
> > > Typical setup for 2M hugepages:
> > > GRUB:
> > > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > Typical setup for 1GB hugepages:
> > > GRUB:
> > > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> > >
> > >   *   2M (2G memory allocated) - startup time ~3 sec:
> > >
> > > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> > > --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> > > Datapath supports recirculation
> > >
> > >   *   1G (56G memory allocated) - startup time ~13 sec:
> > >
> > > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> > > --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> > > Datapath supports recirculation
> > > I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04 
> > > with kernel 3.13.0-117-generic and 4.4.0-78-generic.
> >
> >
> > You can shorten the time like this:
> >
> > (1) Mount 1 GB hugepages into two directories.
> > nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> I understood (reading Imre) that this does not really work because of
> non-deterministic allocation of hugepages in a NUMA architecture,
> e.g. we would end up (potentially) using hugepages allocated on
> different nodes even when accessing the OVS directory.
> Did I understand this correctly?

Did you try step (2)? Sergio also gives more options in another email in this thread for your reference.

Thanks,
Jianfeng

@Jianfeng: Step (1) will not help in our case, because 'mount' will not allocate hugepages from NUMA1 while the system still has free hugepages on NUMA0.
I have 56 hugepages of 1G size allocated, which means 28 pages (28G) available on each NUMA node. If the mounts are performed via fstab, we end up in one of the following scenarios at random.
First mount for OVS, then for VMs:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
| OVS(2G) |           VMs(26G)          |               VMs (28G)               |
+---------------------------------------+---------------------------------------+

First mount for VMs, then OVS:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
|               VMs (28G)               |           VMs(26G)          | OVS(2G) |
+---------------------------------------+---------------------------------------+
@Marco: After the hugepages were allocated, the ones in the OVS directory were either all from NUMA0 or all from NUMA1, but never from both (the outcome differs from reboot to reboot). This caused an error at DPDK startup, because one 1G hugepage was requested from each NUMA node, but no hugepages had been allocated on one of them.
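For reference, the per-node split can be verified with the standard sysfs counters (just a sketch; hugepages-1048576kB is the directory for the 1G page size):

$ grep . /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
$ grep . /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages

Comparing free_hugepages per node before and after starting OVS shows which node the pages in the OVS directory were actually taken from.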

> 
> >
> > (2) Force the memory interleave policy:
> > $ numactl --interleave=all ovs-vswitchd ...
> >
> > Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G
> > --socket-mem 1024,1024".
> >
@Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all ovs-vswitchd ...' cannot help, because all the hugepages mounted in the OVS directory will come from only one of the NUMA nodes. The DPDK application requires one 1G hugepage from each NUMA node, so DPDK returns an error.
I have also tried Step (2) without Step (1), and we still see the slower startup.
Currently I'm looking into Sergio's mail.
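A further option I have not tried yet (and which needs a kernel recent enough to allocate 1G "gigantic" pages at runtime) would be to reserve the per-node pools explicitly via sysfs instead of the GRUB command line, e.g. as root:

$ echo 28 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
$ echo 28 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

This still does not bind a hugetlbfs mount to a given node, so it is only a sketch of how the per-node pools could be controlled, not a fix for the startup time itself.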

Br,
Imre
