DPDK usage discussions
* [dpdk-users] Slow DPDK startup with many 1G hugepages
       [not found] <VI1PR07MB1357C989E2F7092D9A31ED9F80F30@VI1PR07MB1357.eurprd07.prod.outlook.com>
@ 2017-06-01  7:55 ` Imre Pinter
  2017-06-01  8:50   ` Tan, Jianfeng
  2017-06-01  9:02   ` Sergio Gonzalez Monroy
  0 siblings, 2 replies; 11+ messages in thread
From: Imre Pinter @ 2017-06-01  7:55 UTC (permalink / raw)
  To: users; +Cc: Gabor Halász, Péter Suskovics

Hi,

We experience slow startup times in DPDK-OVS when backing memory with 1G hugepages instead of 2M hugepages.
Currently we're mapping 2M hugepages as the memory backend for DPDK OVS. In the future we would like to allocate this memory from the 1G hugepage pool. Currently in our deployments we have a significant amount of 1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M hugepages.

Typical setup for 2M hugepages:
                GRUB:
hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G

$ grep hugetlbfs /proc/mounts
nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0

Typical setup for 1GB hugepages:
GRUB:
hugepagesz=1G hugepages=56 default_hugepagesz=1G

$ grep hugetlbfs /proc/mounts
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0

DPDK OVS startup times based on the ovs-vswitchd.log logs:

  *   2M (2G memory allocated) - startup time ~3 sec:

2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024

2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation

  *   1G (56G memory allocated) - startup time ~13 sec:
2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
I used DPDK 16.11 for both OVS and testpmd, and tested on Ubuntu 14.04 with kernels 3.13.0-117-generic and 4.4.0-78-generic.

We had a discussion with Mark Gray (from Intel), and he came up with the following items:

  *   The ~10 sec time difference is there with testpmd as well

  *   They believe it is kernel overhead (mmap is slow, perhaps because it is zeroing the pages). The following code from eal_memory.c does the mapping and the timing printout during EAL startup:
uint64_t start = rte_rdtsc();

/* map the segment, and populate page tables,
 * the kernel fills this segment with zeros */
virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_POPULATE, fd, 0);
if (virtaddr == MAP_FAILED) {
        RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
                        strerror(errno));
        close(fd);
        return i;
}

if (orig) {
        hugepg_tbl[i].orig_va = virtaddr;
        printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n",
                        i, rte_rdtsc() - start,
                        (rte_rdtsc() - start) * 1000 /
                        rte_get_timer_hz());
}
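
For reference, below is a minimal standalone program (not part of DPDK; it assumes a 1G hugetlbfs mount at /mnt/huge_qemu_1G with at least one free page) that times the same mmap() with MAP_POPULATE on a single 1G hugetlbfs file, to check that the cost really is in the kernel's populate/zeroing path rather than in EAL itself:

/*
 * Standalone timing sketch: map one 1G page from a hugetlbfs mount the same
 * way EAL does (MAP_SHARED | MAP_POPULATE) and report how long it takes.
 * The mountpoint below is an assumption, adjust as needed.
 * Build: gcc -O2 -o hugemap_time hugemap_time.c
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

#define PAGE_SZ (1ULL << 30)    /* 1G hugepage */

int main(void)
{
        const char *path = "/mnt/huge_qemu_1G/timing_test";
        struct timeval t0, t1;
        void *va;
        int fd;

        fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        gettimeofday(&t0, NULL);
        /* MAP_POPULATE forces the kernel to allocate and zero the whole
         * hugepage right now instead of at first fault. */
        va = mmap(NULL, PAGE_SZ, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_POPULATE, fd, 0);
        gettimeofday(&t1, NULL);

        if (va == MAP_FAILED) {
                perror("mmap");
        } else {
                printf("mmap+populate of one 1G page took %ld us\n",
                       (t1.tv_sec - t0.tv_sec) * 1000000L +
                       (t1.tv_usec - t0.tv_usec));
                munmap(va, PAGE_SZ);
        }
        close(fd);
        unlink(path);
        return 0;
}

As far as I understand, in this legacy init path EAL first maps every free page it finds in the hugetlbfs directory and only then releases the ones it does not need, so multiplying the per-page time by the number of free pages in the mount gives a rough idea of the startup cost.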


A solution could be to mount 1G hugepages in two separate directories: 2G for OVS and the remaining for the VMs. However, the NUMA location of these hugepages is non-deterministic, since mount cannot handle NUMA-related parameters when mounting hugetlbfs, and fstab performs the mounts during boot.

Do you have a solution for using 1G hugepages for the VMs while keeping a reasonable DPDK EAL startup time?

Thanks,
Imre


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-01  7:55 ` [dpdk-users] Slow DPDK startup with many 1G hugepages Imre Pinter
@ 2017-06-01  8:50   ` Tan, Jianfeng
  2017-06-01 10:12     ` Marco Varlese
  2017-06-01  9:02   ` Sergio Gonzalez Monroy
  1 sibling, 1 reply; 11+ messages in thread
From: Tan, Jianfeng @ 2017-06-01  8:50 UTC (permalink / raw)
  To: Imre Pinter, users; +Cc: Gabor Halász, Péter Suskovics



> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
> Sent: Thursday, June 1, 2017 3:55 PM
> To: users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> Hi,
> 
> We experience slow startup time in DPDK-OVS, when backing memory with
> 1G hugepages instead of 2M hugepages.
> Currently we're mapping 2M hugepages as memory backend for DPDK OVS.
> In the future we would like to allocate this memory from the 1G hugepage
> pool. Currently in our deployments we have significant amount of 1G
> hugepages allocated (min. 54G) for VMs and only 2G memory on 2M
> hugepages.
> 
> Typical setup for 2M hugepages:
>                 GRUB:
> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54
> default_hugepagesz=1G
> 
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> 
> Typical setup for 1GB hugepages:
> GRUB:
> hugepagesz=1G hugepages=56 default_hugepagesz=1G
> 
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> 
> DPDK OVS startup times based on the ovs-vswitchd.log logs:
> 
>   *   2M (2G memory allocated) - startup time ~3 sec:
> 
> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> 
> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> Datapath supports recirculation
> 
>   *   1G (56G memory allocated) - startup time ~13 sec:
> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> Datapath supports recirculation
> I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04 with
> kernel 3.13.0-117-generic and 4.4.0-78-generic.


You can shorten the time as follows:

(1) Mount 1 GB hugepages into two directories.
nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0

(2) Force the use of the memory interleave policy
$ numactl --interleave=all ovs-vswitchd ...

Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".
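
For what it's worth, the mechanism behind "numactl --interleave=all <cmd>" can be illustrated with a tiny wrapper (illustration only, not the numactl source): it sets an interleave memory policy for the task and then execs the command, which inherits the policy. (See the later messages in this thread for why this policy does not behave as expected for per-file hugepage mappings.)

/*
 * Illustration only: roughly what "numactl --interleave=all <cmd>" does.
 * Build: gcc -O2 -o interleave_exec interleave_exec.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        if (argc < 2) {
                fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
                return 1;
        }
        if (numa_available() < 0) {
                fprintf(stderr, "NUMA is not available on this system\n");
                return 1;
        }
        /* Interleave all future memory allocations of this task (and of the
         * exec'ed command, which inherits the policy) across all nodes. */
        numa_set_interleave_mask(numa_all_nodes_ptr);

        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
}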

> 
> We had a discussion with Mark Gray (from Intel), and he come up with the
> following items:
> 
> ·         The ~10 sec time difference is there with testpmd as well
> 
> ·         They believe it is a kernel overhead (mmap is slow, perhaps it is zeroing
> pages). The following code from eal_memory.c does the above mentioned
> printout in EAL startup:

Yes, correct.

> 469    /* map the segment, and populate page tables,
> 470     * the kernel fills this segment with zeros */
> 468    uint64_t start = rte_rdtsc();
> 471    virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> 472                    MAP_SHARED | MAP_POPULATE, fd, 0);
> 473    if (virtaddr == MAP_FAILED) {
> 474            RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> 475                            strerror(errno));
> 476            close(fd);
> 477            return i;
> 478    }
> 479
> 480    if (orig) {
> 481            hugepg_tbl[i].orig_va = virtaddr;
> 482            printf("Original mapping of page %u took: %"PRIu64"
> ticks, %"PRIu64" ms\n     ",
> 483                    i, rte_rdtsc() - start,
> 484                    (rte_rdtsc() - start) * 1000 /
> 485                    rte_get_timer_hz());
> 486    }
> 
> 
> A solution could be to mount 1G hugepages to 2 separate directory: 2G for
> OVS and the remaining for the VMs, but the NUMA location for these
> hugepages is non-deterministic. Since mount cannot handle NUMA related
> parameters during mounting hugetlbfs, and fstab forks the mounts during
> boot.

Oh, similar idea :-)

> 
> Do you have a solution on how to use 1G hugepages for VMs and have
> reasonable DPDK EAL startup time?

No, we still don't have such options.

Thanks,
Jianfeng

> 
> Thanks,
> Imre


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-01  7:55 ` [dpdk-users] Slow DPDK startup with many 1G hugepages Imre Pinter
  2017-06-01  8:50   ` Tan, Jianfeng
@ 2017-06-01  9:02   ` Sergio Gonzalez Monroy
  2017-06-08 14:30     ` Imre Pinter
  1 sibling, 1 reply; 11+ messages in thread
From: Sergio Gonzalez Monroy @ 2017-06-01  9:02 UTC (permalink / raw)
  To: Imre Pinter, users; +Cc: Gabor Halász, Péter Suskovics

On 01/06/2017 08:55, Imre Pinter wrote:
> Hi,
>
> We experience slow startup time in DPDK-OVS, when backing memory with 1G hugepages instead of 2M hugepages.
> Currently we're mapping 2M hugepages as memory backend for DPDK OVS. In the future we would like to allocate this memory from the 1G hugepage pool. Currently in our deployments we have significant amount of 1G hugepages allocated (min. 54G) for VMs and only 2G memory on 2M hugepages.
>
> Typical setup for 2M hugepages:
>                  GRUB:
> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
>
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>
> Typical setup for 1GB hugepages:
> GRUB:
> hugepagesz=1G hugepages=56 default_hugepagesz=1G
>
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>
> DPDK OVS startup times based on the ovs-vswitchd.log logs:
>
>    *   2M (2G memory allocated) - startup time ~3 sec:
>
> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
>
> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
>
>    *   1G (56G memory allocated) - startup time ~13 sec:
> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04 with kernel 3.13.0-117-generic and 4.4.0-78-generic.
>
> We had a discussion with Mark Gray (from Intel), and he come up with the following items:
>
> ·         The ~10 sec time difference is there with testpmd as well
>
> ·         They believe it is a kernel overhead (mmap is slow, perhaps it is zeroing pages). The following code from eal_memory.c does the above mentioned printout in EAL startup:
> 469    /* map the segment, and populate page tables,
> 470     * the kernel fills this segment with zeros */
> 468    uint64_t start = rte_rdtsc();
> 471    virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> 472                    MAP_SHARED | MAP_POPULATE, fd, 0);
> 473    if (virtaddr == MAP_FAILED) {
> 474            RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> 475                            strerror(errno));
> 476            close(fd);
> 477            return i;
> 478    }
> 479
> 480    if (orig) {
> 481            hugepg_tbl[i].orig_va = virtaddr;
> 482            printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n     ",
> 483                    i, rte_rdtsc() - start,
> 484                    (rte_rdtsc() - start) * 1000 /
> 485                    rte_get_timer_hz());
> 486    }
>
>
> A solution could be to mount 1G hugepages to 2 separate directory: 2G for OVS and the remaining for the VMs, but the NUMA location for these hugepages is non-deterministic. Since mount cannot handle NUMA related parameters during mounting hugetlbfs, and fstab forks the mounts during boot.
>
> Do you have a solution on how to use 1G hugepages for VMs and have reasonable DPDK EAL startup time?

In theory, one solution would be to use cgroup , as described here:
http://dpdk.org/ml/archives/dev/2017-February/057742.html
http://dpdk.org/ml/archives/dev/2017-April/063442.html

Then use 'numactl --interleave' policy.

I said in theory because it does not seem to work as one would expect, 
so the patch proposed in the above threads would be a solution, forcing 
allocation from a specific NUMA node for each page.
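
To make the cgroup part of the suggestion concrete, here is a rough wrapper sketch (untested; it assumes the v1 hugetlb cgroup controller mounted at /sys/fs/cgroup/hugetlb and its hugetlb.1GB.limit_in_bytes control file) that caps the wrapped process at two 1G hugepages before exec'ing it:

/*
 * Sketch only: create a hugetlb cgroup limited to 2 x 1G hugepages, move
 * ourselves into it, then exec the given command (e.g. ovs-vswitchd).
 * Paths and file names are assumptions about the v1 hugetlb controller.
 * Build: gcc -O2 -o cgexec_hugetlb cgexec_hugetlb.c
 */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CG_DIR "/sys/fs/cgroup/hugetlb/ovs"

static int write_file(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (f == NULL) {
                perror(path);
                return -1;
        }
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(int argc, char *argv[])
{
        char buf[32];

        if (argc < 2) {
                fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
                return 1;
        }
        /* Create a child cgroup and cap 1G hugepage usage at 2 pages (2G). */
        mkdir(CG_DIR, 0755);
        if (write_file(CG_DIR "/hugetlb.1GB.limit_in_bytes",
                       "2147483648") != 0)
                return 1;
        /* Move ourselves into the cgroup; the exec'ed command inherits it. */
        snprintf(buf, sizeof(buf), "%d", getpid());
        if (write_file(CG_DIR "/tasks", buf) != 0)
                return 1;

        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
}

The 2G value assumes "--socket-mem 1024,1024" as in the logs above.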

Thanks,
Sergio

> Thanks,
> Imre
>


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-01  8:50   ` Tan, Jianfeng
@ 2017-06-01 10:12     ` Marco Varlese
  2017-06-02  1:40       ` Tan, Jianfeng
  0 siblings, 1 reply; 11+ messages in thread
From: Marco Varlese @ 2017-06-01 10:12 UTC (permalink / raw)
  To: Tan, Jianfeng, Imre Pinter, users; +Cc: Gabor Halász, Péter Suskovics

On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> 
> > 
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
> > Sent: Thursday, June 1, 2017 3:55 PM
> > To: users@dpdk.org
> > Cc: Gabor Halász; Péter Suskovics
> > Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> > 
> > Hi,
> > 
> > We experience slow startup time in DPDK-OVS, when backing memory with
> > 1G hugepages instead of 2M hugepages.
> > Currently we're mapping 2M hugepages as memory backend for DPDK OVS.
> > In the future we would like to allocate this memory from the 1G hugepage
> > pool. Currently in our deployments we have significant amount of 1G
> > hugepages allocated (min. 54G) for VMs and only 2G memory on 2M
> > hugepages.
> > 
> > Typical setup for 2M hugepages:
> >                 GRUB:
> > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54
> > default_hugepagesz=1G
> > 
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > 
> > Typical setup for 1GB hugepages:
> > GRUB:
> > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> > 
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > 
> > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> > 
> >   *   2M (2G memory allocated) - startup time ~3 sec:
> > 
> > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> > --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > 
> > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> > Datapath supports recirculation
> > 
> >   *   1G (56G memory allocated) - startup time ~13 sec:
> > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> > --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> > Datapath supports recirculation
> > I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04 with
> > kernel 3.13.0-117-generic and 4.4.0-78-generic.
> 
> 
> You can shorten the time by this:
> 
> (1) Mount 1 GB hugepages into two directories.
> nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you
> want to use in OVS> 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
I understood (reading Imre) that this does not really work because of the non-deterministic
allocation of hugepages on a NUMA architecture, e.g. we could (potentially) end up using
hugepages allocated on different nodes even when accessing the OVS directory.
Did I understand this correctly?

> 
> (2) Force to use memory  interleave policy 
> $ numactl --interleave=all ovs-vswitchd ...
> 
> Note: keep the huge-dir and socket-mem option, "--huge-dir /mnt/huge_ovs_1G --
> socket-mem 1024,1024".
> 
> > 
> > 
> > We had a discussion with Mark Gray (from Intel), and he come up with the
> > following items:
> > 
> > ·         The ~10 sec time difference is there with testpmd as well
> > 
> > ·         They believe it is a kernel overhead (mmap is slow, perhaps it is
> > zeroing
> > pages). The following code from eal_memory.c does the above mentioned
> > printout in EAL startup:
> 
> Yes, correct.
> 
> > 
> > 469    /* map the segment, and populate page tables,
> > 470     * the kernel fills this segment with zeros */
> > 468    uint64_t start = rte_rdtsc();
> > 471    virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> > 472                    MAP_SHARED | MAP_POPULATE, fd, 0);
> > 473    if (virtaddr == MAP_FAILED) {
> > 474            RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> > 475                            strerror(errno));
> > 476            close(fd);
> > 477            return i;
> > 478    }
> > 479
> > 480    if (orig) {
> > 481            hugepg_tbl[i].orig_va = virtaddr;
> > 482            printf("Original mapping of page %u took: %"PRIu64"
> > ticks, %"PRIu64" ms\n     ",
> > 483                    i, rte_rdtsc() - start,
> > 484                    (rte_rdtsc() - start) * 1000 /
> > 485                    rte_get_timer_hz());
> > 486    }
> > 
> > 
> > A solution could be to mount 1G hugepages to 2 separate directory: 2G for
> > OVS and the remaining for the VMs, but the NUMA location for these
> > hugepages is non-deterministic. Since mount cannot handle NUMA related
> > parameters during mounting hugetlbfs, and fstab forks the mounts during
> > boot.
> 
> Oh, similar idea :-)
> 
> > 
> > 
> > Do you have a solution on how to use 1G hugepages for VMs and have
> > reasonable DPDK EAL startup time?
> 
> No, we still don't have such options.
> 
> Thanks,
> Jianfeng
> 
> > 
> > 
> > Thanks,
> > Imre
> 
> 


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-01 10:12     ` Marco Varlese
@ 2017-06-02  1:40       ` Tan, Jianfeng
  2017-06-06 12:39         ` Imre Pinter
  0 siblings, 1 reply; 11+ messages in thread
From: Tan, Jianfeng @ 2017-06-02  1:40 UTC (permalink / raw)
  To: Marco Varlese, Imre Pinter, users; +Cc: Gabor Halász, Péter Suskovics



> -----Original Message-----
> From: Marco Varlese [mailto:marco.varlese@suse.com]
> Sent: Thursday, June 1, 2017 6:12 PM
> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >
> > >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
> > > Sent: Thursday, June 1, 2017 3:55 PM
> > > To: users@dpdk.org
> > > Cc: Gabor Halász; Péter Suskovics
> > > Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> > >
> > > Hi,
> > >
> > > We experience slow startup time in DPDK-OVS, when backing memory
> with
> > > 1G hugepages instead of 2M hugepages.
> > > Currently we're mapping 2M hugepages as memory backend for DPDK
> OVS.
> > > In the future we would like to allocate this memory from the 1G
> hugepage
> > > pool. Currently in our deployments we have significant amount of 1G
> > > hugepages allocated (min. 54G) for VMs and only 2G memory on 2M
> > > hugepages.
> > >
> > > Typical setup for 2M hugepages:
> > >                 GRUB:
> > > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54
> > > default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > Typical setup for 1GB hugepages:
> > > GRUB:
> > > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> > >
> > >   *   2M (2G memory allocated) - startup time ~3 sec:
> > >
> > > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
> 0x1
> > > --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > >
> > > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-
> netdev:
> > > Datapath supports recirculation
> > >
> > >   *   1G (56G memory allocated) - startup time ~13 sec:
> > > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
> 0x1
> > > --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-
> netdev:
> > > Datapath supports recirculation
> > > I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04 with
> > > kernel 3.13.0-117-generic and 4.4.0-78-generic.
> >
> >
> > You can shorten the time by this:
> >
> > (1) Mount 1 GB hugepages into two directories.
> > nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how
> much you
> > want to use in OVS> 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> I understood (reading Imre) that this does not really work because of non-
> deterministic allocation of hugepages in a NUMA architecture.
> e.g. we would end up (potentially) using hugepages allocated on different
> nodes
> even when accessing the OVS directory.
> Did I understand this correctly?

Did you try step 2? Sergio also gave more options in another email in this thread, for your reference.

Thanks,
Jianfeng

> 
> >
> > (2) Force to use memory  interleave policy
> > $ numactl --interleave=all ovs-vswitchd ...
> >
> > Note: keep the huge-dir and socket-mem option, "--huge-dir
> /mnt/huge_ovs_1G --
> > socket-mem 1024,1024".
> >


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-02  1:40       ` Tan, Jianfeng
@ 2017-06-06 12:39         ` Imre Pinter
  2017-06-06 14:31           ` Tan, Jianfeng
  0 siblings, 1 reply; 11+ messages in thread
From: Imre Pinter @ 2017-06-06 12:39 UTC (permalink / raw)
  To: Tan, Jianfeng, Marco Varlese, users
  Cc: Gabor Halász, Péter Suskovics

Hi guys,

Thanks for the replies. See my comments inline.


-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com] 
Sent: 2017. június 2. 3:40
To: Marco Varlese <marco.varlese@suse.com>; Imre Pinter <imre.pinter@ericsson.com>; users@dpdk.org
Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics <peter.suskovics@ericsson.com>
Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages



> -----Original Message-----
> From: Marco Varlese [mailto:marco.varlese@suse.com]
> Sent: Thursday, June 1, 2017 6:12 PM
> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >
> > >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre 
> > > Pinter
> > > Sent: Thursday, June 1, 2017 3:55 PM
> > > To: users@dpdk.org
> > > Cc: Gabor Halász; Péter Suskovics
> > > Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> > >
> > > Hi,
> > >
> > > We experience slow startup time in DPDK-OVS, when backing memory
> with
> > > 1G hugepages instead of 2M hugepages.
> > > Currently we're mapping 2M hugepages as memory backend for DPDK
> OVS.
> > > In the future we would like to allocate this memory from the 1G
> hugepage
> > > pool. Currently in our deployments we have significant amount of 
> > > 1G hugepages allocated (min. 54G) for VMs and only 2G memory on 2M 
> > > hugepages.
> > >
> > > Typical setup for 2M hugepages:
> > >                 GRUB:
> > > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 
> > > default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0 nodev 
> > > /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > Typical setup for 1GB hugepages:
> > > GRUB:
> > > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> > >
> > > $ grep hugetlbfs /proc/mounts
> > > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > >
> > > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> > >
> > >   *   2M (2G memory allocated) - startup time ~3 sec:
> > >
> > > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
> 0x1
> > > --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > >
> > > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-
> netdev:
> > > Datapath supports recirculation
> > >
> > >   *   1G (56G memory allocated) - startup time ~13 sec:
> > > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
> 0x1
> > > --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-
> netdev:
> > > Datapath supports recirculation
> > > I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04 
> > > with kernel 3.13.0-117-generic and 4.4.0-78-generic.
> >
> >
> > You can shorten the time by this:
> >
> > (1) Mount 1 GB hugepages into two directories.
> > nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how
> much you
> > want to use in OVS> 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> I understood (reading Imre) that this does not really work because of 
> non- deterministic allocation of hugepages in a NUMA architecture.
> e.g. we would end up (potentially) using hugepages allocated on 
> different nodes even when accessing the OVS directory.
> Did I understand this correctly?

Did you try step 2? And Sergio also gives more options on another email in this thread for your reference.

Thanks,
Jianfeng

@Jianfeng: Step (1) will not help in our case, since 'mount' will not allocate hugepages from NUMA1 until there are no free hugepages left on NUMA0.
I have 56 hugepages of 1G size allocated, which means 28G of hugepages available per NUMA node. If the mounting is performed via fstab, we end up in one of the following scenarios at random.
First mount for OVS, then for VMs:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
| OVS(2G) |           VMs(26G)          |               VMs (28G)               |
+---------------------------------------+---------------------------------------+

First mount for VMs, then OVS:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
|               VMs (28G)               |           VMs(26G)          | OVS(2G) |
+---------------------------------------+---------------------------------------+
@Marco: After the hugepages were allocated, the ones in the OVS directory were either from NUMA0 or NUMA1, but not from both (a different layout can come up after a reboot). This caused an error during DPDK startup, since one 1G hugepage was requested from each NUMA node and no hugepages were allocated on the other NUMA node.
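
For reference, here is a small diagnostic I would use to check from which NUMA node the 1G pages are actually taken (it only reads the standard sysfs counters and assumes 1G hugepages are enabled):

/*
 * Print total/free 1G hugepages per NUMA node.
 * Build: gcc -O2 -o hp_per_node hp_per_node.c
 */
#include <stdio.h>

static long read_counter(const char *fmt, int node)
{
        char path[256];
        FILE *f;
        long val = -1;

        snprintf(path, sizeof(path), fmt, node);
        f = fopen(path, "r");
        if (f == NULL)
                return -1;      /* node does not exist */
        if (fscanf(f, "%ld", &val) != 1)
                val = -1;
        fclose(f);
        return val;
}

int main(void)
{
        const char *total_fmt = "/sys/devices/system/node/node%d/hugepages/"
                                "hugepages-1048576kB/nr_hugepages";
        const char *free_fmt  = "/sys/devices/system/node/node%d/hugepages/"
                                "hugepages-1048576kB/free_hugepages";
        int node;

        for (node = 0; ; node++) {
                long total = read_counter(total_fmt, node);
                long freep = read_counter(free_fmt, node);

                if (total < 0)
                        break;
                printf("node%d: %ld total / %ld free 1G hugepages\n",
                       node, total, freep);
        }
        return 0;
}

Running it before and after starting ovs-vswitchd (or a VM) shows which node the pages came from.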

> 
> >
> > (2) Force to use memory  interleave policy $ numactl 
> > --interleave=all ovs-vswitchd ...
> >
> > Note: keep the huge-dir and socket-mem option, "--huge-dir
> /mnt/huge_ovs_1G --
> > socket-mem 1024,1024".
> >
@Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all ovs-vswitchd ...' cannot help, because all the hugepages mounted in the OVS directory will be from one of the NUMA nodes. The DPDK application requires 1G of hugepages from each of the NUMA nodes, so DPDK returns with an error.
I have also tried without Step (1), and we still have the slower startup.
Currently I'm looking into Sergio's mail.

Br,
Imre


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-06 12:39         ` Imre Pinter
@ 2017-06-06 14:31           ` Tan, Jianfeng
  2017-06-06 15:25             ` Imre Pinter
  0 siblings, 1 reply; 11+ messages in thread
From: Tan, Jianfeng @ 2017-06-06 14:31 UTC (permalink / raw)
  To: Imre Pinter, Marco Varlese, users; +Cc: Gabor Halász, Péter Suskovics



On 6/6/2017 8:39 PM, Imre Pinter wrote:
> Hi guys,
>
> Thanks for the replies. See my comments inline.
>
>
> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: 2017. június 2. 3:40
> To: Marco Varlese <marco.varlese@suse.com>; Imre Pinter <imre.pinter@ericsson.com>; users@dpdk.org
> Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics <peter.suskovics@ericsson.com>
> Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages
>
>
>
>> -----Original Message-----
>> From: Marco Varlese [mailto:marco.varlese@suse.com]
>> Sent: Thursday, June 1, 2017 6:12 PM
>> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
>> Cc: Gabor Halász; Péter Suskovics
>> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
>>
>> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
>>>> -----Original Message-----
>>>> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre
>>>> Pinter
>>>> Sent: Thursday, June 1, 2017 3:55 PM
>>>> To: users@dpdk.org
>>>> Cc: Gabor Halász; Péter Suskovics
>>>> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
>>>>
>>>> Hi,
>>>>
>>>> We experience slow startup time in DPDK-OVS, when backing memory
>> with
>>>> 1G hugepages instead of 2M hugepages.
>>>> Currently we're mapping 2M hugepages as memory backend for DPDK
>> OVS.
>>>> In the future we would like to allocate this memory from the 1G
>> hugepage
>>>> pool. Currently in our deployments we have significant amount of
>>>> 1G hugepages allocated (min. 54G) for VMs and only 2G memory on 2M
>>>> hugepages.
>>>>
>>>> Typical setup for 2M hugepages:
>>>>                  GRUB:
>>>> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54
>>>> default_hugepagesz=1G
>>>>
>>>> $ grep hugetlbfs /proc/mounts
>>>> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0 nodev
>>>> /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>>>>
>>>> Typical setup for 1GB hugepages:
>>>> GRUB:
>>>> hugepagesz=1G hugepages=56 default_hugepagesz=1G
>>>>
>>>> $ grep hugetlbfs /proc/mounts
>>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>>>>
>>>> DPDK OVS startup times based on the ovs-vswitchd.log logs:
>>>>
>>>>    *   2M (2G memory allocated) - startup time ~3 sec:
>>>>
>>>> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
>> 0x1
>>>> --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
>>>>
>>>> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-
>> netdev:
>>>> Datapath supports recirculation
>>>>
>>>>    *   1G (56G memory allocated) - startup time ~13 sec:
>>>> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
>> 0x1
>>>> --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
>>>> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-
>> netdev:
>>>> Datapath supports recirculation
>>>> I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04
>>>> with kernel 3.13.0-117-generic and 4.4.0-78-generic.
>>>
>>> You can shorten the time by this:
>>>
>>> (1) Mount 1 GB hugepages into two directories.
>>> nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how
>> much you
>>> want to use in OVS> 0 0
>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>> I understood (reading Imre) that this does not really work because of
>> non- deterministic allocation of hugepages in a NUMA architecture.
>> e.g. we would end up (potentially) using hugepages allocated on
>> different nodes even when accessing the OVS directory.
>> Did I understand this correctly?
> Did you try step 2? And Sergio also gives more options on another email in this thread for your reference.
>
> Thanks,
> Jianfeng
>
> @Jianfeng: Step (1) will not help in our case. Hence 'mount' will not allocate hugepages from NUMA1 till the system has free hugepages on NUMA0.
> I have 56G hugepages allocated from 1G size. This means 28-28G hugepages available per NUMA node. If mounting action is performed via fstab, then we'll end up in one of the following scenarios randomly.
> First mount for OVS, then for VMs:
> +---------------------------------------+---------------------------------------+
> |                 NUMA0                 |                 NUMA1                 |
> +---------------------------------------+---------------------------------------+
> | OVS(2G) |           VMs(26G)          |               VMs (28G)               |
> +---------------------------------------+---------------------------------------+
>
> First mount for VMs, then OVS:
> +---------------------------------------+---------------------------------------+
> |                 NUMA0                 |                 NUMA1                 |
> +---------------------------------------+---------------------------------------+
> |               VMs (28G)               |           VMs(26G)          | OVS(2G) |
> +---------------------------------------+---------------------------------------+

This is why I suggested step 2, to allocate memory in an interleaved way. 
Did you try that?

Thanks,
Jianfeng

> @Marco: After the hugepages were allocated, the ones in OVS directory were either from NUMA0, or NUMA1, but not from both (different setup come after a roboot). This caused error in DPDK startup, hence 1-1 hugepages were requested from both NUMA nodes, and there was no hugepages allocated to the other NUMA node.
>
>>> (2) Force to use memory  interleave policy $ numactl
>>> --interleave=all ovs-vswitchd ...
>>>
>>> Note: keep the huge-dir and socket-mem option, "--huge-dir
>> /mnt/huge_ovs_1G --
>>> socket-mem 1024,1024".
>>>
> @Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all ovs-vswitchd ...' cannot help, because all the hugepages mounted to OVS directory will be from one of the NUMA nodes. The DPDK application requires 1-1G hugepage from both of the NUMA nodes, so DPDK returns with an error.
> I have also tried without Step (1), and we still has the slower startup.
> Currently I'm looking into Sergio's mail.
>
> Br,
> Imre


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-06 14:31           ` Tan, Jianfeng
@ 2017-06-06 15:25             ` Imre Pinter
  2017-06-07  8:22               ` Tan, Jianfeng
  0 siblings, 1 reply; 11+ messages in thread
From: Imre Pinter @ 2017-06-06 15:25 UTC (permalink / raw)
  To: Tan, Jianfeng, Marco Varlese, users
  Cc: Gabor Halász, Péter Suskovics



> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: 2017. június 6. 16:32
> To: Imre Pinter <imre.pinter@ericsson.com>; Marco Varlese
> <marco.varlese@suse.com>; users@dpdk.org
> Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics
> <peter.suskovics@ericsson.com>
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> 
> 
> On 6/6/2017 8:39 PM, Imre Pinter wrote:
> > Hi guys,
> >
> > Thanks for the replies. See my comments inline.
> >
> >
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> > Sent: 2017. június 2. 3:40
> > To: Marco Varlese <marco.varlese@suse.com>; Imre Pinter
> > <imre.pinter@ericsson.com>; users@dpdk.org
> > Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics
> > <peter.suskovics@ericsson.com>
> > Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >
> >
> >
> >> -----Original Message-----
> >> From: Marco Varlese [mailto:marco.varlese@suse.com]
> >> Sent: Thursday, June 1, 2017 6:12 PM
> >> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
> >> Cc: Gabor Halász; Péter Suskovics
> >> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >>
> >> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >>>> -----Original Message-----
> >>>> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre
> >>>> Pinter
> >>>> Sent: Thursday, June 1, 2017 3:55 PM
> >>>> To: users@dpdk.org
> >>>> Cc: Gabor Halász; Péter Suskovics
> >>>> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >>>>
> >>>> Hi,
> >>>>
> >>>> We experience slow startup time in DPDK-OVS, when backing memory
> >> with
> >>>> 1G hugepages instead of 2M hugepages.
> >>>> Currently we're mapping 2M hugepages as memory backend for DPDK
> >> OVS.
> >>>> In the future we would like to allocate this memory from the 1G
> >> hugepage
> >>>> pool. Currently in our deployments we have significant amount of 1G
> >>>> hugepages allocated (min. 54G) for VMs and only 2G memory on 2M
> >>>> hugepages.
> >>>>
> >>>> Typical setup for 2M hugepages:
> >>>>                  GRUB:
> >>>> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54
> >>>> default_hugepagesz=1G
> >>>>
> >>>> $ grep hugetlbfs /proc/mounts
> >>>> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> nodev
> >>>> /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >>>>
> >>>> Typical setup for 1GB hugepages:
> >>>> GRUB:
> >>>> hugepagesz=1G hugepages=56 default_hugepagesz=1G
> >>>>
> >>>> $ grep hugetlbfs /proc/mounts
> >>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >>>>
> >>>> DPDK OVS startup times based on the ovs-vswitchd.log logs:
> >>>>
> >>>>    *   2M (2G memory allocated) - startup time ~3 sec:
> >>>>
> >>>> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -
> c
> >> 0x1
> >>>> --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> >>>>
> >>>> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-
> >> netdev:
> >>>> Datapath supports recirculation
> >>>>
> >>>>    *   1G (56G memory allocated) - startup time ~13 sec:
> >>>> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -
> c
> >> 0x1
> >>>> --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> >>>> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-
> >> netdev:
> >>>> Datapath supports recirculation
> >>>> I used DPDK 16.11 for OVS and testpmd and tested on Ubuntu 14.04
> >>>> with kernel 3.13.0-117-generic and 4.4.0-78-generic.
> >>>
> >>> You can shorten the time by this:
> >>>
> >>> (1) Mount 1 GB hugepages into two directories.
> >>> nodev /mnt/huge_ovs_1G hugetlbfs
> rw,relatime,pagesize=1G,size=<how
> >> much you
> >>> want to use in OVS> 0 0
> >>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >> I understood (reading Imre) that this does not really work because of
> >> non- deterministic allocation of hugepages in a NUMA architecture.
> >> e.g. we would end up (potentially) using hugepages allocated on
> >> different nodes even when accessing the OVS directory.
> >> Did I understand this correctly?
> > Did you try step 2? And Sergio also gives more options on another email in
> this thread for your reference.
> >
> > Thanks,
> > Jianfeng
> >
> > @Jianfeng: Step (1) will not help in our case. Hence 'mount' will not allocate
> hugepages from NUMA1 till the system has free hugepages on NUMA0.
> > I have 56G hugepages allocated from 1G size. This means 28-28G
> hugepages available per NUMA node. If mounting action is performed via
> fstab, then we'll end up in one of the following scenarios randomly.
> > First mount for OVS, then for VMs:
> > +---------------------------------------+---------------------------------------+
> > |                 NUMA0                 |                 NUMA1                 |
> > +---------------------------------------+---------------------------------------+
> > | OVS(2G) |           VMs(26G)          |               VMs (28G)               |
> > +---------------------------------------+---------------------------------------+
> >
> > First mount for VMs, then OVS:
> > +---------------------------------------+---------------------------------------+
> > |                 NUMA0                 |                 NUMA1                 |
> > +---------------------------------------+---------------------------------------+
> > |               VMs (28G)               |           VMs(26G)          | OVS(2G) |
> > +---------------------------------------+---------------------------------------+
> 
> This is why I suggested step 2 to allocate memory in an interleave way.
> Do you try that?
> 
> Thanks,
> Jianfeng
> 
I've double-checked it, and if I combine Step (1) and Step (2), the OVS start ends up with the following error:
EAL: Detected 32 lcore(s)
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 1! Requested: 1024MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory

I experienced the same behavior with testpmd.
However, when they use hugepages from the 'huge_qemu_1G' mountpoint, they start properly.

Br,
Imre

> > @Marco: After the hugepages were allocated, the ones in OVS directory
> were either from NUMA0, or NUMA1, but not from both (different setup
> come after a roboot). This caused error in DPDK startup, hence 1-1
> hugepages were requested from both NUMA nodes, and there was no
> hugepages allocated to the other NUMA node.
> >
> >>> (2) Force to use memory  interleave policy $ numactl
> >>> --interleave=all ovs-vswitchd ...
> >>>
> >>> Note: keep the huge-dir and socket-mem option, "--huge-dir
> >> /mnt/huge_ovs_1G --
> >>> socket-mem 1024,1024".
> >>>
> > @Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all ovs-
> vswitchd ...' cannot help, because all the hugepages mounted to OVS
> directory will be from one of the NUMA nodes. The DPDK application
> requires 1-1G hugepage from both of the NUMA nodes, so DPDK returns
> with an error.
> > I have also tried without Step (1), and we still has the slower startup.
> > Currently I'm looking into Sergio's mail.
> >
> > Br,
> > Imre



* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-06 15:25             ` Imre Pinter
@ 2017-06-07  8:22               ` Tan, Jianfeng
  2017-06-08 14:40                 ` Imre Pinter
  0 siblings, 1 reply; 11+ messages in thread
From: Tan, Jianfeng @ 2017-06-07  8:22 UTC (permalink / raw)
  To: Imre Pinter, Marco Varlese, users; +Cc: Gabor Halász, Péter Suskovics



On 6/6/2017 11:25 PM, Imre Pinter wrote:
> [...]
> I've double-checked it, and if I combine Step (1) and Step (2), then OVS start end up in the following error:
> EAL: Detected 32 lcore(s)
> EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
> EAL: Probing VFIO support...
> EAL: Not enough memory available on socket 1! Requested: 1024MB, available: 0MB
> PANIC in rte_eal_init():
> Cannot init memory
>
> I experienced the same behavior with testpmd.
> However when they use hugepages from the 'huge_qemu_1G' mountpoint, then they start properly.
>
> Br,
> Imre

Ah yes, I tried it myself and encountered a similar error. I took a deep 
dive into the kernel hugetlb-related code to see what's going on:

The hugepage allocation path is: hugetlb_fault -> hugetlb_no_page -> 
alloc_huge_page -> dequeue_huge_page_vma. Inside 
dequeue_huge_page_vma(), we can see the code logic that iterates over nodes. But 
from huge_zonelist(), we can see that the "interleave" policy only applies within a 
VMA. In our case, each hugepage file is an independent VMA. As a result, 
we go through all the hugepages of one node before moving on to the next node.

Sorry that I took "interleave" for granted. Fortunately, there is an attempt 
to fix this: http://dpdk.org/dev/patchwork/patch/25069/.
Besides, we could write a simple application that allocates and holds all 
hugepages except those to be used by OVS-DPDK (a rough sketch follows below).
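
The sketch below is untested and only meant to illustrate the idea; the mountpoint, page counts and two-node split are assumptions based on the setup described earlier in this thread. It maps and holds 1G pages from the VM mountpoint, binding each mapping to a chosen node with mbind() before faulting it in, so that only the pages OVS-DPDK should get remain free; once OVS has finished EAL initialization, the helper exits and the held pages return to the pool for the VMs.

/*
 * Hold 1G hugepages on specific NUMA nodes until OVS-DPDK has started.
 * Build: gcc -O2 -o hold_hugepages hold_hugepages.c -lnuma
 */
#include <fcntl.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SZ (1ULL << 30)            /* 1G hugepage */
#define MOUNT   "/mnt/huge_qemu_1G"     /* VM hugetlbfs mountpoint */

/* Map one 1G page from MOUNT and fault it in on the given NUMA node. */
static int hold_page_on_node(int idx, int node)
{
        unsigned long nodemask = 1UL << node;
        char path[128];
        void *va;
        int fd;

        snprintf(path, sizeof(path), MOUNT "/hold_%d", idx);
        fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
                return -1;
        va = mmap(NULL, PAGE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        unlink(path);           /* page is freed when the mapping goes away */
        if (va == MAP_FAILED)
                return -1;
        /* Bind the VMA to the node, then touch it so the hugepage is
         * actually allocated there. */
        if (mbind(va, PAGE_SZ, MPOL_BIND, &nodemask,
                  8 * sizeof(nodemask), 0) != 0) {
                munmap(va, PAGE_SZ);
                return -1;
        }
        memset(va, 0, 4096);
        return 0;
}

int main(void)
{
        /* Example: 28 pages per node in total, leave 1 per node free. */
        int held = 0, i;

        for (i = 0; i < 27; i++) {
                if (hold_page_on_node(held, 0) == 0)
                        held++;
                if (hold_page_on_node(held, 1) == 0)
                        held++;
        }
        printf("Holding %d x 1G pages; start ovs-vswitchd now, "
               "then press Enter to release them.\n", held);
        getchar();
        return 0;       /* mappings and pages are released on exit */
}

Note that mbind() has to be applied before the page is faulted in (hence no MAP_POPULATE here); for hugetlbfs, a write to the first byte is enough to fault in the whole 1G page.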

Thanks,
Jianfeng


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-01  9:02   ` Sergio Gonzalez Monroy
@ 2017-06-08 14:30     ` Imre Pinter
  0 siblings, 0 replies; 11+ messages in thread
From: Imre Pinter @ 2017-06-08 14:30 UTC (permalink / raw)
  To: Sergio Gonzalez Monroy, users; +Cc: Gabor Halász, Péter Suskovics



> -----Original Message-----
> From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy@intel.com]
> Sent: 2017. június 1. 11:03
> To: Imre Pinter <imre.pinter@ericsson.com>; users@dpdk.org
> Cc: Gabor Halász <gabor.halasz@ericsson.com>; Péter Suskovics
> <peter.suskovics@ericsson.com>
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> 
> On 01/06/2017 08:55, Imre Pinter wrote:
> > Hi,
> >
> > We experience slow startup time in DPDK-OVS, when backing memory
> with 1G hugepages instead of 2M hugepages.
> > Currently we're mapping 2M hugepages as memory backend for DPDK
> OVS. In the future we would like to allocate this memory from the 1G
> hugepage pool. Currently in our deployments we have significant amount of
> 1G hugepages allocated (min. 54G) for VMs and only 2G memory on 2M
> hugepages.
> >
> > Typical setup for 2M hugepages:
> >                  GRUB:
> > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54
> > default_hugepagesz=1G
> >
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0 nodev
> > /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >
> > Typical setup for 1GB hugepages:
> > GRUB:
> > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> >
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >
> > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> >
> >    *   2M (2G memory allocated) - startup time ~3 sec:
> >
> > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
> 0x1
> > --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> >
> > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> > Datapath supports recirculation
> >
> >    *   1G (56G memory allocated) - startup time ~13 sec:
> > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c
> 0x1
> > --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> > Datapath supports recirculation I used DPDK 16.11 for OVS and testpmd
> and tested on Ubuntu 14.04 with kernel 3.13.0-117-generic and 4.4.0-78-
> generic.
> >
> > We had a discussion with Mark Gray (from Intel), and he come up with the
> following items:
> >
> > ·         The ~10 sec time difference is there with testpmd as well
> >
> > ·         They believe it is a kernel overhead (mmap is slow, perhaps it is
> zeroing pages). The following code from eal_memory.c does the above
> mentioned printout in EAL startup:
> > 469    /* map the segment, and populate page tables,
> > 470     * the kernel fills this segment with zeros */
> > 468    uint64_t start = rte_rdtsc();
> > 471    virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ |
> PROT_WRITE,
> > 472                    MAP_SHARED | MAP_POPULATE, fd, 0);
> > 473    if (virtaddr == MAP_FAILED) {
> > 474            RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> > 475                            strerror(errno));
> > 476            close(fd);
> > 477            return i;
> > 478    }
> > 479
> > 480    if (orig) {
> > 481            hugepg_tbl[i].orig_va = virtaddr;
> > 482            printf("Original mapping of page %u took: %"PRIu64" ticks,
> %"PRIu64" ms\n     ",
> > 483                    i, rte_rdtsc() - start,
> > 484                    (rte_rdtsc() - start) * 1000 /
> > 485                    rte_get_timer_hz());
> > 486    }
> >
> >
> > A solution could be to mount 1G hugepages to 2 separate directory: 2G for
> OVS and the remaining for the VMs, but the NUMA location for these
> hugepages is non-deterministic. Since mount cannot handle NUMA related
> parameters during mounting hugetlbfs, and fstab forks the mounts during
> boot.
> >
> > Do you have a solution on how to use 1G hugepages for VMs and have
> reasonable DPDK EAL startup time?
> 
> In theory, one solution would be to use cgroup , as described here:
> http://dpdk.org/ml/archives/dev/2017-February/057742.html
> http://dpdk.org/ml/archives/dev/2017-April/063442.html
> 
> Then use 'numactl --interleave' policy.
> 
> I said in theory because it does not seem to work as one would expect, so
> the proposed patch in above threads would be a solution by forcing
> allocation from specific numa node for each page.
> 
> Thanks,
> Sergio
> 

Thanks for the reply Sergio!
The following patch (v5) at the end of the mentioned mail thread seems to solve the issue.
http://dpdk.org/dev/patchwork/patch/25069/
Thanks,
Imre

> > Thanks,
> > Imre
> >


* Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
  2017-06-07  8:22               ` Tan, Jianfeng
@ 2017-06-08 14:40                 ` Imre Pinter
  0 siblings, 0 replies; 11+ messages in thread
From: Imre Pinter @ 2017-06-08 14:40 UTC (permalink / raw)
  To: Tan, Jianfeng, Marco Varlese, users, Sergio Gonzalez Monroy
  Cc: Gabor Halász, Péter Suskovics



> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: 2017. június 7. 10:22
> 
> 
> 
> On 6/6/2017 11:25 PM, Imre Pinter wrote:
> > [...]
> > I've double-checked it, and if I combine Step (1) and Step (2), then OVS
> start end up in the following error:
> > EAL: Detected 32 lcore(s)
> > EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs
> > found for that size
> > EAL: Probing VFIO support...
> > EAL: Not enough memory available on socket 1! Requested: 1024MB,
> > available: 0MB PANIC in rte_eal_init():
> > Cannot init memory
> >
> > I experienced the same behavior with testpmd.
> > However when they use hugepages from the 'huge_qemu_1G'
> mountpoint, then they start properly.
> >
> > Br,
> > Imre
> 
> Ah yes, I tried by myself and encounter similar error as you. And try to deep
> dive into kernel hugetlb related code to see what's going on:
> 
> The hugepage allocation path is: hugetlb_fault -> hugetlb_no_page ->
> alloc_huge_page -> dequeue_huge_page_vma. Inside
> dequeue_huge_page_vma(), we can see the code logic to iterate node. But
> from huge_zonelist(), we can see "interleave" policy only applies to a VMA.
> In our case, each hugepage file is an independent VMA. As a result, we will
> go though all hugepages from node to another node one by one.
> 
> Sorry that I take "interleave" as granted. Fortunately, there is a try to fix this:
> http://dpdk.org/dev/patchwork/patch/25069/.
> Besides, we can write a simple application which will allocate all hugepages
> except those used by OVS-DPDK.
> 
> Thanks,
> Jianfeng

Thanks, this is the same patch as the one resulting from the mail thread suggested by Sergio. I've verified it, and the patch SOLVES the slow DPDK startup issue.

The working setup:
Hugepage mount in fstab:
    nodev    /mnt/huge_qemu_1G    hugetlbfs       pagesize=1G     0       0
    nodev    /mnt/huge_ovs_1G    hugetlbfs       pagesize=1G,size=2G     0       0
DPDK version: 16.11 + patch: [dpdk-dev,v5,1/2] mem: balanced allocation of hugepages
Allocate memory for the DPDK application from the /mnt/huge_ovs_1G mountpoint.

Thanks,
Imre


Thread overview: 11+ messages
     [not found] <VI1PR07MB1357C989E2F7092D9A31ED9F80F30@VI1PR07MB1357.eurprd07.prod.outlook.com>
2017-06-01  7:55 ` [dpdk-users] Slow DPDK startup with many 1G hugepages Imre Pinter
2017-06-01  8:50   ` Tan, Jianfeng
2017-06-01 10:12     ` Marco Varlese
2017-06-02  1:40       ` Tan, Jianfeng
2017-06-06 12:39         ` Imre Pinter
2017-06-06 14:31           ` Tan, Jianfeng
2017-06-06 15:25             ` Imre Pinter
2017-06-07  8:22               ` Tan, Jianfeng
2017-06-08 14:40                 ` Imre Pinter
2017-06-01  9:02   ` Sergio Gonzalez Monroy
2017-06-08 14:30     ` Imre Pinter
