From: Marco Varlese
To: "Tan, Jianfeng", Imre Pinter, users@dpdk.org
Cc: Gabor Halász, Péter Suskovics
Date: Thu, 01 Jun 2017 12:12:08 +0200
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages

On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
> > Sent: Thursday, June 1, 2017 3:55 PM
> > To: users@dpdk.org
> > Cc: Gabor Halász; Péter Suskovics
> > Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> > 
> > Hi,
> > 
> > We experience slow startup times in DPDK-OVS when backing memory with
> > 1G hugepages instead of 2M hugepages.
> > Currently we map 2M hugepages as the memory backend for DPDK OVS.
> > In the future we would like to allocate this memory from the 1G hugepage
> > pool. Currently our deployments have a significant amount of 1G hugepages
> > allocated (min. 54G) for the VMs and only 2G of memory on 2M hugepages.
> > 
> > Typical setup for 2M hugepages:
> > GRUB:
> > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> > 
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > 
> > Typical setup for 1G hugepages:
> > GRUB:
> > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> > 
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> > 
> > DPDK OVS startup times based on the ovs-vswitchd.log logs:
> > 
> >   *   2M (2G memory allocated) - startup time ~3 sec:
> > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> > 
> >   *   1G (56G memory allocated) - startup time ~13 sec:
> > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> > 
> > I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with
> > kernels 3.13.0-117-generic and 4.4.0-78-generic.
> 
> You can shorten the time like this:
> 
> (1) Mount the 1G hugepages into two directories.
> nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much memory you want to use in OVS> 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0

I understood (reading Imre) that this does not really work because of the
non-deterministic allocation of hugepages on a NUMA architecture, e.g. we
could (potentially) end up using hugepages allocated on different nodes even
when accessing only the OVS directory.

Did I understand this correctly?
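In case it helps: a quick way to check where the pages behind such a mount
actually land could be something like the sketch below. It is untested; it
maps a single page from the OVS directory with the same flags the EAL uses
and asks the kernel which node backs it via get_mempolicy() from libnuma's
<numaif.h>. The /mnt/huge_ovs_1G/probe file name is only an example.

/* Untested sketch: report which NUMA node backs a page from a hugetlbfs mount.
 * Build: gcc -o hugenode hugenode.c -lnuma
 * (the "probe" file name and the mount point are examples only) */
#include <fcntl.h>
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_1G (1UL << 30)

int main(void)
{
        const char *path = "/mnt/huge_ovs_1G/probe";
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* MAP_POPULATE, as in the EAL, so the page is actually faulted in */
        void *va = mmap(NULL, PAGE_1G, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE, fd, 0);
        if (va == MAP_FAILED) { perror("mmap"); return 1; }

        int node = -1;
        /* MPOL_F_NODE | MPOL_F_ADDR: return the node holding the page at 'va' */
        if (get_mempolicy(&node, NULL, 0, va, MPOL_F_NODE | MPOL_F_ADDR) < 0) {
                perror("get_mempolicy");
                return 1;
        }
        printf("page from %s is on NUMA node %d\n", path, node);

        munmap(va, PAGE_1G);
        close(fd);
        unlink(path);
        return 0;
}

Running it a few times (also after the VMs have taken their pages) should show
whether the pages handed out through that mount stay on one node or not.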
> (2) Force the memory interleave policy:
> $ numactl --interleave=all ovs-vswitchd ...
> 
> Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".
> 
> > We had a discussion with Mark Gray (from Intel), and he came up with the
> > following items:
> > 
> >   *   The ~10 sec time difference is there with testpmd as well.
> >   *   They believe it is kernel overhead (mmap is slow, perhaps it is
> >       zeroing the pages). The following code from eal_memory.c produces
> >       the above mentioned printout during EAL startup:
> 
> Yes, correct.
> 
> > 	/* map the segment, and populate page tables,
> > 	 * the kernel fills this segment with zeros */
> > 	uint64_t start = rte_rdtsc();
> > 	virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> > 			MAP_SHARED | MAP_POPULATE, fd, 0);
> > 	if (virtaddr == MAP_FAILED) {
> > 		RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> > 				strerror(errno));
> > 		close(fd);
> > 		return i;
> > 	}
> > 
> > 	if (orig) {
> > 		hugepg_tbl[i].orig_va = virtaddr;
> > 		printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n",
> > 			i, rte_rdtsc() - start,
> > 			(rte_rdtsc() - start) * 1000 / rte_get_timer_hz());
> > 	}
> > 
> > A solution could be to mount 1G hugepages on two separate directories: 2G
> > for OVS and the remaining pages for the VMs. However, the NUMA location of
> > these hugepages is non-deterministic, since mount cannot handle NUMA-related
> > parameters when mounting hugetlbfs, and fstab performs the mounts during boot.
> 
> Oh, similar idea :-)
> 
> > Do you have a solution on how to use 1G hugepages for the VMs and still
> > have a reasonable DPDK EAL startup time?
> 
> No, we still don't have such an option.
> 
> Thanks,
> Jianfeng
> 
> > Thanks,
> > Imre
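One more thought on the mmap()/zeroing point above: the kernel-side cost
should be reproducible outside of OVS/DPDK with a small standalone test along
the lines of the (untested) sketch below; the mount point and the page count
are placeholders only.

/* Untested sketch: time how long it takes to map 1G hugepages with
 * MAP_POPULATE (as the EAL does), outside of OVS/DPDK.
 * Usage: ./maptime [hugetlbfs-dir] [npages] */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define PAGE_1G (1UL << 30)

static double now_ms(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

int main(int argc, char **argv)
{
        const char *dir = argc > 1 ? argv[1] : "/mnt/huge_qemu_1G";
        int npages = argc > 2 ? atoi(argv[2]) : 2;
        char path[256];

        for (int i = 0; i < npages; i++) {
                snprintf(path, sizeof(path), "%s/map_test_%d", dir, i);
                int fd = open(path, O_CREAT | O_RDWR, 0600);
                if (fd < 0) { perror("open"); return 1; }

                double t0 = now_ms();
                /* MAP_POPULATE makes the kernel fault in (and zero) the whole page now */
                void *va = mmap(NULL, PAGE_1G, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_POPULATE, fd, 0);
                double t1 = now_ms();
                if (va == MAP_FAILED) { perror("mmap"); return 1; }

                printf("page %d: %.1f ms\n", i, t1 - t0);

                munmap(va, PAGE_1G);
                close(fd);
                unlink(path);
        }
        return 0;
}

Comparing runs with and without MAP_POPULATE should confirm whether the time
really goes into populating (zeroing) the pages at map time, which is what the
EAL printout above suggests.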