From: Sergio Gonzalez Monroy
To: Imre Pinter, users@dpdk.org
Cc: Gabor Halász, Péter Suskovics
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
Date: Thu, 1 Jun 2017 10:02:57 +0100

On 01/06/2017 08:55, Imre Pinter wrote:
> Hi,
>
> We experience slow startup times with DPDK-OVS when backing memory with 1G hugepages instead of 2M hugepages.
> Currently we map 2M hugepages as the memory backend for DPDK OVS. In the future we would like to allocate this memory from the 1G hugepage pool. In our current deployments we have a significant amount of 1G hugepages allocated (min. 54G) for the VMs and only 2G of memory on 2M hugepages.
>
> Typical setup for 2M hugepages:
> GRUB:
> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
>
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>
> Typical setup for 1G hugepages:
> GRUB:
> hugepagesz=1G hugepages=56 default_hugepagesz=1G
>
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>
> DPDK OVS startup times based on ovs-vswitchd.log:
>
> * 2M (2G memory allocated) - startup time ~3 sec:
> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
>
> * 1G (56G memory allocated) - startup time ~13 sec:
> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
>
> I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with kernels 3.13.0-117-generic and 4.4.0-78-generic.
>
> We had a discussion with Mark Gray (from Intel), and he came up with the following items:
>
> * The ~10 sec time difference is there with testpmd as well.
> * They believe it is kernel overhead (mmap is slow, perhaps it is zeroing the pages).
> The following code from eal_memory.c, instrumented with a timing printout, measures the mapping of each page during EAL startup:
>
> /* map the segment, and populate page tables,
>  * the kernel fills this segment with zeros */
> uint64_t start = rte_rdtsc();
> virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
>                 MAP_SHARED | MAP_POPULATE, fd, 0);
> if (virtaddr == MAP_FAILED) {
>         RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
>                 strerror(errno));
>         close(fd);
>         return i;
> }
>
> if (orig) {
>         hugepg_tbl[i].orig_va = virtaddr;
>         printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n",
>                 i, rte_rdtsc() - start,
>                 (rte_rdtsc() - start) * 1000 /
>                 rte_get_timer_hz());
> }
>
> A solution could be to mount 1G hugepages on two separate directories: 2G for OVS and the remainder for the VMs. However, the NUMA location of these hugepages is non-deterministic, since mount cannot handle NUMA-related parameters when mounting hugetlbfs, and the fstab mounts are performed during boot.
>
> Do you have a solution on how to use 1G hugepages for the VMs and still have a reasonable DPDK EAL startup time?

In theory, one solution would be to use cgroups, as described here:
http://dpdk.org/ml/archives/dev/2017-February/057742.html
http://dpdk.org/ml/archives/dev/2017-April/063442.html
and then use the 'numactl --interleave' policy.

I say "in theory" because it does not seem to work as one would expect, so the patch proposed in the above threads would be a solution, forcing allocation from a specific NUMA node for each page.

Thanks,
Sergio

> Thanks,
> Imre
>
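
The mmap(MAP_POPULATE) cost discussed above can be reproduced outside of DPDK with a small standalone test along these lines. This is only a sketch: the mount point /mnt/huge_qemu_1G comes from the setup above, the test file name is made up, and it assumes at least one free 1G hugepage on the system.

/* Sketch: time how long the kernel takes to populate (and zero) a single
 * 1G hugepage via mmap(MAP_POPULATE), mirroring what EAL does per page. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define HUGEPAGE_SZ (1ULL << 30)   /* 1G */

int main(void)
{
        /* Hypothetical test file on the hugetlbfs mount. */
        const char *path = "/mnt/huge_qemu_1G/map_time_test";
        struct timespec t0, t1;

        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
                perror("open");
                return EXIT_FAILURE;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* MAP_POPULATE makes the kernel fault in and zero the page now,
         * which is where the per-page startup cost shows up. */
        void *va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE, fd, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        if (va == MAP_FAILED) {
                perror("mmap");
                close(fd);
                unlink(path);
                return EXIT_FAILURE;
        }

        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("mmap(MAP_POPULATE) of one 1G page took %.1f ms\n", ms);

        munmap(va, HUGEPAGE_SZ);
        close(fd);
        unlink(path);
        return EXIT_SUCCESS;
}

Multiplying the per-page time by the 56 allocated pages gives a rough idea of where the extra ~10 sec reported in the logs above goes.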
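
And one way to force a single hugepage onto a chosen NUMA node, in the spirit of the patch Sergio refers to: map the page without populating it, bind the range to the node with mbind(), then touch it so it is faulted in from that node. Again a sketch only: the file name and node number are illustrative, and it needs the libnuma headers (build with -lnuma).

/* Sketch: steer one 1G hugepage to a chosen NUMA node by binding the
 * mapping with mbind() before the page is faulted in. */
#include <fcntl.h>
#include <numaif.h>        /* mbind(), MPOL_PREFERRED */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (1ULL << 30)   /* 1G */

static void *map_hugepage_on_node(const char *path, int node)
{
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
                return MAP_FAILED;

        /* Reserve the virtual range first, without populating it. */
        void *va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
        if (va == MAP_FAILED) {
                close(fd);
                return MAP_FAILED;
        }

        /* Prefer the chosen node for this range ... */
        unsigned long nodemask = 1UL << node;
        if (mbind(va, HUGEPAGE_SZ, MPOL_PREFERRED, &nodemask,
                  sizeof(nodemask) * 8, 0) != 0)
                perror("mbind");

        /* ... then touch the page so it is allocated (and zeroed) now,
         * from that node if it has free 1G pages. */
        *(volatile char *)va = 0;

        close(fd);
        return va;
}

int main(void)
{
        const char *path = "/mnt/huge_qemu_1G/numa_test_page";
        void *va = map_hugepage_on_node(path, 0);
        if (va == MAP_FAILED) {
                perror("map_hugepage_on_node");
                return EXIT_FAILURE;
        }
        printf("Mapped one 1G hugepage, preferred node 0, at %p\n", va);
        munmap(va, HUGEPAGE_SZ);
        unlink(path);
        return EXIT_SUCCESS;
}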