From: "Tan, Jianfeng"
To: Imre Pinter, "users@dpdk.org"
CC: Gabor Halász, Péter Suskovics
Date: Thu, 1 Jun 2017 08:50:35 +0000
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages

> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
> Sent: Thursday, June 1, 2017 3:55 PM
> To: users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
>
> Hi,
>
> We experience slow startup times in DPDK-OVS when backing memory with
> 1G hugepages instead of 2M hugepages.
> Currently we map 2M hugepages as the memory backend for DPDK OVS.
> In the future we would like to allocate this memory from the 1G hugepage
> pool. Currently our deployments have a significant amount of 1G
> hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
> hugepages.
>
> Typical setup for 2M hugepages:
>
> GRUB:
> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
>
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>
> Typical setup for 1G hugepages:
>
> GRUB:
> hugepagesz=1G hugepages=56 default_hugepagesz=1G
>
> $ grep hugetlbfs /proc/mounts
> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>
> DPDK OVS startup times, based on the ovs-vswitchd.log logs:
>
> * 2M (2G memory allocated) - startup time ~3 sec:
>
> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> Datapath supports recirculation
>
> * 1G (56G memory allocated) - startup time ~13 sec:
>
> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1
> --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev:
> Datapath supports recirculation
>
> I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with
> kernels 3.13.0-117-generic and 4.4.0-78-generic.

You can shorten the time like this:

(1) Mount 1G hugepages into two directories.

nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size= 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0

(2) Force the memory interleave policy:

$ numactl --interleave=all ovs-vswitchd ...

Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".

>
> We had a discussion with Mark Gray (from Intel), and he came up with the
> following items:
>
> · The ~10 sec time difference is there with testpmd as well.
>
> · They believe it is kernel overhead (mmap is slow, perhaps it is zeroing
> pages). The following code from eal_memory.c does the above-mentioned
> printout during EAL startup:

Yes, correct. A minimal standalone sketch that reproduces this measurement outside of DPDK follows at the end of this mail.

> /* map the segment, and populate page tables,
>  * the kernel fills this segment with zeros */
> uint64_t start = rte_rdtsc();
> virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
>                 MAP_SHARED | MAP_POPULATE, fd, 0);
> if (virtaddr == MAP_FAILED) {
>         RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
>                 strerror(errno));
>         close(fd);
>         return i;
> }
>
> if (orig) {
>         hugepg_tbl[i].orig_va = virtaddr;
>         printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n",
>                i, rte_rdtsc() - start,
>                (rte_rdtsc() - start) * 1000 /
>                rte_get_timer_hz());
> }
>
> A solution could be to mount 1G hugepages to two separate directories (2G
> for OVS and the remainder for the VMs), but the NUMA location of these
> hugepages is non-deterministic, since mount cannot handle NUMA-related
> parameters when mounting hugetlbfs, and fstab performs the mounts during
> boot.

Oh, similar idea :-)

>
> Do you have a solution on how to use 1G hugepages for VMs and still have a
> reasonable DPDK EAL startup time?

No, we still don't have such an option.

Thanks,
Jianfeng

> Thanks,
> Imre
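
For reference, here is a minimal standalone sketch (not DPDK code) that reproduces the measurement above: it times a single mmap() with MAP_POPULATE on a hugetlbfs-backed file, which is where the startup time goes. The mount point, file name, and 1G page size are assumptions taken from the setup in this thread; adjust them to your system, and run it as root.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define HUGEPAGE_SZ (1UL << 30)             /* one 1G page, matching pagesize=1G */
#define HUGE_FILE "/mnt/huge_qemu_1G/probe" /* hypothetical file on your hugetlbfs mount */

int main(void)
{
        struct timespec t0, t1;
        void *va;
        int fd;

        fd = open(HUGE_FILE, O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* MAP_POPULATE makes the kernel allocate and zero the whole
         * page right here, instead of on first touch. */
        va = mmap(NULL, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_POPULATE, fd, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        if (va == MAP_FAILED) {
                perror("mmap");
                close(fd);
                return 1;
        }

        printf("mmap + populate of one 1G page took %ld ms\n",
               (t1.tv_sec - t0.tv_sec) * 1000 +
               (t1.tv_nsec - t0.tv_nsec) / 1000000);

        munmap(va, HUGEPAGE_SZ);
        close(fd);
        unlink(HUGE_FILE);
        return 0;
}

Build with "gcc -O2 -o hugeprobe hugeprobe.c". Multiplying the per-page time by the page count (56 here) should roughly match the ~10 sec startup difference reported above.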