From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
To: Imre Pinter, Marco Varlese, users@dpdk.org
Cc: Gabor Halász, Péter Suskovics
Date: Tue, 6 Jun 2017 22:31:55 +0800
Message-ID: <0f24fe8c-9294-9656-7338-1c09e5c83340@intel.com>
References: <1496311928.3871.7.camel@suse.com>
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages

On 6/6/2017 8:39 PM, Imre Pinter wrote:
> Hi guys,
>
> Thanks for the replies. See my comments inline.
>
> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: 2 June 2017 3:40
> To: Marco Varlese; Imre Pinter; users@dpdk.org
> Cc: Gabor Halász; Péter Suskovics
> Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages
>
>> -----Original Message-----
>> From: Marco Varlese [mailto:marco.varlese@suse.com]
>> Sent: Thursday, June 1, 2017 6:12 PM
>> To: Tan, Jianfeng; Imre Pinter; users@dpdk.org
>> Cc: Gabor Halász; Péter Suskovics
>> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
>>
>> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
>>>> -----Original Message-----
>>>> From: users [mailto:users-bounces@dpdk.org] On Behalf Of Imre Pinter
>>>> Sent: Thursday, June 1, 2017 3:55 PM
>>>> To: users@dpdk.org
>>>> Cc: Gabor Halász; Péter Suskovics
>>>> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
>>>>
>>>> Hi,
>>>>
>>>> We experience slow startup times in DPDK-OVS when backing memory with
>>>> 1G hugepages instead of 2M hugepages.
>>>> Currently we map 2M hugepages as the memory backend for DPDK OVS.
>>>> In the future we would like to allocate this memory from the 1G hugepage
>>>> pool. Currently in our deployments we have a significant amount of
>>>> 1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
>>>> hugepages.
>>>>
>>>> Typical setup for 2M hugepages:
>>>> GRUB:
>>>> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
>>>>
>>>> $ grep hugetlbfs /proc/mounts
>>>> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
>>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>>>>
>>>> Typical setup for 1G hugepages:
>>>> GRUB:
>>>> hugepagesz=1G hugepages=56 default_hugepagesz=1G
>>>>
>>>> $ grep hugetlbfs /proc/mounts
>>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>>>>
>>>> DPDK OVS startup times based on the ovs-vswitchd.log logs:
>>>>
>>>> * 2M (2G memory allocated) - startup time ~3 sec:
>>>> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
>>>> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
>>>>
>>>> * 1G (56G memory allocated) - startup time ~13 sec:
>>>> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
>>>> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
>>>>
>>>> I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04
>>>> with kernels 3.13.0-117-generic and 4.4.0-78-generic.
>>>
>>> You can shorten the time by doing this:
>>>
>>> (1) Mount 1G hugepages into two directories.
>>> nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
>>
>> I understood (reading Imre) that this does not really work because of
>> non-deterministic allocation of hugepages in a NUMA architecture,
>> e.g. we would end up (potentially) using hugepages allocated on
>> different nodes even when accessing the OVS directory.
>> Did I understand this correctly?
>
> Did you try step 2? And Sergio also gave more options in another email in this thread for your reference.
>
> Thanks,
> Jianfeng
>
> @Jianfeng: Step (1) will not help in our case, because 'mount' will not
> allocate hugepages from NUMA1 while the system still has free hugepages
> on NUMA0.
> I have 56G of hugepages allocated with 1G page size. This means 28G of
> hugepages available on each NUMA node. If the mounts are performed via
> fstab, then we'll end up in one of the following scenarios at random.
>
> First mount for OVS, then for VMs:
> +-------------------------------+-------------------------------+
> |             NUMA0             |             NUMA1             |
> +-------------------------------+-------------------------------+
> |   OVS (2G)    |   VMs (26G)   |           VMs (28G)           |
> +-------------------------------+-------------------------------+
>
> First mount for VMs, then OVS:
> +-------------------------------+-------------------------------+
> |             NUMA0             |             NUMA1             |
> +-------------------------------+-------------------------------+
> |           VMs (28G)           |   VMs (26G)   |   OVS (2G)    |
> +-------------------------------+-------------------------------+

This is why I suggested step 2, to allocate memory in an interleaved way. Did you try that?

Thanks,
Jianfeng

> @Marco: After the hugepages were allocated, the ones in the OVS directory
> were either from NUMA0 or NUMA1, but not from both (a different setup can
> come up after a reboot). This caused an error on DPDK startup, since 1G of
> hugepages was requested from each NUMA node, but no hugepages had been
> allocated on one of the nodes.
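A quick way to see how the 1G pool actually got split between the nodes after mounting is to read the per-node counters in sysfs (a minimal check, assuming two NUMA nodes and the standard sysfs hugepage layout for 1G pages):

$ # nr_hugepages (total) and free_hugepages per node for the 1G pool
$ grep . /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/*_hugepages

Comparing the free counts before and after mounting and starting OVS shows which node the pages in each hugetlbfs mount were taken from.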
>>> (2) Force use of the memory interleave policy:
>>> $ numactl --interleave=all ovs-vswitchd ...
>>>
>>> Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".
>
> @Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all
> ovs-vswitchd ...' cannot help, because all the hugepages mounted to the OVS
> directory will be from one of the NUMA nodes. The DPDK application requires
> 1G of hugepages from each of the NUMA nodes, so DPDK returns an error.
> I have also tried without Step (1), and we still see the slower startup.
> Currently I'm looking into Sergio's mail.
>
> Br,
> Imre
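For completeness, the two suggestions discussed in this thread, combined into one sketch. The size=2G cap on the OVS mount is an assumption chosen to match --socket-mem 1024,1024, and the exact way the EAL options reach ovs-vswitchd depends on the OVS version (they may be set via ovsdb other_config rather than the command line shown in the EAL ARGS log above); adjust paths and sizes to the actual deployment.

# /etc/fstab (step 1): cap the OVS mount so the VM mount cannot drain its pages
nodev /mnt/huge_ovs_1G  hugetlbfs rw,relatime,pagesize=1G,size=2G 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G         0 0

# Step 2: start OVS under an interleaved memory policy, keeping the EAL options
$ numactl --interleave=all ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024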