From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Richardson, Bruce"
To: "Michael Hu (NSBU)", "dev@dpdk.org"
Date: Thu, 11 Sep 2014 09:53:02 +0000
Message-ID: <59AF69C657FD0841A61C55336867B5B0343F0A38@IRSMSX103.ger.corp.intel.com>
Subject: Re: [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Hu (NSBU)
> Sent: Wednesday, September 10, 2014 11:41 PM
> To:
>       dev@dpdk.org
> Subject: [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel
>
> Hi All,
>
> We have a kernel config question to consult you about.
> DPDK failed to start, due to an mbuf creation issue, with the new kernel 3.14.17 +
> grsecurity patches.
> We tried to trace down the issue. It seems that the virtual addresses of the huge
> pages are allocated from high address to low address by the kernel, whereas DPDK
> expects them to run from low to high in order to treat them as consecutive. See the
> dumped virtual addresses below: the first is 0x710421400000, then 0x710421200000,
> where previously it would be 0x710421200000 first, then 0x710421400000. Either
> way, they are still consecutive.
> ----
> Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 00:0c:29:b3:30:db
>   Create: Default RX 0:0 - Memory used (MBUFs 4096 x (size 1984 + Hdr 64)) + 790720 = 8965 KB
> Zone 0: name:, phys:0x6ac00000, len:0x2080, virt:0x710421400000, socket_id:0, flags:0
> Zone 1: name:, phys:0x6ac02080, len:0x1d10c0, virt:0x710421402080, socket_id:0, flags:0
> Zone 2: name:, phys:0x6ae00000, len:0x160000, virt:0x710421200000, socket_id:0, flags:0
> Zone 3: name:, phys:0x6add3140, len:0x11a00, virt:0x7104215d3140, socket_id:0, flags:0
> Zone 4: name:, phys:0x6ade4b40, len:0x300, virt:0x7104215e4b40, socket_id:0, flags:0
> Zone 5: name:, phys:0x6ade4e80, len:0x200, virt:0x7104215e4e80, socket_id:0, flags:0
> Zone 6: name:, phys:0x6ade5080, len:0x10080, virt:0x7104215e5080, socket_id:0, flags:0
> Segment 0: phys:0x6ac00000, len:2097152, virt:0x710421400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 1: phys:0x6ae00000, len:2097152, virt:0x710421200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 2: phys:0x6b000000, len:2097152, virt:0x710421000000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 3: phys:0x6b200000, len:2097152, virt:0x710420e00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 4:
>   phys:0x6b400000, len:2097152, virt:0x710420c00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 5: phys:0x6b600000, len:2097152, virt:0x710420a00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 6: phys:0x6b800000, len:2097152, virt:0x710420800000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 7: phys:0x6ba00000, len:2097152, virt:0x710420600000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 8: phys:0x6bc00000, len:2097152, virt:0x710420400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 9: phys:0x6be00000, len:2097152, virt:0x710420200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> ---
>
> The related DPDK code is in
> dpdk/lib/librte_eal/linuxapp/eal/eal_memory.c :: rte_eal_hugepage_init():
>
>     for (i = 0; i < nr_hugefiles; i++) {
>         new_memseg = 0;
>
>         /* if this is a new section, create a new memseg */
>         if (i == 0)
>             new_memseg = 1;
>         else if (hugepage[i].socket_id != hugepage[i-1].socket_id)
>             new_memseg = 1;
>         else if (hugepage[i].size != hugepage[i-1].size)
>             new_memseg = 1;
>         else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
>                 hugepage[i].size)
>             new_memseg = 1;
>         else if (((unsigned long)hugepage[i].final_va -
>                 (unsigned long)hugepage[i-1].final_va) != hugepage[i].size) {
>             new_memseg = 1;
>         }
>
> Is this a known issue? Is there any workaround? Or could you advise which
> kernel config option may relate to this change in kernel behavior?
>
> Thanks,
> Michael

This should not be a problem for Intel DPDK startup, as the EAL should take care of mmaps being done in this order at startup.

By way of background, where I have seen this occur before is on 32-bit systems, while 64-bit systems tend to mmap in ascending order in every case I've looked at. [Q: is this 32-bit you are running, or 64-bit?]

In either case, we modified the EAL memory mapping code some time back to try to take account of this. If you look at the map_all_hugepages() function in eal_memory.c, you will see that, when we go to do the second mapping of hugepages to line the pages up, we do so by explicitly specifying our preferred address. We get this address by allocating a large block of memory from /dev/zero, taking the address, and then freeing it again. Then we map the pages one at a time into that freed address block, so that even when the kernel wants to map things from high to low addresses, our address hints still cause things to map from low to high.

If this does not work for you, I'd be curious to find out why. Do any of the security patches you have applied prevent mmap address hinting from working, for instance?

Regards,
/Bruce