From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Richardson, Bruce"
To: "Michael Hu (NSBU)", "dev@dpdk.org"
Date: Thu, 11 Sep 2014 09:53:02 +0000
Message-ID: <59AF69C657FD0841A61C55336867B5B0343F0A38@IRSMSX103.ger.corp.intel.com>
Subject: Re: [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Hu (NSBU)
> Sent: Wednesday, September 10, 2014 11:41 PM
> To:
>       dev@dpdk.org
> Subject: [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel
>
> Hi All,
>
> We have a kernel config question to consult you about.
> DPDK failed to start, due to an mbuf creation issue, with the new kernel 3.14.17 +
> grsecurity patches.
> We tried to trace down the issue. It seems that the virtual addresses of the huge
> pages are allocated from high address to low address by the kernel, whereas DPDK
> expects them to run from low to high in order to treat them as consecutive. See the
> dumped virtual addresses below: the first is 0x710421400000, then 0x710421200000,
> where previously it would be 0x710421200000 first, then 0x710421400000. Either
> way, they are still consecutive.
> ----
> Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 00:0c:29:b3:30:db
>   Create: Default RX 0:0 - Memory used (MBUFs 4096 x (size 1984 + Hdr 64)) + 790720 = 8965 KB
> Zone 0: name:, phys:0x6ac00000, len:0x2080, virt:0x710421400000, socket_id:0, flags:0
> Zone 1: name:, phys:0x6ac02080, len:0x1d10c0, virt:0x710421402080, socket_id:0, flags:0
> Zone 2: name:, phys:0x6ae00000, len:0x160000, virt:0x710421200000, socket_id:0, flags:0
> Zone 3: name:, phys:0x6add3140, len:0x11a00, virt:0x7104215d3140, socket_id:0, flags:0
> Zone 4: name:, phys:0x6ade4b40, len:0x300, virt:0x7104215e4b40, socket_id:0, flags:0
> Zone 5: name:, phys:0x6ade4e80, len:0x200, virt:0x7104215e4e80, socket_id:0, flags:0
> Zone 6: name:, phys:0x6ade5080, len:0x10080, virt:0x7104215e5080, socket_id:0, flags:0
> Segment 0: phys:0x6ac00000, len:2097152, virt:0x710421400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 1: phys:0x6ae00000, len:2097152, virt:0x710421200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 2: phys:0x6b000000, len:2097152, virt:0x710421000000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 3: phys:0x6b200000, len:2097152, virt:0x710420e00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 4:
>   phys:0x6b400000, len:2097152, virt:0x710420c00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 5: phys:0x6b600000, len:2097152, virt:0x710420a00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 6: phys:0x6b800000, len:2097152, virt:0x710420800000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 7: phys:0x6ba00000, len:2097152, virt:0x710420600000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 8: phys:0x6bc00000, len:2097152, virt:0x710420400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 9: phys:0x6be00000, len:2097152, virt:0x710420200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> ---
>
> The related DPDK code is in
> dpdk/lib/librte_eal/linuxapp/eal/eal_memory.c :: rte_eal_hugepage_init():
>
>     for (i = 0; i < nr_hugefiles; i++) {
>         new_memseg = 0;
>
>         /* if this is a new section, create a new memseg */
>         if (i == 0)
>             new_memseg = 1;
>         else if (hugepage[i].socket_id != hugepage[i-1].socket_id)
>             new_memseg = 1;
>         else if (hugepage[i].size != hugepage[i-1].size)
>             new_memseg = 1;
>         else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
>                 hugepage[i].size)
>             new_memseg = 1;
>         else if (((unsigned long)hugepage[i].final_va -
>                 (unsigned long)hugepage[i-1].final_va) != hugepage[i].size) {
>             new_memseg = 1;
>         }
>
> Is this a known issue? Is there any workaround? Or could you advise which
> kernel config option may relate to this change in kernel behavior?
>
> Thanks,
> Michael

This should not be a problem for Intel DPDK startup, as the EAL should take care of mmaps being done in this order at startup.

By way of background, where I have seen this occur before is on 32-bit systems, while 64-bit systems tend to mmap in ascending order in every case I've looked at. [Q: is this 32-bit you are running, or 64-bit?]

In either case, we modified the EAL memory mapping code some time back to try to take account of this. If you look at the map_all_hugepages() function in eal_memory.c, you will see that, when we go to do the second mapping of hugepages to line the pages up, we do so by explicitly specifying our preferred address. We get this address by allocating a large block of memory from /dev/zero, taking the address, and then freeing it again. Then we map the pages one at a time into that freed address block, so that even when the kernel wants to map things from high to low addresses, our address hints still cause things to map from low to high.

If this does not work for you, I'd be curious to find out why. Do any of the security patches you have applied prevent mmap address hinting from working, for instance?

Regards,
/Bruce