From mboxrd@z Thu Jan 1 00:00:00 1970
To: "Burakov, Anatoly"
Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
From: "Jonas Pfefferle1"
Date: Fri, 27 Oct 2017 21:22:41 +0200
References: <921d836f-87dc-b017-2186-e70905f61612@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Subject: Re: [dpdk-dev] Huge mapping secondary process linux
List-Id: DPDK patches and discussions

"Burakov, Anatoly" wrote on 27/10/2017 18:00:27:

> From: "Burakov, Anatoly"
> To: Jonas Pfefferle1
> Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
> Date: 27/10/2017 18:00
> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>
> On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
> > "dev" wrote on 10/27/2017 04:58:01 PM:
> >
> > > From: "Jonas Pfefferle1"
> > > To: "Burakov, Anatoly"
> > > Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
> > > Date: 10/27/2017 04:58 PM
> > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > Sent by: "dev"
> > >
> > >
> > > "Burakov, Anatoly" wrote on 10/27/2017 04:44:52 PM:
> > >
> > > > From: "Burakov, Anatoly"
> > > > To: Jonas Pfefferle1
> > > > Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
> > > > Date: 10/27/2017 04:45 PM
> > > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > >
> > > > On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
> > > > > "Burakov, Anatoly" wrote on 10/27/2017 04:06:44 PM:
> > > > >
> > > > > > From: "Burakov, Anatoly"
> > > > > > To: Jonas Pfefferle1, dev@dpdk.org
> > > > > > Cc: chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com
> > > > > > Date: 10/27/2017 04:06 PM
> > > > > > Subject: Re: [dpdk-dev] Huge mapping secondary process linux
> > > > > >
> > > > > > On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
> > > > > > >
> > > > > > > Hi @all,
> > > > > > >
> > > > > > > I'm trying to make sense of the hugepage memory mappings in
> > > > > > > librte_eal/linuxapp/eal/eal_memory.c:
> > > > > > > * In rte_eal_hugepage_attach (line 1347) when we try to do a private
> > > > > > > mapping on /dev/zero (line 1393) why do we not use MAP_FIXED if we need the
> > > > > > > addresses to be identical with the primary process?
> > > > > > > * On POWER we have this weird business going on where we use MAP_HUGETLB
> > > > > > > because according to this commit:
> > > > > > >
> > > > > > > commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
> > > > > > > Author: Chao Zhu
> > > > > > > Date:   Thu Apr 6 15:36:09 2017 +0530
> > > > > > >
> > > > > > >     eal/ppc: fix mmap for memory initialization
> > > > > > >
> > > > > > >     On IBM POWER platform, when mapping /dev/zero file to hugepage memory
> > > > > > >     space, mmap will not respect the requested address hint. This will cause
> > > > > > >     the memory initialization for the second process fails. This patch adds
> > > > > > >     the required mmap flags to make it work. Beside this, users need to set
> > > > > > >     the nr_overcommit_hugepages to expand the VA range.
> > > > > > >     When doing the initialization, users need to set both nr_hugepages and
> > > > > > >     nr_overcommit_hugepages to the same value, like 64, 128, etc.
> > > > > > >
> > > > > > > mmap address hints are not respected. Looking at the mmap code in the
> > > > > > > kernel this is not true entirely however under some circumstances the hint
> > > > > > > can be ignored (
> > > > > > > https://urldefense.proofpoint.com/v2/url?u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_powerpc_mm_mmap.c-23L103&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-pXjigIjRW0&m=cttQcHlAYixhsYS3lz-BAdEeg4dpbwGdPnj2R3I8Do0&s=Gp0TIjUtIed05Jgb7XnlocpCYZdFXZXiH0LqIWiNMhA&e=
> > > > > > > ). However I believe we can remove the extra case for PPC if we use
> > > > > > > MAP_FIXED when doing the secondary process mappings because we need them to
> > > > > > > be identical anyway. We could also use MAP_FIXED when doing the primary
> > > > > > > process mappings resp. get_virtual_area if we want to have any guarantees
> > > > > > > when specifying a base address. Any thoughts?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jonas
> > > > > > >
> > > > > > hi Jonas,
> > > > > >
> > > > > > MAP_FIXED is not used because it's dangerous, it unmaps anything that is
> > > > > > already mapped into that space. We would rather know that we can't map
> > > > > > something than unwittingly unmap something that was mapped before.
> > > > >
> > > > > Ok, I see.
> > > > > Maybe we can add a check to the primary process's memory
> > > > > mappings whether the hint has been respected or not? At least warn if it
> > > > > hasn't.
> > > >
> > > > Hi Jonas,
> > > >
> > > > I'm unfamiliar with POWER platform, so i'm afraid you'd have to explain
> > > > a bit more what you mean by "hint has been respected" :)
> > >
> > > Hi Anatoly,
> > >
> > > What I meant was the mmap address hint:
> > >
> > > "If addr is not NULL, then the kernel takes it as a hint
> > > about where to place the mapping; on Linux, the mapping will be
> > > created at a nearby page boundary."
> > >
> > > This is actually not true on POWER. It can happen that the address hint is
> > > ignored and you get any address back that fits your mapping.
> > >
> > > Thanks,
> > > Jonas
> >
> > Actually looking through the kernel code this is also not guaranteed on x86.
> > (https://urldefense.proofpoint.com/v2/url?u=http-3A__elixir.free-2Delectrons.com_linux_latest_source_arch_x86_kernel_sys-5Fx86-5F64.c-23L165&d=DwID-g&c=jf_iaSHvJObTbx-siA1ZOg&r=rOdXhRsgn8Iur7bDE0vgwvo6TC8OpoDN-pXjigIjRW0&m=iqakzG7nSXLfvDHyS9IV5E9DWPnNcv19zcsl3MKMdvI&s=VqzZpcTaCUMmNieZ3WyUw-jsnNP-hAcW487Mumv6xPw&e=)
> >
> > So in any case the address hint can be ignored by the kernel and you get
> > any address that fits your mapping.
> > My suggestion is to check when we do the initial mapping in
> > get_virtual_area if the hint was respected or not, i.e. if the returned
> > address == PAGE_ALIGN(address_hint).
> >
>
> I'm not sure i see the issue here. So, just to make sure i understand
> things correctly:
>
> Whenever we don't request a specific base address through base_address
> EAL parameter, none of this matters - we always ask for memory in
> arbitrary memory locations, correct?
>
> It's also not an issue with secondary processes because we do check
> returned mmap address to see whether it's the same as we requested, correct?
>
> It's only whenever we *do* specify a base_address, we provide an address
> hint to mmap to, but we don't check if the address we got from mmap is
> one in the vicinity of our requested base address, correct? We don't
> check, and the kernel can ignore address hint, so we're not guaranteed
> to respect the base_address flag.
>
> I'm not sure this is a serious issue, because as far as i'm concerned,
> this flag is advisory - we only promise to *attempt* to map things at
> that particular address, not that it will succeed. If the kernel simply
> cannot find an address to satisfy our address hint, or ignores it for
> other reasons - well, tough, nothing we can do about that. I'm not sure
> putting a check like this, where we can't even predict an "expected"
> address is a good idea.
>
> Am i getting this right?

The problem is that when we specify a base address, we want it to be used.
If it is not respected, we end up in the same situation as if we had never
specified it. This very likely means we cannot run a secondary process,
because it will not be able to map the addresses from the primary process,
and that is why we introduced the base address parameter in the first place.

>
> --
> Thanks,
> Anatoly
>