From: "Burakov, Anatoly"
To: "Richardson, Bruce", "Gooch, Stephen (Wind River)", "dev@dpdk.org"
Date: Mon, 16 Jun 2014 08:00:22 +0000
Subject: Re: [dpdk-dev] mmap() hint address

Hi Bruce, Stephen,

> > Hello,
> >
> > I have seen a case where a secondary DPDK process tries to map a uio
> > resource, where mmap() is normally passed the corresponding virtual
> > address as a hint address. However, in some instances mmap() returns a
> > virtual address that is not the hint address, which results in
> > rte_panic() and the secondary process going defunct.
> >
> > This happens from time to time on an embedded device when nr_hugepages
> > is set to 128, but never when nr_hugepages is set to 256 on the same
> > device. My question is: if mmap() can find the correct memory regions
> > when nr_hugepages is set to 256, would it not require fewer resources
> > (and therefore be more likely to pass) at a lower value such as 128?
> >
> > Any ideas what would cause this mmap() behavior at a lower nr_hugepages
> > value?
> >
> > - Stephen
>
> Hi Stephen,
>
> That's a strange one!
> I don't know for definite why this is happening, but here is one possible
> theory. :-)
>
> It could be due to the size of the memory blocks that are getting mmapped.
> When you use 256 pages, the blocks of memory getting mapped may well be
> larger (depending on how fragmented in memory the 2MB pages are), and so
> may be getting mapped at a higher set of address ranges where there is
> more free memory. This set of address ranges is then free in the secondary
> process, and it is similarly able to map the memory.
> With 128 hugepages, you may be looking for smaller amounts of memory, and
> so the addresses get mapped in at a different spot in the virtual address
> space, one that may be more heavily used.
> Then when the secondary process tries to duplicate the mappings, it
> already has memory in that region in use and the mapping fails.
> In short - one theory is that having bigger blocks to map causes the
> memory to be mapped to a different location, one which is free from
> conflicts in the secondary process.
>
> So, how to confirm or refute this, and generally debug this issue?
> Well, in general we would need to look at the messages printed out at
> startup in the primary process to see how big the blocks it tries to map
> are in each case, and where they end up in the virtual address space.

As I remember, the OVDK project has had vaguely similar issues (only they
were trying to map hugepages into address space that QEMU had already
occupied). This resulted in us adding a --base-virtaddr EAL command-line
flag that specifies the starting virtual address at which the primary
process begins mapping pages. I guess you can try that as well (just
remember that it needs to be done in the primary process, because the
secondary one just copies the mappings and either succeeds or fails to do
so).

Best regards,
Anatoly Burakov
DPDK SW Engineer
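
For reference, the check that blows up in the secondary process boils down
to a pattern like the one below. This is only a minimal sketch, not DPDK's
actual code; remap_at_hint() and the origin of requested_va, len and fd
(they would come from the primary's shared config and the uio/hugepage
file) are illustrative assumptions:

    /* Minimal sketch (not DPDK's actual code) of the failure mode being
     * discussed: the secondary process asks mmap() for the same virtual
     * address the primary recorded, and bails out if the kernel places
     * the mapping anywhere else. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    /* requested_va, len and fd are placeholders for values the secondary
     * process would obtain from the primary's shared configuration. */
    static void *remap_at_hint(void *requested_va, size_t len, int fd)
    {
        void *va = mmap(requested_va, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
        if (va == MAP_FAILED) {
            perror("mmap");
            exit(EXIT_FAILURE);
        }
        if (va != requested_va) {
            /* The hint was not honoured: something else already occupies
             * that range in this process, so shared objects would end up
             * at different addresses in primary and secondary. DPDK treats
             * this as fatal (rte_panic()). */
            fprintf(stderr, "mapped at %p, wanted %p\n", va, requested_va);
            munmap(va, len);
            exit(EXIT_FAILURE);
        }
        return va;
    }

Without MAP_FIXED, a hint address is only a request; if the secondary
process already has anything mapped in that range, the kernel silently
picks another address and the addresses no longer match between the two
processes, which is exactly the failure Stephen describes.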
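
If a particular range keeps colliding, --base-virtaddr shifts the whole
mapping window when launching the primary process. A hypothetical
invocation (the application name, coremask, memory-channel count and
address are placeholders, not values taken from Stephen's setup):

    ./my_primary_app -c 0x3 -n 4 --base-virtaddr=0x100000000

The secondary process then simply copies whatever mappings the primary
recorded, so the flag only makes a difference when passed to the primary.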