From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <1535719857-19092-1-git-send-email-alejandro.lucero@netronome.com>
 <1535719857-19092-3-git-send-email-alejandro.lucero@netronome.com>
 <6bddf8bd-ecc0-5170-7265-e49488909f4e@intel.com>
 <CAD+H991=1mW0Xsd-+4FgajbZJjs1a2sqJMeOTTH5xFXTkzDzrg@mail.gmail.com>
 <48acfd73-0a14-54c2-dfea-7e78235f6cf2@intel.com>
 <CAD+H993iy76xz4PQBqhgrm24FDa2RTZdWjL5OHVLmZVgZ65Cjw@mail.gmail.com>
 <f915d5ca-e18d-9f40-f6c7-5f7f05e5ddc9@intel.com>
In-Reply-To: <f915d5ca-e18d-9f40-f6c7-5f7f05e5ddc9@intel.com>
From: Alejandro Lucero <alejandro.lucero@netronome.com>
Date: Thu, 4 Oct 2018 18:58:09 +0100
Message-ID: <CAD+H993HHLSWGCunS0DrsCEb6R70wn7bHKDtbkxTO7_03c2YyQ@mail.gmail.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: dev <dev@dpdk.org>, dpdk stable <stable@dpdk.org>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-dev] [PATCH v2 2/5] mem: use address hint for mapping
	hugepages
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>

On Thu, Oct 4, 2018 at 4:43 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 04-Oct-18 2:15 PM, Alejandro Lucero wrote:
> >
> >
> > On Thu, Oct 4, 2018 at 1:08 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com> wrote:
> >
> >     On 04-Oct-18 12:43 PM, Alejandro Lucero wrote:
> >      >
> >      >
> >      > On Wed, Oct 3, 2018 at 1:50 PM Burakov, Anatoly
> >      > <anatoly.burakov@intel.com> wrote:
> >      >
> >      >     On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> >      >      > Linux kernel uses a really high address as starting address for
> >      >      > serving mmaps calls. If there exist addressing limitations and
> >      >      > IOVA mode is VA, this starting address is likely too high for
> >      >      > those devices. However, it is possible to use a lower address in
> >      >      > the process virtual address space as with 64 bits there is a lot
> >      >      > of available space.
> >      >      >
> >      >      > This patch adds an address hint as starting address for 64 bits
> >      >      > systems.
> >      >      >
> >      >      > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> >      >      > ---
> >      >
> >      >     <snip>
> >      >
> >      >      >
> >      >      >               mapped_addr = mmap(requested_addr, (size_t)map_sz, PROT_READ,
> >      >      >                               mmap_flags, -1, 0);
> >      >      > +
> >      >      >               if (mapped_addr == MAP_FAILED && allow_shrink)
> >      >
> >      >     Unintended whitespace change?
> >      >
> >      >
> >      > Yes. I'll fix it.
> >      >
> >      >      >                       *size -= page_sz;
> >      >      > -     } while (allow_shrink && mapped_addr == MAP_FAILED && *size > 0);
> >      >      > +
> >      >      > +             if (mapped_addr != MAP_FAILED && addr_is_hint &&
> >      >      > +                 mapped_addr != requested_addr) {
> >      >      > +                     /* hint was not used. Try with another offset */
> >      >      > +                     munmap(mapped_addr, map_sz);
> >      >      > +                     mapped_addr = MAP_FAILED;
> >      >      > +                     next_baseaddr = RTE_PTR_ADD(next_baseaddr, 0x100000000);
> >      >
> >      >     Why not increment by page size? Sure, it could take some more time to
> >      >     allocate, but will result in less wasted memory.
> >      >
> >      >
> >      > I thought the same, or even using smaller increments than hugepage size.
> >      > Incrementing the address by such an amount does not mean we are wasting
> >      > memory but just leaving space if some mmap fails. I think it is better
> >      > to leave as much space as possible just in case the data allocated in
> >      > the conflicting area would need to grow in the future.
> >
> >     Not sure i follow. Could you give an example of a scenario where leaving
> >     huge chunks of memory free would be preferable to just adding page size
> >     and starting from page-size-aligned address next time we allocate?
> >
> >
> > Usually there is nothing at the 4GB address in 64-bit processes; the text
> > section is usually the first process region mapped, and it currently sits
> > far higher than 4GB. If something is mapped there before the EAL
> > hugepage/memory initialization code runs, I am not sure what it would be
> > for, but maybe it needs to grow using contiguous virtual addresses. As I
> > say, I have no idea what this could be used for, but the smaller the gap
> > left when trying again in this code, the less likely that flexibility
> > will be there.
>
> But you're already leaving holes there, what difference does it make? I
> mean, it's not important, i'm just not sure why the arbitrary
> 0x100000000 increment instead of page size. Most of the calls into this
> function are from init code, and with init code we're usually calling
> this function quite a few times in succession (especially during memseg
> list allocations), so you are skipping space that could've been used for
> that.
>
>
Note that the increment is the page size if there is no problem; the 4GB
increment is only used if mapping at that specific address fails.
I'm not against changing this to always use the hugepage size instead, and it
seems my previous comment did not convince you. So I'll change that, because
I cannot sustain my case without any real data. :-)
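
To make the behaviour under discussion concrete, here is a minimal sketch of
the hint-and-retry logic, assuming 64-bit Linux. It is not the patch code:
reserve_with_hint(), HINT_BASE_ADDR and HINT_MAX_TRIES are hypothetical names
used only for illustration, and the retry step is a parameter so that either
the page size or a larger skip can be passed in. The constant is a macro
rather than a raw value.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names, not the ones used in the patch. */
#define HINT_BASE_ADDR 0x100000000ULL /* 4GB: first address hint */
#define HINT_MAX_TRIES 16             /* give up after this many hints */

/*
 * Reserve map_sz bytes of virtual address space, asking the kernel to place
 * the mapping at *next_hint. Without MAP_FIXED the address is only a hint,
 * so the returned address must be compared with the requested one.
 *
 * On success the mapping is returned and *next_hint is advanced past it, so
 * back-to-back calls end up contiguous. If the hint is not honoured, the
 * mapping is dropped and the hint is moved forward by 'step' before retrying.
 */
static void *
reserve_with_hint(uintptr_t *next_hint, size_t map_sz, size_t step)
{
	int i;

	assert(next_hint != NULL && map_sz > 0 && step > 0);

	for (i = 0; i < HINT_MAX_TRIES; i++) {
		void *requested = (void *)*next_hint;
		void *mapped = mmap(requested, map_sz, PROT_READ,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (mapped == MAP_FAILED)
			return NULL;

		if (mapped == requested) {
			/* hint was used: next call starts right after it */
			*next_hint += map_sz;
			return mapped;
		}

		/* hint was not used: undo and try with another offset */
		munmap(mapped, map_sz);
		*next_hint += step;
	}

	return NULL;
}
```

Calling it with step equal to the page size gives the behaviour Anatoly is
asking for; passing 0x100000000 as step reproduces the 4GB skip from v2.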



> (btw if you are to use this constant, it should be a macro, not a raw
> constant)
>
> --
> Thanks,
> Anatoly
>