From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <david.marchand@6wind.com>
Received: from mail-wm0-f42.google.com (mail-wm0-f42.google.com [74.125.82.42])
 by dpdk.org (Postfix) with ESMTP id 205532BC3
 for <dev@dpdk.org>; Tue, 17 May 2016 18:39:23 +0200 (CEST)
Received: by mail-wm0-f42.google.com with SMTP id g17so40878681wme.1
 for <dev@dpdk.org>; Tue, 17 May 2016 09:39:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=ZTh5MVVM+Y/Oew8Qbf9AbxDsUwZbrLCnnrzgB3gljvk=;
 b=xGhqi0fheYUNLTGN4Dg3H42ObH42PO1bdBvw0fYEgvrOI3KwdvFJZ81q+fqQ/S5YxG
 lKOdFgDOdsGJKrKvo6ZKx6GLdgG3pM+j+ykdmt/Xt2L2nU384SKdQSTMOO6wAHyUy/gM
 Nnd1F/lSw3j9noNEXjMa4INJiRfBTPqoKjQ9K+7JmqoduCsNC3ZR6ayeEkPBu6bMRPJg
 GGNmU3Ds3No//4clsxGgK6Vcma1WH7kD0ngWYuykpS6MV1W7rE5WoDHko7dlhKrbDFkC
 U6Yfa8FcUhKYpcvT7xtr1m8jzJYRpLbZamJ7doCBBmMRNhwnG47rnzY2IxvyRtmJwLph
 2sQg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=ZTh5MVVM+Y/Oew8Qbf9AbxDsUwZbrLCnnrzgB3gljvk=;
 b=F5uU42ZK0U4jZRA2Mc5xtFo8EXcgsUD8v1zjvyzEwWSVyT6Rro7MMGFCdr8yI77G3h
 Xa0DpNnIqSnc1q3OEn65GirIxNNa1kM4g2ZZsxmMKbRGsEP4Fys9zsdN0LjLKaNCgN2G
 K1+HFtlMJe2WWDzuDrb3qKRZXgXa4en16kE6eKsLyODiPVqFo6SysQG0LmUQiwBGehX7
 Z0hdvjN84E0aIrWg+Qh2PR5eCTha7s4SuQJL50AYVF73EgGEO3bhh/Jw6dhxe8bDQo/j
 U8r5S3QUlWDNXR8zkUdzhapTi1UgkCCiLfsPrkyWP0WHSreliHPNZqWuReruI6d/8tFp
 lh7Q==
X-Gm-Message-State: AOPr4FVmCHxs+O0JCJ9a1z8JNIiQwn2DjzV06V669IjcEeFZd9qOUd3UmMX3B7jC+0FMlKbyUxJ1duRiHqVtOhab
X-Received: by 10.194.123.67 with SMTP id ly3mr2464185wjb.135.1463503162910;
 Tue, 17 May 2016 09:39:22 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.28.16.2 with HTTP; Tue, 17 May 2016 09:39:03 -0700 (PDT)
In-Reply-To: <1463013881-27985-1-git-send-email-jianfeng.tan@intel.com>
References: <1457089092-4128-1-git-send-email-jianfeng.tan@intel.com>
 <1463013881-27985-1-git-send-email-jianfeng.tan@intel.com>
From: David Marchand <david.marchand@6wind.com>
Date: Tue, 17 May 2016 18:39:03 +0200
Message-ID: <CALwxeUv5du9KahwyR4OqD1AhyzcLy5j013n0QURGoXSAMQxEKQ@mail.gmail.com>
To: Jianfeng Tan <jianfeng.tan@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
 Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>, 
 Neil Horman <nhorman@tuxdriver.com>
Content-Type: text/plain; charset=UTF-8
Subject: Re: [dpdk-dev] [PATCH v4] eal: make hugetlb initialization more
	robust
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 17 May 2016 16:39:23 -0000

Hello Jianfeng,

On Thu, May 12, 2016 at 2:44 AM, Jianfeng Tan <jianfeng.tan@intel.com> wrote:
> This patch adds an option, --huge-trybest, to use a recover mechanism to
> the case that there are not so many hugepages (declared in sysfs), which
> can be used. It relies on a mem access to fault-in hugepages, and if fails
> with SIGBUS, recover to previously saved stack environment with
> siglongjmp().
>
> Besides, this solution fixes an issue when hugetlbfs is specified with an
> option of size. Currently DPDK does not respect the quota of a hugetlbfs
> mount. It fails to init the EAL because it tries to map the number of free
> hugepages in the system rather than using the number specified in the quota
> for that mount.
>
> It's still an open issue with CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. Under
> this case (such as IVSHMEM target), having hugetlbfs mounts with quota will
> fail to remap hugepages as it relies on having mapped all free hugepages
> in the system.

For such a case, maybe a warning log message when it fails would help
the user, plus a known issue in the release notes?


> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..8c77010 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -417,12 +434,33 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>                         hugepg_tbl[i].final_va = virtaddr;
>                 }
>
> +               if (orig && internal_config.huge_trybest) {
> +                       /* In linux, hugetlb limitations, like cgroup, are
> +                        * enforced at fault time instead of mmap(), even
> +                        * with the option of MAP_POPULATE. Kernel will send
> +                        * a SIGBUS signal. To avoid to be killed, save stack
> +                        * environment here, if SIGBUS happens, we can jump
> +                        * back here.
> +                        */
> +                       if (wrap_sigsetjmp()) {
> +                               RTE_LOG(DEBUG, EAL, "SIGBUS: Cannot mmap more "
> +                                       "hugepages of size %u MB\n",
> +                                       (unsigned)(hugepage_sz / 0x100000));
> +                               munmap(virtaddr, hugepage_sz);
> +                               close(fd);
> +                               unlink(hugepg_tbl[i].filepath);
> +                               return i;
> +                       }
> +                       *(int *)virtaddr = 0;
> +               }
> +
> +
>                 /* set shared flock on the file. */
>                 if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
> -                       RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
> +                       RTE_LOG(DEBUG, EAL, "%s(): Locking file failed:%s \n",
>                                 __func__, strerror(errno));
>                         close(fd);
> -                       return -1;
> +                       return i;
>                 }
>
>                 close(fd);

Maybe I missed something, but we write into the hugepage before
flock() has been called.
Are we sure nobody else is using this hugepage?

In particular, can't this cause trouble for a running primary process
if we start the exact same primary process a second time?
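
To make the ordering question concrete, here is a minimal sketch of
one way to check for other users before touching the page. It is an
assumption for illustration only, not something the patch does and
not a worked-out fix: lock_if_unused() is a hypothetical helper, and
whether an exclusive probe is acceptable for EAL is a separate
question.

```c
#include <errno.h>
#include <sys/file.h>

/*
 * Hypothetical helper, not part of the patch: probe whether any other
 * open file description holds a flock() lock on the file behind fd.
 * Returns 1 if nobody else holds a lock (fd then holds a shared lock),
 * 0 if the file is locked elsewhere, -1 on any other error.
 */
static int lock_if_unused(int fd)
{
	/* An exclusive, non-blocking request fails with EWOULDBLOCK
	 * while any other holder exists, shared or exclusive. */
	if (flock(fd, LOCK_EX | LOCK_NB) == -1)
		return errno == EWOULDBLOCK ? 0 : -1;

	/* Convert to a shared lock. The conversion is not atomic, so
	 * it can still fail if another process locks in between. */
	if (flock(fd, LOCK_SH | LOCK_NB) == -1)
		return -1;
	return 1;
}
```

With something like this, the write to virtaddr would only happen
when the helper returns 1, so a page already locked by a running
process is never written.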


-- 
David Marchand