In-Reply-To: <1463013881-27985-1-git-send-email-jianfeng.tan@intel.com>
References: <1457089092-4128-1-git-send-email-jianfeng.tan@intel.com>
 <1463013881-27985-1-git-send-email-jianfeng.tan@intel.com>
From: David Marchand
Date: Tue, 17 May 2016 18:39:03 +0200
To: Jianfeng Tan
Cc: "dev@dpdk.org", Sergio Gonzalez Monroy, Neil Horman
Content-Type: text/plain; charset=UTF-8
Subject: Re: [dpdk-dev] [PATCH v4] eal: make hugetlb initialization more robust

Hello Jianfeng,

On Thu, May 12, 2016 at 2:44 AM, Jianfeng Tan wrote:
> This patch adds an option, --huge-trybest, to use a recover mechanism to
> the case that there are not so many hugepages (declared in sysfs), which
> can be used. It relys on a mem access to fault-in hugepages, and if fails
> with SIGBUS, recover to previously saved stack environment with
> siglongjmp().
>
> Besides, this solution fixes an issue when hugetlbfs is specified with an
> option of size. Currently DPDK does not respect the quota of a hugetblfs
> mount. It fails to init the EAL because it tries to map the number of free
> hugepages in the system rather than using the number specified in the quota
> for that mount.
>
> It's still an open issue with CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. Under
> this case (such as IVSHMEM target), having hugetlbfs mounts with quota will
> fail to remap hugepages as it relies on having mapped all free hugepages
> in the system.

For such a case, maybe a warning log message when it fails would help the user.
Plus a known issue entry in the release notes?
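To make the recovery described in the commit message concrete, here is a minimal standalone sketch. It is not the code from the patch: touch_page(), sigbus_handler() and demo_sigbus_recovery() are hypothetical names, and a regular file mapped past its end stands in for a hugepage that cannot be faulted in (an assumption made so the example runs without hugetlbfs).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Saved stack environment, restored from the SIGBUS handler. */
static sigjmp_buf jmpenv;

/* Hypothetical handler: jump back to the sigsetjmp() call site. */
static void sigbus_handler(int signo)
{
	(void)signo;
	siglongjmp(jmpenv, 1);
}

/*
 * Hypothetical helper: write to the page at addr.
 * Returns 0 if the write succeeded, -1 if it raised SIGBUS.
 */
static int touch_page(void *addr)
{
	struct sigaction sa, old;
	int ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		return -1;

	/* savemask = 1 so that SIGBUS is unblocked again after the jump */
	if (sigsetjmp(jmpenv, 1) == 0)
		*(volatile int *)addr = 0;	/* may fault here */
	else
		ret = -1;			/* came back via siglongjmp() */

	sigaction(SIGBUS, &old, NULL);
	return ret;
}

/*
 * Hypothetical demo: map two pages of a one-page file, then touch both.
 * The second page lies entirely beyond EOF, so touching it faults.
 * Returns 0 on success, -1 if the setup itself failed.
 */
static int demo_sigbus_recovery(int *first, int *second)
{
	char path[] = "/tmp/sigbus_demo_XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	char *va;
	int fd;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}
	va = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		  MAP_SHARED, fd, 0);
	close(fd);
	if (va == MAP_FAILED)
		return -1;

	*first = touch_page(va);
	*second = touch_page(va + page);
	munmap(va, 2 * (size_t)page);
	return 0;
}
```

On Linux, I would expect demo_sigbus_recovery() to report 0 for the first page and -1 for the second. The handler is installed only around the access, so a SIGBUS raised anywhere else keeps its previous disposition.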

> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..8c77010 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -417,12 +434,33 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>  			hugepg_tbl[i].final_va = virtaddr;
>  		}
>
> +		if (orig && internal_config.huge_trybest) {
> +			/* In linux, hugetlb limitations, like cgroup, are
> +			 * enforced at fault time instead of mmap(), even
> +			 * with the option of MAP_POPULATE. Kernel will send
> +			 * a SIGBUS signal. To avoid to be killed, save stack
> +			 * environment here, if SIGBUS happens, we can jump
> +			 * back here.
> +			 */
> +			if (wrap_sigsetjmp()) {
> +				RTE_LOG(DEBUG, EAL, "SIGBUS: Cannot mmap more "
> +					"hugepages of size %u MB\n",
> +					(unsigned)(hugepage_sz / 0x100000));
> +				munmap(virtaddr, hugepage_sz);
> +				close(fd);
> +				unlink(hugepg_tbl[i].filepath);
> +				return i;
> +			}
> +			*(int *)virtaddr = 0;
> +		}
> +
> +
>  		/* set shared flock on the file. */
>  		if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
> -			RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
> +			RTE_LOG(DEBUG, EAL, "%s(): Locking file failed:%s \n",
>  				__func__, strerror(errno));
>  			close(fd);
> -			return -1;
> +			return i;
>  		}
>
>  		close(fd);

Maybe I missed something, but we are writing into a hugepage before flock() has been called.
Are we sure nobody else is using this hugepage?

In particular, can't this cause trouble for a running primary process if we start the exact same primary process?

-- 
David Marchand