From: David Marchand <david.marchand@6wind.com>
To: Jianfeng Tan <jianfeng.tan@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>,
	 Neil Horman <nhorman@tuxdriver.com>
Subject: Re: [dpdk-dev] [PATCH v4] eal: make hugetlb initialization more robust
Date: Tue, 17 May 2016 18:39:03 +0200
Message-ID: <CALwxeUv5du9KahwyR4OqD1AhyzcLy5j013n0QURGoXSAMQxEKQ@mail.gmail.com>
In-Reply-To: <1463013881-27985-1-git-send-email-jianfeng.tan@intel.com>
Hello Jianfeng,
On Thu, May 12, 2016 at 2:44 AM, Jianfeng Tan <jianfeng.tan@intel.com> wrote:
> This patch adds an option, --huge-trybest, to use a recover mechanism to
> the case that there are not so many hugepages (declared in sysfs), which
> can be used. It relys on a mem access to fault-in hugepages, and if fails
> with SIGBUS, recover to previously saved stack environment with
> siglongjmp().
>
> Besides, this solution fixes an issue when hugetlbfs is specified with an
> option of size. Currently DPDK does not respect the quota of a hugetblfs
> mount. It fails to init the EAL because it tries to map the number of free
> hugepages in the system rather than using the number specified in the quota
> for that mount.
>
> It's still an open issue with CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS. Under
> this case (such as IVSHMEM target), having hugetlbfs mounts with quota will
> fail to remap hugepages as it relies on having mapped all free hugepages
> in the system.
For such a case, maybe a warning log message when it fails would help
the user.
Plus a known issue entry in the release notes?
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..8c77010 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -417,12 +434,33 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>                         hugepg_tbl[i].final_va = virtaddr;
>                 }
>
> +               if (orig && internal_config.huge_trybest) {
> +                       /* In linux, hugetlb limitations, like cgroup, are
> +                        * enforced at fault time instead of mmap(), even
> +                        * with the option of MAP_POPULATE. Kernel will send
> +                        * a SIGBUS signal. To avoid to be killed, save stack
> +                        * environment here, if SIGBUS happens, we can jump
> +                        * back here.
> +                        */
> +                       if (wrap_sigsetjmp()) {
> +                               RTE_LOG(DEBUG, EAL, "SIGBUS: Cannot mmap more "
> +                                       "hugepages of size %u MB\n",
> +                                       (unsigned)(hugepage_sz / 0x100000));
> +                               munmap(virtaddr, hugepage_sz);
> +                               close(fd);
> +                               unlink(hugepg_tbl[i].filepath);
> +                               return i;
> +                       }
> +                       *(int *)virtaddr = 0;
> +               }
> +
> +
>                 /* set shared flock on the file. */
>                 if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
> -                       RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
> +                       RTE_LOG(DEBUG, EAL, "%s(): Locking file failed:%s \n",
>                                 __func__, strerror(errno));
>                         close(fd);
> -                       return -1;
> +                       return i;
>                 }
>
>                 close(fd);
Maybe I missed something, but we are writing into the hugepage before
flock() has been called.
Are we sure nobody else is using this hugepage?
In particular, couldn't this cause trouble for a running primary process
if we start the exact same primary process again?
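
To make the ordering concern concrete, here is a minimal standalone sketch
(not the patch, and not DPDK code) that takes a lock first and only then
touches the mapping under a sigsetjmp()/siglongjmp() guard. Everything in it
is an assumption for illustration: the helper names touch_guarded(),
lock_or_busy() and demo_lock_then_touch() are hypothetical, an ordinary
temporary file stands in for a hugepage file, and SIGBUS is provoked by
mapping past the end of the file rather than by a hugetlbfs quota. It probes
with LOCK_EX, because shared flock() locks do not exclude each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env; /* context restored when SIGBUS arrives */

static void sigbus_handler(int signo)
{
	(void)signo;
	siglongjmp(fault_env, 1);
}

/* Write to the first word at addr: 0 on success, -1 if SIGBUS was raised. */
int touch_guarded(void *addr)
{
	struct sigaction sa, old;
	int rc = 0;

	assert(addr != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigbus_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return -1;
	if (sigsetjmp(fault_env, 1) == 0)
		*(volatile int *)addr = 0; /* may fault */
	else
		rc = -1; /* reached through siglongjmp() */
	sigaction(SIGBUS, &old, NULL); /* restore previous handler */
	return rc;
}

/* 0 if we now hold LOCK_EX, 1 if another descriptor holds a lock, -2 on error. */
int lock_or_busy(int fd)
{
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		return 0;
	return errno == EWOULDBLOCK ? 1 : -2;
}

/*
 * Hypothetical demo: map one page of a temp file of file_len bytes,
 * optionally let a second descriptor hold a shared lock, then lock
 * first and write second.
 * Returns 1 = skipped (in use), 0 = written, -1 = SIGBUS, -2 = setup error.
 */
int demo_lock_then_touch(int other_user, off_t file_len)
{
	char path[] = "/tmp/lock_touch_XXXXXX";
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int other = mkstemp(path); /* stands in for the other process */
	int fd, rc = -2;
	void *va;

	if (other < 0)
		return -2;
	fd = open(path, O_RDWR); /* separate open file description */
	unlink(path);
	if (fd < 0) {
		close(other);
		return -2;
	}
	if (ftruncate(fd, file_len) != 0 ||
	    (other_user && flock(other, LOCK_SH | LOCK_NB) != 0))
		goto out;
	va = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED)
		goto out;
	rc = lock_or_busy(fd);
	if (rc == 0)
		rc = touch_guarded(va); /* only write once the lock is ours */
	munmap(va, page);
out:
	close(fd);
	close(other);
	return rc;
}
```

Calling demo_lock_then_touch(1, 4096) skips the write because the file is
locked elsewhere, while demo_lock_then_touch(0, 0) takes the lock and then
recovers from the SIGBUS. Whether an exclusive probe like this fits the EAL
locking scheme is a separate question.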
-- 
David Marchand