Date: Mon, 7 Mar 2016 21:13:22 +0800
From: Yuanhan Liu
To: Jianfeng Tan
Cc: nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com, dev@dpdk.org,
    p.fedin@samsung.com, ann.zhuangyanying@huawei.com
Subject: Re: [dpdk-dev] [PATCH v2 1/5] mem: add --single-file to create single mem-backed file
Message-ID: <20160307131322.GH14300@yliu-dev.sh.intel.com>
In-Reply-To: <1454671228-33284-2-git-send-email-jianfeng.tan@intel.com>

CC'ed the EAL hugepage maintainer, which is something you should do
when sending a patch.

On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
> Originally, there are two cons in using hugepages: a. it needs root
> privilege to touch /proc/self/pagemap, which is a premise to
> allocate physically contiguous memsegs; b. possibly too many
> hugepage files are created, especially when used with 2M hugepages.
>
> Virtual devices don't care about the physical contiguity of the
> allocated hugepages at all. The --single-file option provides a way
> to allocate all hugepages into a single mem-backed file.
>
> Known issues:
> a. The single-file option relies on the kernel to allocate
>    numa-affinitive memory.
> b. Possible ABI break: originally, --no-huge uses anonymous memory
>    instead of a file-backed way to create memory.
>
> Signed-off-by: Huawei Xie
> Signed-off-by: Jianfeng Tan
...
> @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config *internal_cfg)
>               "be specified together with --"OPT_NO_HUGE"\n");
>          return -1;
>      }
> +    if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
> +        RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
> +            "be specified together with --"OPT_SOCKET_MEM"\n");
> +        return -1;
> +    }
> +    if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
> +        RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
> +            "be specified together with --"OPT_SINGLE_FILE"\n");
> +        return -1;
> +    }

These two limitations don't make sense to me.
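On the --socket-mem part in particular: that option merely fills
internal_cfg->socket_mem[] with a per-socket byte count, and nothing in
the single-file idea conflicts with honouring it. Roughly (my
illustration, not code from your patch):

    /* Sketch: honour --socket-mem under --single-file by sizing
     * one backing file per socket, instead of rejecting the
     * combination outright. */
    unsigned socket;

    for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++) {
        uint64_t size = internal_cfg->socket_mem[socket];

        if (size == 0)
            continue;
        /* create, ftruncate() and mmap() one file of 'size'
         * bytes for this socket, just as your patch already
         * does for the single global file */
    }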
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 6008533..68ef49a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1102,20 +1102,54 @@ rte_eal_hugepage_init(void)
>      /* get pointer to global configuration */
>      mcfg = rte_eal_get_configuration()->mem_config;
>
> -    /* hugetlbfs can be disabled */
> -    if (internal_config.no_hugetlbfs) {
> -        addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> -            MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +    /* when hugetlbfs is disabled or single-file option is specified */
> +    if (internal_config.no_hugetlbfs || internal_config.single_file) {
> +        int fd;
> +        uint64_t pagesize;
> +        unsigned socket_id = rte_socket_id();
> +        char filepath[MAX_HUGEPAGE_PATH];
> +
> +        if (internal_config.no_hugetlbfs) {
> +            eal_get_hugefile_path(filepath, sizeof(filepath),
> +                "/dev/shm", 0);
> +            pagesize = RTE_PGSIZE_4K;
> +        } else {
> +            struct hugepage_info *hpi;
> +
> +            hpi = &internal_config.hugepage_info[0];
> +            eal_get_hugefile_path(filepath, sizeof(filepath),
> +                hpi->hugedir, 0);
> +            pagesize = hpi->hugepage_sz;
> +        }
> +        fd = open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> +        if (fd < 0) {
> +            RTE_LOG(ERR, EAL, "%s: open %s failed: %s\n",
> +                __func__, filepath, strerror(errno));
> +            return -1;
> +        }
> +
> +        if (ftruncate(fd, internal_config.memory) < 0) {
> +            RTE_LOG(ERR, EAL, "ftuncate %s failed: %s\n",
> +                filepath, strerror(errno));
> +            return -1;
> +        }
> +
> +        addr = mmap(NULL, internal_config.memory,
> +            PROT_READ | PROT_WRITE,
> +            MAP_SHARED | MAP_POPULATE, fd, 0);
>          if (addr == MAP_FAILED) {
> -            RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> -                strerror(errno));
> +            RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n",
> +                __func__, strerror(errno));
>              return -1;
>          }
>          mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
>          mcfg->memseg[0].addr = addr;
> -        mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> +        mcfg->memseg[0].hugepage_sz = pagesize;
>          mcfg->memseg[0].len = internal_config.memory;
> -        mcfg->memseg[0].socket_id = 0;
> +        mcfg->memseg[0].socket_id = socket_id;

I see quite a few issues:

- Assume I have a system with two hugepage sizes: 1G (x4) and 2M
  (x512), mounted at /dev/hugepages and /mnt, respectively.

  We then get a 5G internal_config.memory, and your code will try to
  mmap 5G on the first mount point (/dev/hugepages), due to the
  hardcoded logic in your code:

      hpi = &internal_config.hugepage_info[0];
      eal_get_hugefile_path(filepath, sizeof(filepath),
          hpi->hugedir, 0);

  But that mount point only has 4G in total; therefore, it will fail.

- As you stated, socket_id is hardcoded, which could be wrong.

- As stated above, the option limitation doesn't seem right to me. I
  mean, --single-file should be able to work with the --socket-mem
  option semantically.

I have been thinking about how to deal with those issues properly, and
a __very immature__ solution came to my mind (it may simply not work),
but anyway, here it is, FYI: we run the --single-file option through
the same process used to handle normal hugepage initialization, but
take different actions, or no action at all, at some stages when that
option is given, which is a bit similar to the way
RTE_EAL_SINGLE_FILE_SEGMENTS is handled. And we create one hugepage
file for each node and each page size.
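A very rough sketch of that per-node, per-page-size loop, reusing the
names from your patch (num_pages[] per socket is only known after the
usual mapping pass, and all error handling is dropped, so treat this
as shape rather than working code):

    /* Sketch: one backing file per (page size, socket) pair,
     * instead of one global file on hugepage_info[0]. */
    unsigned i, socket;
    int fd, file_idx = 0;
    char filepath[MAX_HUGEPAGE_PATH];
    void *addr;

    for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
        struct hugepage_info *hpi = &internal_config.hugepage_info[i];

        for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++) {
            uint64_t size = (uint64_t)hpi->num_pages[socket] *
                    hpi->hugepage_sz;

            if (size == 0)
                continue;

            eal_get_hugefile_path(filepath, sizeof(filepath),
                    hpi->hugedir, file_idx);
            fd = open(filepath, O_CREAT | O_RDWR,
                    S_IRUSR | S_IWUSR);
            ftruncate(fd, size);
            /* rely on the kernel for NUMA affinity, as the
             * commit log already notes */
            addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_POPULATE, fd, 0);

            mcfg->memseg[file_idx].addr = addr;
            mcfg->memseg[file_idx].len = size;
            mcfg->memseg[file_idx].hugepage_sz = hpi->hugepage_sz;
            mcfg->memseg[file_idx].socket_id = socket;
            file_idx++;
        }
    }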
For a system like mine above (2 nodes), it may populate files like the
following:

- 1G x 2 on node0
- 1G x 2 on node1
- 2M x 256 on node0
- 2M x 256 on node1

That could normally fit your case. Though 4 nodes looks like the
maximum node count, the --socket-mem option may relieve the limit a
bit.

And if we "could" not care about socket_id being set correctly, we
could simply allocate one file for each hugepage size. That would work
well for your container enabling.

BTW, since we already have the SINGLE_FILE_SEGMENTS (config) option,
adding another option --single-file looks really confusing to me.
Maybe you could build on the SINGLE_FILE_SEGMENTS option and add
another option, say --no-sort (I confess this name sucks, but you get
my point). With that, we could make sure to create as few hugepage
files as possible, to fit your case.

    --yliu