From: "Tan, Jianfeng"
To: Yuanhan Liu
Cc: nakajima.yoshihiro@lab.ntt.co.jp, mst@redhat.com, dev@dpdk.org,
 p.fedin@samsung.com, ann.zhuangyanying@huawei.com
Subject: Re: [dpdk-dev] [PATCH v2 1/5] mem: add --single-file to create single mem-backed file
Date: Wed, 9 Mar 2016 22:44:01 +0800
Message-ID: <56E036B1.6020609@intel.com>
In-Reply-To: <20160308024437.GJ14300@yliu-dev.sh.intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1454671228-33284-1-git-send-email-jianfeng.tan@intel.com>
 <1454671228-33284-2-git-send-email-jianfeng.tan@intel.com>
 <20160307131322.GH14300@yliu-dev.sh.intel.com>
 <56DE30FE.7020809@intel.com>
 <20160308024437.GJ14300@yliu-dev.sh.intel.com>

Hi,

On 3/8/2016 10:44 AM, Yuanhan Liu wrote:
> On Tue, Mar 08, 2016 at 09:55:10AM +0800, Tan, Jianfeng wrote:
>> Hi Yuanhan,
>>
>> On 3/7/2016 9:13 PM, Yuanhan Liu wrote:
>>> CC'ed the EAL hugepage maintainer, which is something you should do
>>> when sending a patch.
>> Thanks for doing this.
>>
>>> On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
>>>> Originally, there are two drawbacks in using hugepages: a. it needs
>>>> root privilege to touch /proc/self/pagemap, which is a prerequisite
>>>> for allocating physically contiguous memsegs; b. possibly too many
>>>> hugepage files are created, especially when 2M hugepages are used.
>>>>
>>>> Virtual devices don't care about the physical contiguity of the
>>>> allocated hugepages at all. The --single-file option provides a way
>>>> to allocate all hugepages in a single mem-backed file.
>>>>
>>>> Known issues:
>>>> a. The single-file option relies on the kernel to allocate
>>>>    NUMA-affinitive memory.
>>>> b. Possible ABI break: originally, --no-huge uses anonymous memory
>>>>    instead of a file-backed mapping.
>>>>
>>>> Signed-off-by: Huawei Xie
>>>> Signed-off-by: Jianfeng Tan
>>> ...
>>>> @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config *internal_cfg)
>>>>  			"be specified together with --"OPT_NO_HUGE"\n");
>>>>  		return -1;
>>>>  	}
>>>> +	if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
>>>> +		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
>>>> +			"be specified together with --"OPT_SOCKET_MEM"\n");
>>>> +		return -1;
>>>> +	}
>>>> +	if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
>>>> +		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
>>>> +			"be specified together with --"OPT_SINGLE_FILE"\n");
>>>> +		return -1;
>>>> +	}
>>> These two limitations don't make sense to me.
>> For the force_sockets option, my original thought on the --single-file
>> option was that we don't sort those pages (which requires
>> root/cap_sys_admin) and don't even look up NUMA information, because
>> the single file may contain memory from both sockets.
>>
>> For the hugepage_unlink option, those hugepage files get closed at the
>> end of memory initialization; if we also unlink them, we cannot share
>> them with other processes (say, a backend).
> Yeah, I know how the two limitations come from your implementation. I
> was just wondering if they both are __truly__ limitations. I mean, can
> we get rid of them somehow?
>
> For the --socket-mem option, if we can't handle it well, or if we could
> ignore the socket_id for allocated hugepages, yes, the limitation is
> a true one.

To make it work with the --socket-mem option, we need to call
mbind()/set_mempolicy(), which makes "LDFLAGS += -lnuma" a mandatory
line in the mk file. I don't know whether it's acceptable to bring in a
dependency on libnuma.so? (See the first sketch at the end of this mail
for what such a binding could look like.)

> But for the second option, no, we should be able to make it work.
> One extra action is that you should not invoke "close(fd)" for those
> huge page files. And then you can get all the information, as I stated
> in a reply to your 2nd patch.

As discussed yesterday, there is an open-files limit for each process;
if we keep all those FDs open, existing programs may start failing once
they hit it. Do others consider this a problem? (The second sketch at
the end gives a rough feel for how quickly 2M pages exhaust the limit.)

...

>>> BTW, since we already have the SINGLE_FILE_SEGMENTS (config) option,
>>> adding another option --single-file looks really confusing to me.
>>>
>>> To me, maybe you could build on the SINGLE_FILE_SEGMENTS option and
>>> add another option, say --no-sort (I confess this name sucks, but you
>>> get my point). With that, we could make sure to create as few huge
>>> page files as possible, to fit your case.
>> This is great advice. So what do you think of --converged, or
>> --no-scattered-mem, or any better idea?
> TBH, none of them looks great to me, either. But I have no better
> options. Well, --no-phys-continuity looks like the best option to
> me so far :)

I'd like to make it a little more concise; how about --no-phys-contig?

In addition, Yuanhan thinks the name still has no literal meaning of
"just create one file per hugetlbfs mount (or per socket)". From my
side, though, there is an indirect meaning: if there is no need to
guarantee physical contiguity, then there is no need to create the
hugepage files one by one. Can anyone give their opinion here? Thanks.

Thanks,
Jianfeng
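
----

First sketch, for reference only: a minimal, hypothetical illustration of
the mbind() idea discussed above, i.e. binding ranges of a single
hugetlbfs-backed file to specific NUMA nodes. This is not the patch's
code; the mount point, file name, sizes and node numbers are made up, and
it assumes <numaif.h> plus linking with -lnuma (the dependency in
question):

#include <fcntl.h>
#include <numaif.h>          /* mbind(), MPOL_BIND; needs -lnuma */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SZ (2UL * 1024 * 1024)   /* 2M hugepages assumed */

/* mmap one range of the shared file and bind it to a NUMA node. */
static void *
map_on_node(int fd, off_t offset, size_t len, int node)
{
	void *va = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, offset);
	if (va == MAP_FAILED)
		return NULL;

	/* Pages faulted in later must come from the requested node. */
	unsigned long nodemask = 1UL << node;
	if (mbind(va, len, MPOL_BIND, &nodemask,
		  sizeof(nodemask) * 8, 0) != 0) {
		munmap(va, len);
		return NULL;
	}
	return va;
}

int
main(void)
{
	/* Single mem-backed file on a hugetlbfs mount (illustrative path). */
	int fd = open("/dev/hugepages/single_file_seg",
		      O_CREAT | O_RDWR, 0600);
	if (fd < 0 || ftruncate(fd, 2 * HUGEPAGE_SZ) != 0) {
		perror("open/ftruncate");
		return 1;
	}

	/* First 2M page from node 0, second one from node 1. */
	void *seg0 = map_on_node(fd, 0, HUGEPAGE_SZ, 0);
	void *seg1 = map_on_node(fd, HUGEPAGE_SZ, HUGEPAGE_SZ, 1);
	printf("seg0=%p seg1=%p\n", seg0, seg1);

	close(fd);
	return 0;
}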
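
Second sketch: a tiny, illustrative check of the per-process open-files
limit mentioned above. The 4G figure is only an example; the point is
that keeping one fd per 2M hugepage open quickly approaches a typical
RLIMIT_NOFILE soft limit of 1024:

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
	struct rlimit rl;
	if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
		perror("getrlimit");
		return 1;
	}

	unsigned long mem_mb = 4096;           /* e.g. 4G of DPDK memory */
	unsigned long fds_needed = mem_mb / 2; /* one fd per 2M hugepage */

	printf("RLIMIT_NOFILE: soft=%lu hard=%lu, hugepage fds needed=%lu\n",
	       (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max,
	       fds_needed);
	printf("%s the soft limit\n",
	       fds_needed > rl.rlim_cur ? "exceeds" : "fits under");
	return 0;
}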