To: Tetsuya Mukawa, dev@dpdk.org, yuanhan.liu@linux.intel.com
References: <1453108389-21006-2-git-send-email-mukawa@igel.co.jp>
 <1453374478-30996-5-git-send-email-mukawa@igel.co.jp>
From: "Tan, Jianfeng"
Message-ID: <56A18958.7070202@intel.com>
Date: Fri, 22 Jan 2016 09:43:52 +0800
In-Reply-To: <1453374478-30996-5-git-send-email-mukawa@igel.co.jp>
Subject: Re: [dpdk-dev] [RFC PATCH 4/5] EAL: Add new EAL "--shm" option.

Hi Tetsuya,

On 1/21/2016 7:07 PM, Tetsuya Mukawa wrote:
> This is a temporary patch to get EAL memory under 16TB (1 << 44).
>
> The patch adds a new EAL "--shm" option. If the option is specified,
> EAL will allocate one file from hugetlbfs. This memory is for sharing
> memory between the DPDK application and the QEMU ivshmem device.
>
> Signed-off-by: Tetsuya Mukawa
> ---
>  lib/librte_eal/common/eal_common_options.c |  5 ++
>  lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>  lib/librte_eal/common/eal_options.h        |  2 +
>  lib/librte_eal/common/include/rte_memory.h |  5 ++
>  lib/librte_eal/linuxapp/eal/eal_memory.c   | 76 ++++++++++++++++++++++++++++++
>  5 files changed, 89 insertions(+)
> ...
>  }
>
> +int
> +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr)
> +{
> +	struct rte_mem_config *mcfg;
> +	mcfg = rte_eal_get_configuration()->mem_config;
> +
> +	if (pfd != NULL)
> +		*pfd = mcfg->memseg[index].fd;
> +	if (psize != NULL)
> +		*psize = (uint64_t)mcfg->memseg[index].len;
> +	if (paddr != NULL)
> +		*paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> +	return 0;
> +}

In my patch, I introduce another API to get memseg info. To my mind,
there is no reason to keep those FDs open. What do you think?

> +
>  /*
>   * Get physical address of any mapped virtual address in the current process.
>   */
> @@ -1075,6 +1090,46 @@ calc_num_pages_per_socket(uint64_t * memory,
>  	return total_num_pages;
>  }
>
> +static void *
> +rte_eal_shm_create(int *pfd, const char *hugedir)
> +{
> +	int ret, fd;
> +	char filepath[256];
> +	void *vaddr;
> +	uint64_t size = internal_config.memory;
> +
> +	sprintf(filepath, "%s/%s_cvio", hugedir,
> +		internal_config.hugefile_prefix);
> +
> +	fd = open(filepath, O_CREAT | O_RDWR, 0600);
> +	if (fd < 0)
> +		rte_panic("open %s failed: %s\n", filepath, strerror(errno));
> +
> +	ret = flock(fd, LOCK_EX);
> +	if (ret < 0) {
> +		close(fd);
> +		rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> +	}
> +
> +	ret = ftruncate(fd, size);
> +	if (ret < 0)
> +		rte_panic("ftruncate failed: %s\n", strerror(errno));
> +
> +	/*
> +	 * Here, we need to map under (1 << 44).
> +	 * This is temporary implementation.
> +	 */
> +	vaddr = mmap((void *)(1ULL << 43), size, PROT_READ | PROT_WRITE,
> +		     MAP_SHARED | MAP_FIXED, fd, 0);
> +	if (vaddr != MAP_FAILED) {
> +		memset(vaddr, 0, size);
> +		*pfd = fd;
> +	}

I'm not sure a hard-coded address is good enough. It's known that the
kernel manages VMAs using a red-black tree, but I don't know whether the
kernel allocates VMAs from low addresses to high addresses (if it does,
can we leverage that?).
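One possibility (just an untested sketch of mine, not code from either
patchset; the helper name is only illustrative) is to pass a hint address
without MAP_FIXED and then verify where the kernel actually placed the
mapping, e.g.:

#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: try hint addresses and keep only a mapping that
 * ends below the 1 << 44 boundary; return MAP_FAILED otherwise. */
static void *
map_below_44bit(int fd, uint64_t size)
{
	uint64_t hint;
	void *vaddr;

	for (hint = 1ULL << 43; hint >= 1ULL << 30; hint >>= 1) {
		vaddr = mmap((void *)hint, size, PROT_READ | PROT_WRITE,
			     MAP_SHARED, fd, 0);
		if (vaddr == MAP_FAILED)
			continue;
		if ((uint64_t)vaddr + size <= (1ULL << 44))
			return vaddr;
		/* Kernel ignored the hint; discard and retry lower. */
		munmap(vaddr, size);
	}
	return MAP_FAILED;
}

That avoids clobbering whatever might already be mapped at a fixed
address, at the cost of not being guaranteed to find a low mapping.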
> +	memset(vaddr, 0, size);
> +
> +	return vaddr;
> +}
> +
>  /*
>   * Prepare physical memory mapping: fill configuration structure with
>   * these infos, return 0 on success.
> @@ -1127,6 +1182,27 @@ rte_eal_hugepage_init(void)
>  		return 0;
>  	}
>
> +	/* create shared memory consist of only one file */
> +	if (internal_config.shm) {
> +		int fd;
> +		struct hugepage_info *hpi;
> +
> +		hpi = &internal_config.hugepage_info[0];
> +		addr = rte_eal_shm_create(&fd, hpi->hugedir);
> +		if (addr == MAP_FAILED) {
> +			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> +				strerror(errno));
> +			return -1;
> +		}
> +		mcfg->memseg[0].phys_addr = rte_mem_virt2phy(addr);
> +		mcfg->memseg[0].addr = addr;
> +		mcfg->memseg[0].hugepage_sz = hpi->hugepage_sz;
> +		mcfg->memseg[0].len = internal_config.memory;
> +		mcfg->memseg[0].socket_id = 0;

As pointed out in my patchset, hard-coding socket_id to 0 may lead to
failures. Do you have a better idea?

Thanks,
Jianfeng
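P.S. On the socket_id question: one direction (again just a sketch from
me, not what my patchset does) would be to ask the kernel which NUMA node
the mapping actually landed on once the pages have been faulted in (the
memset above guarantees that), e.g. with get_mempolicy(2):

#include <numaif.h>	/* link with -lnuma */

/* Hypothetical helper: return the NUMA node backing 'addr', or -1. */
static int
addr_to_socket(void *addr)
{
	int node = -1;

	if (get_mempolicy(&node, NULL, 0, addr,
			  MPOL_F_NODE | MPOL_F_ADDR) < 0)
		return -1;
	return node;
}

Then memseg[0].socket_id could be filled from addr_to_socket(addr)
instead of a constant 0, assuming the whole region sits on one node.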