From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ananyev, Konstantin"
To: "Tan, Jianfeng", "dev@dpdk.org"
Cc: "nakajima.yoshihiro@lab.ntt.co.jp", "zhbzg@huawei.com", "mst@redhat.com",
 "gaoxiaoqiu@huawei.com", "oscar.zhangbo@huawei.com",
 "ann.zhuangyanying@huawei.com", "zhoujingbin@huawei.com",
 "guohongzhen@huawei.com"
Date: Fri, 6 Nov 2015 16:21:55 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836ABD716@IRSMSX153.ger.corp.intel.com>
In-Reply-To: <1446748276-132087-5-git-send-email-jianfeng.tan@intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1446748276-132087-5-git-send-email-jianfeng.tan@intel.com>
Subject: Re: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory initialization process
List-Id: patches and discussions about DPDK
Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Thursday, November 05, 2015 6:31 PM
> To: dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; zhbzg@huawei.com; mst@redhat.com;
> gaoxiaoqiu@huawei.com; oscar.zhangbo@huawei.com;
> ann.zhuangyanying@huawei.com; zhoujingbin@huawei.com; guohongzhen@huawei.com
> Subject: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory initialization process
>
> When using virtio for a container, we should specify --no-huge so
> that during memory initialization, shm_open() is used to allocate memory
> from the tmpfs filesystem /dev/shm/.
>
> Signed-off-by: Huawei Xie
> Signed-off-by: Jianfeng Tan
> ---
>  lib/librte_eal/common/include/rte_memory.h |  5 +++
>  lib/librte_eal/linuxapp/eal/eal_memory.c   | 58 ++++++++++++++++++++++++++++--
>  lib/librte_mempool/rte_mempool.c           | 16 ++++-----
>  3 files changed, 69 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index 1bed415..9c1effc 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -100,6 +100,7 @@ struct rte_memseg {
>  	int32_t socket_id;      /**< NUMA socket ID. */
>  	uint32_t nchannel;      /**< Number of channels. */
>  	uint32_t nrank;         /**< Number of ranks. */
> +	int fd;                 /**< fd used to share this memory */
>  #ifdef RTE_LIBRTE_XEN_DOM0
>  	/**< store segment MFNs */
>  	uint64_t mfn[DOM0_NUM_MEMBLOCK];
> @@ -128,6 +129,10 @@ int rte_mem_lock_page(const void *virt);
>   */
>  phys_addr_t rte_mem_virt2phy(const void *virt);
>
> +
> +int
> +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr);
> +
>  /**
>   * Get the layout of the available physical memory.
>   *
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index ac2745e..9abbfc6 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -80,6 +80,9 @@
>  #include
>  #include
>  #include
> +#include
> +#include
> +#include
>
>  #include
>  #include
> @@ -143,6 +146,18 @@ rte_mem_lock_page(const void *virt)
>  	return mlock((void *)aligned, page_size);
>  }
>
> +int
> +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr)
> +{
> +	struct rte_mem_config *mcfg;
> +	mcfg = rte_eal_get_configuration()->mem_config;
> +
> +	*pfd = mcfg->memseg[index].fd;
> +	*psize = (uint64_t)mcfg->memseg[index].len;
> +	*paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> +	return 0;
> +}

I wonder who will use that function? I can't see any references to it in this patch or the next one.

> +
>  /*
>   * Get physical address of any mapped virtual address in the current process.
>   */
> @@ -1044,6 +1059,42 @@ calc_num_pages_per_socket(uint64_t * memory,
>  	return total_num_pages;
>  }
>
> +static void *
> +rte_eal_shm_create(int *pfd)
> +{
> +	int ret, fd;
> +	char filepath[256];
> +	void *vaddr;
> +	uint64_t size = internal_config.memory;
> +
> +	sprintf(filepath, "/%s_cvio", internal_config.hugefile_prefix);
> +
> +	fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> +	if (fd < 0) {
> +		rte_panic("shm_open %s failed: %s\n", filepath, strerror(errno));
> +	}
> +	ret = flock(fd, LOCK_EX);
> +	if (ret < 0) {
> +		close(fd);
> +		rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> +	}
> +
> +	ret = ftruncate(fd, size);
> +	if (ret < 0) {
> +		rte_panic("ftruncate failed: %s\n", strerror(errno));
> +	}
> +	/* flag: MAP_HUGETLB */

Is there any explanation of what that comment means here? Do you plan to use MAP_HUGETLB in the call below, or ...?
> +	vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	if (vaddr == MAP_FAILED) {
> +		rte_panic("mmap failed: %s\n", strerror(errno));
> +	}
> +	memset(vaddr, 0, size);
> +	*pfd = fd;
> +
> +	return vaddr;
> +}
> +
> +
>  /*
>   * Prepare physical memory mapping: fill configuration structure with
>   * these infos, return 0 on success.
> @@ -1072,7 +1123,9 @@ rte_eal_hugepage_init(void)
>  	int new_pages_count[MAX_HUGEPAGE_SIZES];
>  #endif
>
> +#ifndef RTE_VIRTIO_VDEV
>  	test_proc_pagemap_readable();
> +#endif
>
>  	memset(used_hp, 0, sizeof(used_hp));
>
> @@ -1081,8 +1134,8 @@ rte_eal_hugepage_init(void)
>
>  	/* hugetlbfs can be disabled */
>  	if (internal_config.no_hugetlbfs) {
> -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> -				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +		int fd;
> +		addr = rte_eal_shm_create(&fd);

Why do you remove the ability to mmap() anonymous memory here? Probably not everyone plans to use --no-hugepages only inside containers.

>  		if (addr == MAP_FAILED) {
>  			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
>  					strerror(errno));
> @@ -1093,6 +1146,7 @@ rte_eal_hugepage_init(void)
>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>  		mcfg->memseg[0].len = internal_config.memory;
>  		mcfg->memseg[0].socket_id = 0;
> +		mcfg->memseg[0].fd = fd;
>  		return 0;
>  	}
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index e57cbbd..8f8852b 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -453,13 +453,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>  		rte_errno = EINVAL;
>  		return NULL;
>  	}
> -
> -	/* check that we have both VA and PA */
> -	if (vaddr != NULL && paddr == NULL) {
> -		rte_errno = EINVAL;
> -		return NULL;
> -	}
> -
>  	/* Check that pg_num and pg_shift parameters are valid.
>  	 */
>  	if (pg_num < RTE_DIM(mp->elt_pa) || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
>  		rte_errno = EINVAL;
> @@ -596,8 +589,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>
>  	/* mempool elements in a separate chunk of memory. */
>  	} else {
> +		/* when VA is specified, PA should be specified? */
> +		if (rte_eal_has_hugepages()) {
> +			if (paddr == NULL) {
> +				rte_errno = EINVAL;
> +				return NULL;
> +			}
> +			memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
> +		}
>  		mp->elt_va_start = (uintptr_t)vaddr;
> -		memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);

Could you explain the reason for that change? Specifically, why is a mempool over external memory now only allowed with a hugepages config?

Konstantin

>  	}
>
>  	mp->elt_va_end = mp->elt_va_start;
> --
> 2.1.4