From: "Ananyev, Konstantin"
To: "Tan, Jianfeng", "dev@dpdk.org"
Cc: "nakajima.yoshihiro@lab.ntt.co.jp", "zhbzg@huawei.com", "mst@redhat.com",
 "gaoxiaoqiu@huawei.com", "oscar.zhangbo@huawei.com",
 "ann.zhuangyanying@huawei.com", "zhoujingbin@huawei.com",
 "guohongzhen@huawei.com"
Date: Mon, 9 Nov 2015 13:32:29 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836AC618B@irsmsx105.ger.corp.intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1446748276-132087-5-git-send-email-jianfeng.tan@intel.com>
 <2601191342CEEE43887BDE71AB97725836ABD716@IRSMSX153.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
 initialization process
List-Id: patches and discussions about DPDK
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Saturday, November 7, 2015 12:22 AM
> > To: Tan, Jianfeng; dev@dpdk.org
> > Cc: nakajima.yoshihiro@lab.ntt.co.jp; zhbzg@huawei.com; mst@redhat.com;
> > gaoxiaoqiu@huawei.com; oscar.zhangbo@huawei.com;
> > ann.zhuangyanying@huawei.com; zhoujingbin@huawei.com;
> > guohongzhen@huawei.com
> > Subject: RE: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> > initialization process
> >
> > Hi,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> > > Sent: Thursday, November 05, 2015 6:31 PM
> > > To: dev@dpdk.org
> > > Cc: nakajima.yoshihiro@lab.ntt.co.jp; zhbzg@huawei.com;
> > > mst@redhat.com; gaoxiaoqiu@huawei.com; oscar.zhangbo@huawei.com;
> > > ann.zhuangyanying@huawei.com; zhoujingbin@huawei.com;
> > > guohongzhen@huawei.com
> > > Subject: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> > > initialization process
> > >
> > > When using virtio for containers, we should specify --no-huge so that
> > > during memory initialization, shm_open() is used to allocate memory
> > > from the tmpfs filesystem /dev/shm/.
> > >
> > > Signed-off-by: Huawei Xie
> > > Signed-off-by: Jianfeng Tan
> > > ---
> ......
> > > +int
> > > +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr)
> > > +{
> > > +	struct rte_mem_config *mcfg;
> > > +	mcfg = rte_eal_get_configuration()->mem_config;
> > > +
> > > +	*pfd = mcfg->memseg[index].fd;
> > > +	*psize = (uint64_t)mcfg->memseg[index].len;
> > > +	*paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> > > +	return 0;
> > > +}
> >
> > Wonder who will use that function?
> > Can't see any references to that function in that patch or the next.
>
> This function is used in 1/5, when the virtio front end needs to send
> VHOST_USER_SET_MEM_TABLE to the back end.
Ok, but then this function should be defined in the patch *before* it is
used, not after.
Another thing: probably better to create a struct for all the memseg
parameters you want to retrieve, and pass that to the function, instead of
several pointers.

>
> > > +
> > >  /*
> > >   * Get physical address of any mapped virtual address in the current
> > >   * process.
> > >   */
> > > @@ -1044,6 +1059,42 @@ calc_num_pages_per_socket(uint64_t * memory,
> > >  	return total_num_pages;
> > >  }
> > >
> > > +static void *
> > > +rte_eal_shm_create(int *pfd)
> > > +{
> > > +	int ret, fd;
> > > +	char filepath[256];
> > > +	void *vaddr;
> > > +	uint64_t size = internal_config.memory;
> > > +
> > > +	sprintf(filepath, "/%s_cvio", internal_config.hugefile_prefix);
> > > +
> > > +	fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > > +	if (fd < 0) {
> > > +		rte_panic("shm_open %s failed: %s\n", filepath, strerror(errno));
> > > +	}
> > > +	ret = flock(fd, LOCK_EX);
> > > +	if (ret < 0) {
> > > +		close(fd);
> > > +		rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> > > +	}
> > > +
> > > +	ret = ftruncate(fd, size);
> > > +	if (ret < 0) {
> > > +		rte_panic("ftruncate failed: %s\n", strerror(errno));
> > > +	}
> > > +	/* flag: MAP_HUGETLB */
> >
> > Any explanation what that comment means here?
> > Do you plan to use MAP_HUGETLB in the call below or ...?
>
> Yes, it's a todo item. shm_open() just uses a tmpfs mounted at /dev/shm.
> So I wonder whether we can use this flag to make sure the OS allocates
> hugepages here if the user would like to use hugepages.
>
> >
> ......
> > > @@ -1081,8 +1134,8 @@ rte_eal_hugepage_init(void)
> > >
> > >  	/* hugetlbfs can be disabled */
> > >  	if (internal_config.no_hugetlbfs) {
> > > -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> > > -				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > > +		int fd;
> > > +		addr = rte_eal_shm_create(&fd);
> >
> > Why do you remove the ability to map(/dev/zero) here?
> > Probably not everyone plans to use --no-hugepages only inside containers.
>
> From my understanding, mmap here is just to allocate some memory, which is
> initialized to be all zero. I cannot understand what's the relationship
> with /dev/zero.

I used it here as a synonym for mmap(, ..., MAP_ANONYMOUS,...).

> rte_eal_shm_create() can do the original function, plus it generates an fd
> to point to this chunk of memory. This fd is indispensable in the vhost
> protocol when VHOST_USER_SET_MEM_TABLE is sent using sendmsg().

My question was:
Right now for --no-hugepages it allocates a chunk of memory that is not
backed up by any file and is private to the process:

addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
	MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

You changed it to a shared memory region allocation:

fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

I understand that you need it for your containers stuff - but I suppose you
have to add the new functionality without breaking the existing one.
There could be other users of --no-hugepages and they probably want the
existing behaviour.
Konstantin

>
> > >
> > >  		if (addr == MAP_FAILED) {
> > >  			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> > >  					strerror(errno));
> > > @@ -1093,6 +1146,7 @@ rte_eal_hugepage_init(void)
> > >  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> > >  		mcfg->memseg[0].len = internal_config.memory;
> > >  		mcfg->memseg[0].socket_id = 0;
> > > +		mcfg->memseg[0].fd = fd;
> > >  		return 0;
> > >  	}
> > >
> > > diff --git a/lib/librte_mempool/rte_mempool.c
> > > b/lib/librte_mempool/rte_mempool.c
> > > index e57cbbd..8f8852b 100644
> > > --- a/lib/librte_mempool/rte_mempool.c
> > > +++ b/lib/librte_mempool/rte_mempool.c
> > > @@ -453,13 +453,6 @@ rte_mempool_xmem_create(const char *name,
> > > unsigned n, unsigned elt_size,
> > >  		rte_errno = EINVAL;
> > >  		return NULL;
> > >  	}
> > > -
> > > -	/* check that we have both VA and PA */
> > > -	if (vaddr != NULL && paddr == NULL) {
> > > -		rte_errno = EINVAL;
> > > -		return NULL;
> > > -	}
> > > -
> > >  	/* Check that pg_num and pg_shift parameters are valid. */
> > >  	if (pg_num < RTE_DIM(mp->elt_pa) || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
> > >  		rte_errno = EINVAL;
> > > @@ -596,8 +589,15 @@ rte_mempool_xmem_create(const char *name,
> > > unsigned n, unsigned elt_size,
> > >
> > >  	/* mempool elements in a separate chunk of memory. */
> > >  	} else {
> > > +		/* when VA is specified, PA should be specified? */
> > > +		if (rte_eal_has_hugepages()) {
> > > +			if (paddr == NULL) {
> > > +				rte_errno = EINVAL;
> > > +				return NULL;
> > > +			}
> > > +			memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
> > > +		}
> > >  		mp->elt_va_start = (uintptr_t)vaddr;
> > > -		memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
> >
> > Could you explain the reason for that change?
> > Specifically, why is a mempool over external memory now only allowed for
> > the hugepages config?
> > Konstantin
>
> Oops, you're right!
> This change was previously for creating a mbuf mempool at a given vaddr
> and without giving any paddr[]. And now we don't need to care about either
> vaddr or paddr[], so I should have reverted the change in this file.
>
> >
> > >  	}
> > >
> > >  	mp->elt_va_end = mp->elt_va_start;
> > > --
> > > 2.1.4