From: "Tan, Jianfeng"
To: "Ananyev, Konstantin", "dev@dpdk.org"
Cc: "nakajima.yoshihiro@lab.ntt.co.jp", "zhbzg@huawei.com", "mst@redhat.com",
    "gaoxiaoqiu@huawei.com", "oscar.zhangbo@huawei.com",
    "ann.zhuangyanying@huawei.com", "zhoujingbin@huawei.com", "guohongzhen@huawei.com"
Date: Sun, 8 Nov 2015 11:18:12 +0000
Subject: Re: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory initialization process

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Saturday, November 7, 2015 12:22 AM
> To: Tan, Jianfeng; dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; zhbzg@huawei.com; mst@redhat.com;
> gaoxiaoqiu@huawei.com; oscar.zhangbo@huawei.com;
> ann.zhuangyanying@huawei.com; zhoujingbin@huawei.com;
> guohongzhen@huawei.com
> Subject: RE: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> initialization process
>
> Hi,
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> > Sent: Thursday, November 05, 2015 6:31 PM
> > To: dev@dpdk.org
> > Cc: nakajima.yoshihiro@lab.ntt.co.jp; zhbzg@huawei.com;
> > mst@redhat.com; gaoxiaoqiu@huawei.com; oscar.zhangbo@huawei.com;
> > ann.zhuangyanying@huawei.com; zhoujingbin@huawei.com;
> > guohongzhen@huawei.com
> > Subject: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory
> > initialization process
> >
> > When using virtio for container, we should specify --no-huge so that
> > in memory initialization, shm_open() is used to alloc memory from
> > tmpfs filesystem /dev/shm/.
> >
> > Signed-off-by: Huawei Xie
> > Signed-off-by: Jianfeng Tan
> > ---
......
> > +int
> > +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void
> > +**paddr) {
> > +	struct rte_mem_config *mcfg;
> > +	mcfg = rte_eal_get_configuration()->mem_config;
> > +
> > +	*pfd = mcfg->memseg[index].fd;
> > +	*psize = (uint64_t)mcfg->memseg[index].len;
> > +	*paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> > +	return 0;
> > +}
>
> Wonder who will use that function?
> Can't see any references to that function in that patch or next.

This function is used in patch 1/5, when the virtio front end needs to send
VHOST_USER_SET_MEM_TABLE to the back end.

> > +
> >  /*
> >   * Get physical address of any mapped virtual address in the current process.
> >   */
> > @@ -1044,6 +1059,42 @@ calc_num_pages_per_socket(uint64_t * memory,
> >  	return total_num_pages;
> >  }
> >
> > +static void *
> > +rte_eal_shm_create(int *pfd)
> > +{
> > +	int ret, fd;
> > +	char filepath[256];
> > +	void *vaddr;
> > +	uint64_t size = internal_config.memory;
> > +
> > +	sprintf(filepath, "/%s_cvio", internal_config.hugefile_prefix);
> > +
> > +	fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> > +	if (fd < 0) {
> > +		rte_panic("shm_open %s failed: %s\n", filepath, strerror(errno));
> > +	}
> > +	ret = flock(fd, LOCK_EX);
> > +	if (ret < 0) {
> > +		close(fd);
> > +		rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> > +	}
> > +
> > +	ret = ftruncate(fd, size);
> > +	if (ret < 0) {
> > +		rte_panic("ftruncate failed: %s\n", strerror(errno));
> > +	}
> > +	/* flag: MAP_HUGETLB */
>
> Any explanation what that comment means here?
> Do you plan to use MAP_HUGETLb in the call below or ...?

Yes, it's a TODO item. shm_open() just uses the tmpfs mounted at /dev/shm,
so I wonder whether we could use this flag to make sure the OS allocates
hugepages here when the user wants to use hugepages.

>> ......
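For reference, here is a minimal standalone sketch of the shm-backed allocation that rte_eal_shm_create() performs, written against plain POSIX/Linux calls instead of EAL. The helper name shm_alloc() and the error handling (return NULL with errno preserved, instead of rte_panic()) are assumptions for illustration. The mmap() call in the patch is elided in the quote above, so this sketch assumes MAP_SHARED, which lets another process that receives the fd see the same pages; it does not try MAP_HUGETLB.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an EAL API): allocate `size` bytes of
 * zero-filled memory backed by a POSIX shared-memory object (tmpfs,
 * normally /dev/shm) and return both the mapping and its fd.
 * The "/<prefix>_cvio" naming mirrors the patch.
 */
static void *
shm_alloc(const char *prefix, size_t size, int *pfd)
{
	char name[256];
	int fd, saved;
	void *va;

	snprintf(name, sizeof(name), "/%s_cvio", prefix);

	fd = shm_open(name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return NULL;

	/* Exclusive lock, as in the patch; then size the object.
	 * Growing a new object with ftruncate() yields zero-filled pages. */
	if (flock(fd, LOCK_EX) < 0 || ftruncate(fd, (off_t)size) < 0)
		goto err;

	/* Assumed MAP_SHARED: other mappings of this fd share the pages. */
	va = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED)
		goto err;

	*pfd = fd;
	return va;

err:
	saved = errno;	/* close() may clobber errno */
	close(fd);
	errno = saved;
	return NULL;
}
```

A second mmap() of the returned fd sees writes made through the first mapping, which is what a vhost back end relies on once it receives the fd. On glibc older than 2.34, link with -lrt for shm_open().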
> > @@ -1081,8 +1134,8 @@ rte_eal_hugepage_init(void)
> >
> >  	/* hugetlbfs can be disabled */
> >  	if (internal_config.no_hugetlbfs) {
> > -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> > -				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > +		int fd;
> > +		addr = rte_eal_shm_create(&fd);
>
> Why do you remove ability to map(dev/zero) here?
> Probably not everyone plan to use --no-hugepages only inside containers.

As I understand it, the mmap() here just allocates some memory that is
initialized to all zeros, so I don't see the relationship with /dev/zero.
rte_eal_shm_create() does what the original code did, and in addition it
returns an fd that refers to this chunk of memory. That fd is indispensable
for the vhost protocol, because VHOST_USER_SET_MEM_TABLE passes it to the
back end with sendmsg().

>
>
> >  		if (addr == MAP_FAILED) {
> >  			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> >  					strerror(errno));
> > @@ -1093,6 +1146,7 @@ rte_eal_hugepage_init(void)
> >  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> >  		mcfg->memseg[0].len = internal_config.memory;
> >  		mcfg->memseg[0].socket_id = 0;
> > +		mcfg->memseg[0].fd = fd;
> >  		return 0;
> >  	}
> >
> > diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> > index e57cbbd..8f8852b 100644
> > --- a/lib/librte_mempool/rte_mempool.c
> > +++ b/lib/librte_mempool/rte_mempool.c
> > @@ -453,13 +453,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> >  		rte_errno = EINVAL;
> >  		return NULL;
> >  	}
> > -
> > -	/* check that we have both VA and PA */
> > -	if (vaddr != NULL && paddr == NULL) {
> > -		rte_errno = EINVAL;
> > -		return NULL;
> > -	}
> > -
> >  	/* Check that pg_num and pg_shift parameters are valid. */
> >  	if (pg_num < RTE_DIM(mp->elt_pa) || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
> >  		rte_errno = EINVAL;
> > @@ -596,8 +589,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> >
> >  	/* mempool elements in a separate chunk of memory. */
> >  	} else {
> > +		/* when VA is specified, PA should be specified? */
> > +		if (rte_eal_has_hugepages()) {
> > +			if (paddr == NULL) {
> > +				rte_errno = EINVAL;
> > +				return NULL;
> > +			}
> > +			memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
> > +		}
> >  		mp->elt_va_start = (uintptr_t)vaddr;
> > -		memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
>
> Could you explain the reason for that change?
> Specially why mempool over external memory now only allowed for
> hugepages config?
> Konstantin

Oops, you're right! This change was originally meant for creating an mbuf
mempool at a given vaddr without supplying any paddr[]. Now we need to care
about neither vaddr nor paddr[], so I should have reverted the change in
this file.

>
> >  	}
> >
> >  	mp->elt_va_end = mp->elt_va_start;
> > --
> > 2.1.4
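The fd returned by rte_eal_shm_create() matters because VHOST_USER_SET_MEM_TABLE hands it to the back end with sendmsg(). Below is a minimal sketch of just that transport step, assuming a connected AF_UNIX socket: one fd travels as SCM_RIGHTS ancillary data next to an opaque payload. The helper names send_with_fd() and recv_with_fd() are hypothetical, and the real vhost-user message header and memory-region layout are not reproduced; per the vhost-user specification, that message carries one fd per memory region in its ancillary data.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send `len` payload bytes plus one fd (SCM_RIGHTS). */
static ssize_t
send_with_fd(int sock, const void *buf, size_t len, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	union {	/* union guarantees cmsghdr alignment of the buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	iov.iov_base = (void *)buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0);
}

/* Hypothetical helper: receive payload; store a passed fd in *pfd or -1. */
static ssize_t
recv_with_fd(int sock, void *buf, size_t len, int *pfd)
{
	struct msghdr msg;
	struct iovec iov;
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct cmsghdr *cmsg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	*pfd = -1;
	n = recvmsg(sock, &msg, 0);
	if (n < 0)
		return n;
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(pfd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return n;
}
```

The receiver gets a new descriptor number that refers to the same open file, so for an shm fd the back end could mmap() it to reach the front end's memory. An anonymous MAP_PRIVATE mapping has no such fd to pass.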