From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Date: Fri, 6 Nov 2015 16:21:55 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836ABD716@IRSMSX153.ger.corp.intel.com>
References: <1446748276-132087-1-git-send-email-jianfeng.tan@intel.com>
 <1446748276-132087-5-git-send-email-jianfeng.tan@intel.com>
In-Reply-To: <1446748276-132087-5-git-send-email-jianfeng.tan@intel.com>
Cc: "nakajima.yoshihiro@lab.ntt.co.jp" <nakajima.yoshihiro@lab.ntt.co.jp>,
 "zhbzg@huawei.com" <zhbzg@huawei.com>, "mst@redhat.com" <mst@redhat.com>,
 "gaoxiaoqiu@huawei.com" <gaoxiaoqiu@huawei.com>,
 "oscar.zhangbo@huawei.com" <oscar.zhangbo@huawei.com>,
 "ann.zhuangyanying@huawei.com" <ann.zhuangyanying@huawei.com>,
 "zhoujingbin@huawei.com" <zhoujingbin@huawei.com>,
 "guohongzhen@huawei.com" <guohongzhen@huawei.com>
Subject: Re: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory initialization process

Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Thursday, November 05, 2015 6:31 PM
> To: dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; zhbzg@huawei.com; mst@redhat.com; gaoxiaoqiu@huawei.com; oscar.zhangbo@huawei.com;
> ann.zhuangyanying@huawei.com; zhoujingbin@huawei.com; guohongzhen@huawei.com
> Subject: [dpdk-dev] [RFC 4/5] virtio/container: adjust memory initialization process
>
> When using virtio for container, we should specify --no-huge so
> that in memory initialization, shm_open() is used to alloc memory
> from tmpfs filesystem /dev/shm/.
>
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  lib/librte_eal/common/include/rte_memory.h |  5 +++
>  lib/librte_eal/linuxapp/eal/eal_memory.c   | 58 ++++++++++++++++++++++++++++--
>  lib/librte_mempool/rte_mempool.c           | 16 ++++-----
>  3 files changed, 69 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index 1bed415..9c1effc 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -100,6 +100,7 @@ struct rte_memseg {
>  	int32_t socket_id;          /**< NUMA socket ID. */
>  	uint32_t nchannel;          /**< Number of channels. */
>  	uint32_t nrank;             /**< Number of ranks. */
> +	int fd;                     /**< fd used to share this memory */
>  #ifdef RTE_LIBRTE_XEN_DOM0
>  	 /**< store segment MFNs */
>  	uint64_t mfn[DOM0_NUM_MEMBLOCK];
> @@ -128,6 +129,10 @@ int rte_mem_lock_page(const void *virt);
>   */
>  phys_addr_t rte_mem_virt2phy(const void *virt);
>
> +
> +int
> +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr);
> +
>  /**
>   * Get the layout of the available physical memory.
>   *
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index ac2745e..9abbfc6 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -80,6 +80,9 @@
>  #include <errno.h>
>  #include <sys/ioctl.h>
>  #include <sys/time.h>
> +#include <mntent.h>
> +#include <sys/mman.h>
> +#include <sys/file.h>
>
>  #include <rte_log.h>
>  #include <rte_memory.h>
> @@ -143,6 +146,18 @@ rte_mem_lock_page(const void *virt)
>  	return mlock((void*)aligned, page_size);
>  }
>
> +int
> +rte_memseg_info_get(int index, int *pfd, uint64_t *psize, void **paddr)
> +{
> +	struct rte_mem_config *mcfg;
> +	mcfg = rte_eal_get_configuration()->mem_config;
> +
> +	*pfd = mcfg->memseg[index].fd;
> +	*psize = (uint64_t)mcfg->memseg[index].len;
> +	*paddr = (void *)(uint64_t)mcfg->memseg[index].addr;
> +	return 0;
> +}

I wonder who will use that function?
I can't see any references to it either in this patch or in the next one.

> +
>  /*
>   * Get physical address of any mapped virtual address in the current process.
>   */
> @@ -1044,6 +1059,42 @@ calc_num_pages_per_socket(uint64_t * memory,
>  	return total_num_pages;
>  }
>
> +static void *
> +rte_eal_shm_create(int *pfd)
> +{
> +	int ret, fd;
> +	char filepath[256];
> +	void *vaddr;
> +	uint64_t size = internal_config.memory;
> +
> +	sprintf(filepath, "/%s_cvio", internal_config.hugefile_prefix);
> +
> +	fd = shm_open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> +	if (fd < 0) {
> +		rte_panic("shm_open %s failed: %s\n", filepath, strerror(errno));
> +	}
> +	ret = flock(fd, LOCK_EX);
> +	if (ret < 0) {
> +		close(fd);
> +		rte_panic("flock %s failed: %s\n", filepath, strerror(errno));
> +	}
> +
> +	ret = ftruncate(fd, size);
> +	if (ret < 0) {
> +		rte_panic("ftruncate failed: %s\n", strerror(errno));
> +	}
> +	/* flag: MAP_HUGETLB */

Any explanation of what that comment means here?
Do you plan to use MAP_HUGETLB in the call below, or ...?

> +	vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	if (vaddr == MAP_FAILED) {
> +		rte_panic("mmap failed: %s\n", strerror(errno));
> +	}
> +	memset(vaddr, 0, size);
> +	*pfd = fd;
> +
> +	return vaddr;
> +}
> +
> +
>  /*
>   * Prepare physical memory mapping: fill configuration structure with
>   * these infos, return 0 on success.
> @@ -1072,7 +1123,9 @@ rte_eal_hugepage_init(void)
>  	int new_pages_count[MAX_HUGEPAGE_SIZES];
>  #endif
>
> +#ifndef RTE_VIRTIO_VDEV
>  	test_proc_pagemap_readable();
> +#endif
>
>  	memset(used_hp, 0, sizeof(used_hp));
>
> @@ -1081,8 +1134,8 @@ rte_eal_hugepage_init(void)
>=20
>  	/* hugetlbfs can be disabled */
>  	if (internal_config.no_hugetlbfs) {
> -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> -				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +		int fd;
> +		addr = rte_eal_shm_create(&fd);

Why do you remove the ability to mmap(/dev/zero) here?
Probably not everyone plans to use --no-hugepages only inside containers.


>  		if (addr == MAP_FAILED) {
>  			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
>  					strerror(errno));
> @@ -1093,6 +1146,7 @@ rte_eal_hugepage_init(void)
>  		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
>  		mcfg->memseg[0].len = internal_config.memory;
>  		mcfg->memseg[0].socket_id = 0;
> +		mcfg->memseg[0].fd = fd;
>  		return 0;
>  	}
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index e57cbbd..8f8852b 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -453,13 +453,6 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>  		rte_errno = EINVAL;
>  		return NULL;
>  	}
> -
> -	/* check that we have both VA and PA */
> -	if (vaddr != NULL && paddr == NULL) {
> -		rte_errno = EINVAL;
> -		return NULL;
> -	}
> -
>  	/* Check that pg_num and pg_shift parameters are valid. */
>  	if (pg_num < RTE_DIM(mp->elt_pa) || pg_shift > MEMPOOL_PG_SHIFT_MAX) {
>  		rte_errno = EINVAL;
> @@ -596,8 +589,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
>
>  	/* mempool elements in a separate chunk of memory. */
>  	} else {
> +		/* when VA is specified, PA should be specified? */
> +		if (rte_eal_has_hugepages()) {
> +			if (paddr == NULL) {
> +				rte_errno = EINVAL;
> +				return NULL;
> +			}
> +			memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
> +		}
>  		mp->elt_va_start = (uintptr_t)vaddr;
> -		memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);

Could you explain the reason for that change?
In particular, why is a mempool over external memory now only allowed with a hugepage config?
Konstantin

>  	}
>
>  	mp->elt_va_end = mp->elt_va_start;
> --
> 2.1.4