From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jianfeng.tan@intel.com>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 92EB37D4E
 for <dev@dpdk.org>; Fri, 22 Sep 2017 07:28:52 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 21 Sep 2017 22:28:51 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.42,427,1500966000"; d="scan'208";a="902806055"
Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206])
 by FMSMGA003.fm.intel.com with ESMTP; 21 Sep 2017 22:28:51 -0700
Received: from FMSMSX109.amr.corp.intel.com (10.18.116.9) by
 FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Thu, 21 Sep 2017 22:28:51 -0700
Received: from shsmsx101.ccr.corp.intel.com (10.239.4.153) by
 fmsmsx109.amr.corp.intel.com (10.18.116.9) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Thu, 21 Sep 2017 22:28:50 -0700
Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.213]) by
 SHSMSX101.ccr.corp.intel.com ([169.254.1.159]) with mapi id 14.03.0319.002;
 Fri, 22 Sep 2017 13:28:48 +0800
From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
To: "Yang, Yi Y" <yi.y.yang@intel.com>, "yliu@fridaylinux.org"
 <yliu@fridaylinux.org>
CC: "dev@dpdk.org" <dev@dpdk.org>, "Yang, Yi Y" <yi.y.yang@intel.com>
Thread-Topic: [dpdk-dev] [PATCH] vhost: fix vhost_user_set_mem_table error
Thread-Index: AQHTMeuk/BFonpyJqEiChipL9cJPb6LAWVdw
Date: Fri, 22 Sep 2017 05:28:48 +0000
Message-ID: <ED26CBA2FAD1BF48A8719AEF02201E36512EF802@SHSMSX103.ccr.corp.intel.com>
References: <1505896323-125943-1-git-send-email-yi.y.yang@intel.com>
In-Reply-To: <1505896323-125943-1-git-send-email-yi.y.yang@intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH] vhost: fix vhost_user_set_mem_table error
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Sep 2017 05:28:53 -0000



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yi Yang
> Sent: Wednesday, September 20, 2017 4:32 PM
> To: yliu@fridaylinux.org
> Cc: dev@dpdk.org; Yang, Yi Y
> Subject: [dpdk-dev] [PATCH] vhost: fix vhost_user_set_mem_table error
>=20
> Usually vhost_user message VHOST_USER_SET_MEM_TABLE
> is only sent out during initialization and only sent
> once, but it isn't so for memory hotplug and hotunplug
> , for that case, vhost_user message VHOST_USER_SET_MEM_TABLE
> will be resent with the old memory regions (not hotunplugged)
> and the new memory regions (hotplugged), so current
> vhost_user_set_mem_table implementation is wrong for
> memory hotplug and hotunplug case, it will free current
> memory regions and unmap hugepages no matter if they
> are be using or not and if they are memory regions which
> have been initialized and mapped in the previous vhost_user
> message VHOST_USER_SET_MEM_TABLE or not.
>=20
> This commit fixed this issue very well, it will keep them
> intact for those old memory region it will unmap those
> memroy regions if they have been hotunplugged, it will
> initialize the new hotplugged memory regions and map
> hugepages if they are hotplugged.
>=20
> vhost_user message VHOST_USER_SET_MEM_TABLE will include
> all the current effective memory regions, the hotunplugged
> memory regions won't be included in VHOST_USER_SET_MEM_TABLE,
> the hotplugged memroy regions will be included in
> VHOST_USER_SET_MEM_TABLE.
>=20
> This has been verified in OVS DPDK by memory hotplug and
> hotunplug in qemu.
>=20
> Signed-off-by: Yi Yang <yi.y.yang@intel.com>
> ---
>  lib/librte_vhost/vhost_user.c | 72
> ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 65 insertions(+), 7 deletions(-)
>=20
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.=
c
> index ad2e8d3..1c475d1 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -509,17 +509,43 @@ vhost_user_set_mem_table(struct virtio_net *dev,
> struct VhostUserMsg *pmsg)
>  	uint64_t mmap_size;
>  	uint64_t mmap_offset;
>  	uint64_t alignment;
> -	uint32_t i;
> +	uint32_t i, j;
>  	int fd;
>=20
> -	if (dev->mem) {
> -		free_mem_region(dev);
> -		rte_free(dev->mem);
> -		dev->mem =3D NULL;
> +	/* Handle hot unplug case */
> +	if (dev->mem && dev->mem->nregions) {
> +		i =3D 0;
> +		j =3D 0;
> +		while ((i < memory.nregions) && (j < dev->mem->nregions))
> {
> +			reg =3D &dev->mem->regions[j];
> +			/* munmap this region if it is hot unplugged */
> +			if (reg->guest_user_addr
> +				!=3D memory.regions[i].userspace_addr) {
> +				if (reg->host_user_addr) {
> +					munmap(reg->mmap_addr, reg-
> >mmap_size);
> +					close(reg->fd);
> +				}
> +				reg->guest_user_addr =3D 0;
> +				j++;
> +			} else {
> +				i++;
> +				j++;
> +			}
> +		}

The algorithm here could be problematic for hot plug. For example,

T0: we have two regions with address 1, 3.
T1: we add another region, 2, which makes the regions as 1, 2, 3.

1st iteration, 1 matches 1, i++, j++;
2nd iteration, 2 does not match 3, unmap 3, j++.

> +
> +		/* munmap these regions because they have been hot
> unplugged */
> +		for (; j < dev->mem->nregions; j++) {
> +			reg =3D &dev->mem->regions[j];
> +			if (reg->host_user_addr) {
> +				munmap(reg->mmap_addr, reg-
> >mmap_size);
> +				close(reg->fd);
> +			}
> +			reg->guest_user_addr =3D 0;
> +		}
>  	}
>=20
> -	dev->nr_guest_pages =3D 0;
>  	if (!dev->guest_pages) {
> +		dev->nr_guest_pages =3D 0;

We cannot just move this line here, which results in a wrong nr_guest_pages=
 if we do hot plug/unplug instead of initialization.

>  		dev->max_guest_pages =3D 8;
>  		dev->guest_pages =3D malloc(dev->max_guest_pages *
>  						sizeof(struct guest_page));
> @@ -532,7 +558,7 @@ vhost_user_set_mem_table(struct virtio_net *dev,
> struct VhostUserMsg *pmsg)
>  		}
>  	}
>=20
> -	dev->mem =3D rte_zmalloc("vhost-mem-table", sizeof(struct
> rte_vhost_memory) +
> +	dev->mem =3D rte_realloc(dev->mem, sizeof(struct
> rte_vhost_memory) +
>  		sizeof(struct rte_vhost_mem_region) * memory.nregions, 0);
>  	if (dev->mem =3D=3D NULL) {
>  		RTE_LOG(ERR, VHOST_CONFIG,
> @@ -546,6 +572,38 @@ vhost_user_set_mem_table(struct virtio_net *dev,
> struct VhostUserMsg *pmsg)
>  		fd  =3D pmsg->fds[i];
>  		reg =3D &dev->mem->regions[i];
>=20
> +		/* This region should be skipped if it is initialized before */
> +		if (reg->guest_user_addr =3D=3D
> memory.regions[i].userspace_addr) {

Also problematic for hot unplug. For example,
T0: we have three regions, 1, 2, 3.
T1: remove region 2, with only 1, 3 left.

1st iteration, 1 matches 1, skip old region 1.
2nd iteration, 3 does not match 2, do the mmap.


> +			alignment =3D get_blk_size(fd);
> +			if (alignment =3D=3D (uint64_t)-1) {
> +				RTE_LOG(ERR, VHOST_CONFIG,
> +					"couldn't get hugepage size "
> +					"through fstat\n");
> +				goto err_mmap;
> +			}
> +			RTE_LOG(INFO, VHOST_CONFIG,
> +				"guest memory region(old)"
> +				" %u, size: 0x%" PRIx64 "\n"
> +				"\t guest physical addr: 0x%" PRIx64 "\n"
> +				"\t guest virtual  addr: 0x%" PRIx64 "\n"
> +				"\t host  virtual  addr: 0x%" PRIx64 "\n"
> +				"\t mmap addr : 0x%" PRIx64 "\n"
> +				"\t mmap size : 0x%" PRIx64 "\n"
> +				"\t mmap align: 0x%" PRIx64 "\n"
> +				"\t mmap off  : 0x%" PRIx64 "\n",
> +				i, reg->size,
> +				reg->guest_phys_addr,
> +				reg->guest_user_addr,
> +				reg->host_user_addr,
> +				(uint64_t)(uintptr_t)reg->mmap_addr,
> +				reg->mmap_size,
> +				alignment,
> +				(uint64_t)(uintptr_t)reg->host_user_addr -
> +					(uint64_t)(uintptr_t)reg-
> >mmap_addr
> +				);
> +				continue;
> +		}
> +
>  		reg->guest_phys_addr =3D
> memory.regions[i].guest_phys_addr;
>  		reg->guest_user_addr =3D memory.regions[i].userspace_addr;
>  		reg->size            =3D memory.regions[i].memory_size;
> --
> 2.1.0