From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 92EB37D4E for ; Fri, 22 Sep 2017 07:28:52 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 Sep 2017 22:28:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.42,427,1500966000"; d="scan'208";a="902806055" Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206]) by FMSMGA003.fm.intel.com with ESMTP; 21 Sep 2017 22:28:51 -0700 Received: from FMSMSX109.amr.corp.intel.com (10.18.116.9) by FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 21 Sep 2017 22:28:51 -0700 Received: from shsmsx101.ccr.corp.intel.com (10.239.4.153) by fmsmsx109.amr.corp.intel.com (10.18.116.9) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 21 Sep 2017 22:28:50 -0700 Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.213]) by SHSMSX101.ccr.corp.intel.com ([169.254.1.159]) with mapi id 14.03.0319.002; Fri, 22 Sep 2017 13:28:48 +0800 From: "Tan, Jianfeng" To: "Yang, Yi Y" , "yliu@fridaylinux.org" CC: "dev@dpdk.org" , "Yang, Yi Y" Thread-Topic: [dpdk-dev] [PATCH] vhost: fix vhost_user_set_mem_table error Thread-Index: AQHTMeuk/BFonpyJqEiChipL9cJPb6LAWVdw Date: Fri, 22 Sep 2017 05:28:48 +0000 Message-ID: References: <1505896323-125943-1-git-send-email-yi.y.yang@intel.com> In-Reply-To: <1505896323-125943-1-git-send-email-yi.y.yang@intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH] vhost: fix vhost_user_set_mem_table error X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Sep 2017 05:28:53 -0000 > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yi Yang > Sent: Wednesday, September 20, 2017 4:32 PM > To: yliu@fridaylinux.org > Cc: dev@dpdk.org; Yang, Yi Y > Subject: [dpdk-dev] [PATCH] vhost: fix vhost_user_set_mem_table error >=20 > Usually vhost_user message VHOST_USER_SET_MEM_TABLE > is only sent out during initialization and only sent > once, but it isn't so for memory hotplug and hotunplug > , for that case, vhost_user message VHOST_USER_SET_MEM_TABLE > will be resent with the old memory regions (not hotunplugged) > and the new memory regions (hotplugged), so current > vhost_user_set_mem_table implementation is wrong for > memory hotplug and hotunplug case, it will free current > memory regions and unmap hugepages no matter if they > are be using or not and if they are memory regions which > have been initialized and mapped in the previous vhost_user > message VHOST_USER_SET_MEM_TABLE or not. >=20 > This commit fixed this issue very well, it will keep them > intact for those old memory region it will unmap those > memroy regions if they have been hotunplugged, it will > initialize the new hotplugged memory regions and map > hugepages if they are hotplugged. >=20 > vhost_user message VHOST_USER_SET_MEM_TABLE will include > all the current effective memory regions, the hotunplugged > memory regions won't be included in VHOST_USER_SET_MEM_TABLE, > the hotplugged memroy regions will be included in > VHOST_USER_SET_MEM_TABLE. >=20 > This has been verified in OVS DPDK by memory hotplug and > hotunplug in qemu. >=20 > Signed-off-by: Yi Yang > --- > lib/librte_vhost/vhost_user.c | 72 > ++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 65 insertions(+), 7 deletions(-) >=20 > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.= c > index ad2e8d3..1c475d1 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -509,17 +509,43 @@ vhost_user_set_mem_table(struct virtio_net *dev, > struct VhostUserMsg *pmsg) > uint64_t mmap_size; > uint64_t mmap_offset; > uint64_t alignment; > - uint32_t i; > + uint32_t i, j; > int fd; >=20 > - if (dev->mem) { > - free_mem_region(dev); > - rte_free(dev->mem); > - dev->mem =3D NULL; > + /* Handle hot unplug case */ > + if (dev->mem && dev->mem->nregions) { > + i =3D 0; > + j =3D 0; > + while ((i < memory.nregions) && (j < dev->mem->nregions)) > { > + reg =3D &dev->mem->regions[j]; > + /* munmap this region if it is hot unplugged */ > + if (reg->guest_user_addr > + !=3D memory.regions[i].userspace_addr) { > + if (reg->host_user_addr) { > + munmap(reg->mmap_addr, reg- > >mmap_size); > + close(reg->fd); > + } > + reg->guest_user_addr =3D 0; > + j++; > + } else { > + i++; > + j++; > + } > + } The algorithm here could be problematic for hot plug. For example, T0: we have two regions with address 1, 3. T1: we add another region, 2, which makes the regions as 1, 2, 3. 1st iteration, 1 matches 1, i++, j++; 2nd iteration, 2 does not match 3, unmap 3, j++. > + > + /* munmap these regions because they have been hot > unplugged */ > + for (; j < dev->mem->nregions; j++) { > + reg =3D &dev->mem->regions[j]; > + if (reg->host_user_addr) { > + munmap(reg->mmap_addr, reg- > >mmap_size); > + close(reg->fd); > + } > + reg->guest_user_addr =3D 0; > + } > } >=20 > - dev->nr_guest_pages =3D 0; > if (!dev->guest_pages) { > + dev->nr_guest_pages =3D 0; We cannot just move this line here, which results in a wrong nr_guest_pages= if we do hot plug/unplug instead of initialization. > dev->max_guest_pages =3D 8; > dev->guest_pages =3D malloc(dev->max_guest_pages * > sizeof(struct guest_page)); > @@ -532,7 +558,7 @@ vhost_user_set_mem_table(struct virtio_net *dev, > struct VhostUserMsg *pmsg) > } > } >=20 > - dev->mem =3D rte_zmalloc("vhost-mem-table", sizeof(struct > rte_vhost_memory) + > + dev->mem =3D rte_realloc(dev->mem, sizeof(struct > rte_vhost_memory) + > sizeof(struct rte_vhost_mem_region) * memory.nregions, 0); > if (dev->mem =3D=3D NULL) { > RTE_LOG(ERR, VHOST_CONFIG, > @@ -546,6 +572,38 @@ vhost_user_set_mem_table(struct virtio_net *dev, > struct VhostUserMsg *pmsg) > fd =3D pmsg->fds[i]; > reg =3D &dev->mem->regions[i]; >=20 > + /* This region should be skipped if it is initialized before */ > + if (reg->guest_user_addr =3D=3D > memory.regions[i].userspace_addr) { Also problematic for hot unplug. For example, T0: we have three regions, 1, 2, 3. T1: remove region 2, with only 1, 3 left. 1st iteration, 1 matches 1, skip old region 1. 2nd iteration, 3 does not match 2, do the mmap. > + alignment =3D get_blk_size(fd); > + if (alignment =3D=3D (uint64_t)-1) { > + RTE_LOG(ERR, VHOST_CONFIG, > + "couldn't get hugepage size " > + "through fstat\n"); > + goto err_mmap; > + } > + RTE_LOG(INFO, VHOST_CONFIG, > + "guest memory region(old)" > + " %u, size: 0x%" PRIx64 "\n" > + "\t guest physical addr: 0x%" PRIx64 "\n" > + "\t guest virtual addr: 0x%" PRIx64 "\n" > + "\t host virtual addr: 0x%" PRIx64 "\n" > + "\t mmap addr : 0x%" PRIx64 "\n" > + "\t mmap size : 0x%" PRIx64 "\n" > + "\t mmap align: 0x%" PRIx64 "\n" > + "\t mmap off : 0x%" PRIx64 "\n", > + i, reg->size, > + reg->guest_phys_addr, > + reg->guest_user_addr, > + reg->host_user_addr, > + (uint64_t)(uintptr_t)reg->mmap_addr, > + reg->mmap_size, > + alignment, > + (uint64_t)(uintptr_t)reg->host_user_addr - > + (uint64_t)(uintptr_t)reg- > >mmap_addr > + ); > + continue; > + } > + > reg->guest_phys_addr =3D > memory.regions[i].guest_phys_addr; > reg->guest_user_addr =3D memory.regions[i].userspace_addr; > reg->size =3D memory.regions[i].memory_size; > -- > 2.1.0