From: "Burakov, Anatoly"
To: Sean Harte
Cc: tiwei.bie@intel.com, maxime.coquelin@redhat.com, zhihong.wang@intel.com,
 dev@dpdk.org, stable@dpdk.org
Date: Thu, 23 Aug 2018 15:01:30 +0100
Message-ID: <34c163f6-b50f-06c0-d099-7d0ae5fbc5b0@intel.com>
References: <20180823025721.18300-1-tiwei.bie@intel.com>
 <992313c9-cd03-03db-888b-f5dd8c072cf0@intel.com>
Subject: Re: [dpdk-dev] [PATCH] net/virtio-user: fix memory hotplug support
List-Id: DPDK patches and discussions

On 23-Aug-18 12:19 PM, Sean Harte wrote:
> On Thu, 23 Aug 2018 at 10:05, Burakov, Anatoly wrote:
>>
>> On 23-Aug-18 3:57 AM, Tiwei Bie wrote:
>>> Deadlock can occur when allocating memory if a vhost-kernel
>>> based virtio-user device is in use.
>>> Besides, it's possible
>>> to have many more than 64 non-contiguous hugepage-backed
>>> memory regions due to memory hotplug, which may cause
>>> problems when handling the VHOST_SET_MEM_TABLE request. A better
>>> solution is to have virtio-user pass all the VA ranges
>>> reserved by DPDK to vhost-kernel.
>>>
>>> Bugzilla ID: 81
>>> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
>>> Cc: stable@dpdk.org
>>>
>>> Reported-by: Seán Harte
>>> Signed-off-by: Tiwei Bie
>>> ---
>>>  drivers/net/virtio/virtio_user/vhost_kernel.c | 64 ++++++++-----------
>>>  1 file changed, 27 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
>>> index b2444096c..49bd1b821 100644
>>> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
>>> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
>>> @@ -70,41 +70,12 @@ static uint64_t vhost_req_user_to_kernel[] = {
>>>  	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
>>>  };
>>>
>>> -struct walk_arg {
>>> -	struct vhost_memory_kernel *vm;
>>> -	uint32_t region_nr;
>>> -};
>>> -
>>> -static int
>>> -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
>>> -		const struct rte_memseg *ms, size_t len, void *arg)
>>> -{
>>> -	struct walk_arg *wa = arg;
>>> -	struct vhost_memory_region *mr;
>>> -	void *start_addr;
>>> -
>>> -	if (wa->region_nr >= max_regions)
>>> -		return -1;
>>> -
>>> -	mr = &wa->vm->regions[wa->region_nr++];
>>> -	start_addr = ms->addr;
>>> -
>>> -	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
>>> -	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
>>> -	mr->memory_size = len;
>>> -	mr->mmap_offset = 0;
>>> -
>>> -	return 0;
>>> -}
>>> -
>>> -/* By default, vhost kernel module allows 64 regions, but DPDK allows
>>> - * 256 segments. As a relief, below function merges those virtually
>>> - * adjacent memsegs into one region.
>>> - */
>>>  static struct vhost_memory_kernel *
>>>  prepare_vhost_memory_kernel(void)
>>>  {
>>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>>>  	struct vhost_memory_kernel *vm;
>>> -	struct walk_arg wa;
>>> +	uint32_t region_nr = 0, i;
>>>
>>>  	vm = malloc(sizeof(struct vhost_memory_kernel) +
>>>  			max_regions *
>>> @@ -112,15 +83,34 @@ prepare_vhost_memory_kernel(void)
>>>  	if (!vm)
>>>  		return NULL;
>>>
>>> -	wa.region_nr = 0;
>>> -	wa.vm = vm;
>>> +	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
>>> +		struct rte_memseg_list *msl = &mcfg->memsegs[i];
>>> +		struct vhost_memory_region *mr;
>>> +		void *start_addr;
>>> +		uint64_t len;
>>
>> There is a rte_memseg_list_walk() - please do not iterate over memseg
>> lists manually.
>>
>
> rte_memseg_list_walk() can't be used here because
> prepare_vhost_memory_kernel() is sometimes called from a memory
> callback. It will then hang trying to get a read lock on
> memory_hotplug_lock.

OK, so use rte_memseg_list_walk_thread_unsafe().

> I don't think the rte_memseg_list_walk_thread_unsafe() function is
> appropriate because prepare_vhost_memory_kernel() may not always be
> called from a memory callback.

And how is this different? What you're doing here is identical to calling
rte_memseg_list_walk_thread_unsafe() - that's precisely what it does
internally (check the code!) - except that you're doing it manually,
without using the DPDK API, which makes your code dependent on the
internals of DPDK's memory implementation.

So, this function may or may not be called from a callback, but you're
using the thread-unsafe walk anyway. I think you should call either the
thread-safe or the thread-unsafe version depending on whether you're
being called from a callback or not.
>
>>>
>>> -	if (rte_memseg_contig_walk(add_memory_region, &wa) < 0) {
>>> -		free(vm);
>>> -		return NULL;
>>> +		start_addr = msl->base_va;
>>> +		len = msl->page_sz * msl->memseg_arr.len;
>>> +
>>> +		if (start_addr == NULL || len == 0)
>>> +			continue;
>>> +
>>> +		if (region_nr >= max_regions) {
>>> +			free(vm);
>>> +			return NULL;
>>> +		}
>>> +
>>> +		mr = &vm->regions[region_nr++];
>>> +		mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
>>> +		mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
>>> +		mr->memory_size = len;
>>> +		mr->mmap_offset = 0; /* flags_padding */
>>> +
>>> +		PMD_DRV_LOG(DEBUG, "index=%u, addr=%p len=%" PRIu64,
>>> +			i, start_addr, len);
>>>  	}
>>>
>>> -	vm->nregions = wa.region_nr;
>>> +	vm->nregions = region_nr;
>>>  	vm->padding = 0;
>>>  	return vm;
>>>  }
>>>
>>
>> --
>> Thanks,
>> Anatoly
>

-- 
Thanks,
Anatoly