From: Sean Harte
Date: Thu, 23 Aug 2018 16:33:30 +0100
To: anatoly.burakov@intel.com
Cc: tiwei.bie@intel.com, maxime.coquelin@redhat.com, zhihong.wang@intel.com, dev@dpdk.org, stable@dpdk.org
Subject: Re: [dpdk-stable] [PATCH] net/virtio-user: fix memory hotplug support

On Thu, 23 Aug 2018 at 15:01, Burakov, Anatoly wrote:
>
> On 23-Aug-18 12:19 PM, Sean Harte wrote:
> > On Thu, 23 Aug 2018 at 10:05, Burakov, Anatoly
> > wrote:
> >>
> >> On 23-Aug-18 3:57 AM, Tiwei Bie wrote:
> >>> Deadlock can occur when allocating memory if a vhost-kernel
> >>> based virtio-user device is in use. Besides, it's possible
> >>> to have much more than 64 non-contiguous hugepage backed
> >>> memory regions due to the memory hotplug, which may cause
> >>> problems when handling VHOST_SET_MEM_TABLE request. A better
> >>> solution is to have the virtio-user pass all the VA ranges
> >>> reserved by DPDK to vhost-kernel.
> >>>
> >>> Bugzilla ID: 81
> >>> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Reported-by: Seán Harte
> >>> Signed-off-by: Tiwei Bie
> >>> ---
> >>>  drivers/net/virtio/virtio_user/vhost_kernel.c | 64 ++++++++-----------
> >>>  1 file changed, 27 insertions(+), 37 deletions(-)
> >>>
> >>> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
> >>> index b2444096c..49bd1b821 100644
> >>> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> >>> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> >>> @@ -70,41 +70,12 @@ static uint64_t vhost_req_user_to_kernel[] = {
> >>>          [VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
> >>>  };
> >>>
> >>> -struct walk_arg {
> >>> -        struct vhost_memory_kernel *vm;
> >>> -        uint32_t region_nr;
> >>> -};
> >>> -static int
> >>> -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
> >>> -                const struct rte_memseg *ms, size_t len, void *arg)
> >>> -{
> >>> -        struct walk_arg *wa = arg;
> >>> -        struct vhost_memory_region *mr;
> >>> -        void *start_addr;
> >>> -
> >>> -        if (wa->region_nr >= max_regions)
> >>> -                return -1;
> >>> -
> >>> -        mr = &wa->vm->regions[wa->region_nr++];
> >>> -        start_addr = ms->addr;
> >>> -
> >>> -        mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
> >>> -        mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
> >>> -        mr->memory_size = len;
> >>> -        mr->mmap_offset = 0;
> >>> -
> >>> -        return 0;
> >>> -}
> >>> -
> >>> -/* By default, vhost kernel module allows 64 regions, but DPDK allows
> >>> - * 256 segments. As a relief, below function merges those virtually
> >>> - * adjacent memsegs into one region.
> >>> - */
> >>>  static struct vhost_memory_kernel *
> >>>  prepare_vhost_memory_kernel(void)
> >>>  {
> >>> +        struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> >>>          struct vhost_memory_kernel *vm;
> >>> -        struct walk_arg wa;
> >>> +        uint32_t region_nr = 0, i;
> >>>
> >>>          vm = malloc(sizeof(struct vhost_memory_kernel) +
> >>>                      max_regions *
> >>> @@ -112,15 +83,34 @@ prepare_vhost_memory_kernel(void)
> >>>          if (!vm)
> >>>                  return NULL;
> >>>
> >>> -        wa.region_nr = 0;
> >>> -        wa.vm = vm;
> >>> +        for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
> >>> +                struct rte_memseg_list *msl = &mcfg->memsegs[i];
> >>> +                struct vhost_memory_region *mr;
> >>> +                void *start_addr;
> >>> +                uint64_t len;
> >>
> >> There is a rte_memseg_list_walk() - please do not iterate over memseg
> >> lists manually.
> >>
> >
> > rte_memseg_list_walk() can't be used here because
> > prepare_vhost_memory_kernel() is sometimes called from a memory
> > callback. It will then hang trying to get a read lock on
> > memory_hotplug_lock.
>
> OK, so use rte_memseg_list_walk_thread_unsafe().
>
> > I don't think the rte_memseg_list_walk_thread_unsafe() function is
> > appropriate because prepare_vhost_memory_kernel() may not always be
> > called from a memory callback.
>
> And how is this different? What you're doing here is identical to
> calling rte_memseg_list_walk_thread_unsafe() (that's precisely what it
> does internally - check the code!), except that you're doing it manually
> and not using DPDK API, which makes your code dependent on internals of
> DPDK's memory implementation.
>
> So, this function may or may not be called from a callback, but you're
> using thread-unsafe walk anyway. I think you should call either
> thread-safe or thread-unsafe version depending on whether you're being
> called from a callback or not.
I'm not the patch author :-) Switching to the _thread_unsafe() function
might convert the easy-to-debug deadlock issue into a tricky-to-debug
race condition. Picking between the two implementations depending on
the calling context would work. Another approach might be a different
type of locking system that allows reading when the same thread
already holds the write lock.

> >
> >>>
> >>> -        if (rte_memseg_contig_walk(add_memory_region, &wa) < 0) {
> >>> -                free(vm);
> >>> -                return NULL;
> >>> +                start_addr = msl->base_va;
> >>> +                len = msl->page_sz * msl->memseg_arr.len;
> >>> +
> >>> +                if (start_addr == NULL || len == 0)
> >>> +                        continue;
> >>> +
> >>> +                if (region_nr >= max_regions) {
> >>> +                        free(vm);
> >>> +                        return NULL;
> >>> +                }
> >>> +
> >>> +                mr = &vm->regions[region_nr++];
> >>> +                mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
> >>> +                mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
> >>> +                mr->memory_size = len;
> >>> +                mr->mmap_offset = 0; /* flags_padding */
> >>> +
> >>> +                PMD_DRV_LOG(DEBUG, "index=%u, addr=%p len=%" PRIu64,
> >>> +                            i, start_addr, len);
> >>>          }
> >>>
> >>> -        vm->nregions = wa.region_nr;
> >>> +        vm->nregions = region_nr;
> >>>          vm->padding = 0;
> >>>          return vm;
> >>> }
> >>>
> >>
> >>
> >> --
> >> Thanks,
> >> Anatoly
> >
>
> --
> Thanks,
> Anatoly