From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by dpdk.org (Postfix) with ESMTP id 5760B20F; Thu, 29 Mar 2018 09:35:59 +0200 (CEST) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BD6794067EF0; Thu, 29 Mar 2018 07:35:58 +0000 (UTC) Received: from [10.36.112.54] (ovpn-112-54.ams2.redhat.com [10.36.112.54]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C77DC2026DFD; Thu, 29 Mar 2018 07:35:54 +0000 (UTC) To: "Tan, Jianfeng" , Victor Kaplansky Cc: "dev@dpdk.org" , "stable@dpdk.org" , "Yang, Yi Y" , "Harris, James R" , "Yang, Ziye" , "Liu, Changpeng" , "Wodkowski, PawelX" , "Stojaczyk, DariuszX" , Yuanhan Liu References: <1510746068-143223-1-git-send-email-jianfeng.tan@intel.com> <20171205141954.GF9111@yliu-dev> From: Maxime Coquelin Message-ID: <1c5d3010-c7a9-a2b1-909e-dd43cde24af1@redhat.com> Date: Thu, 29 Mar 2018 09:35:53 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Thu, 29 Mar 2018 07:35:58 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Thu, 29 Mar 2018 07:35:58 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'maxime.coquelin@redhat.com' RCPT:'' Subject: Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Mar 2018 07:36:00 -0000 On 03/29/2018 09:01 AM, Tan, Jianfeng wrote: > Hi Maxime and Victor, > >> -----Original Message----- >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >> Sent: Tuesday, December 5, 2017 10:28 PM >> To: Yuanhan Liu; Tan, Jianfeng; Victor Kaplansky >> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Yi Y >> Subject: Re: [PATCH] vhost: fix segfault as handle set_mem_table message >> >> >> >> On 12/05/2017 03:19 PM, Yuanhan Liu wrote: >>> On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote: >>>> >>>> >>>> On 11/15/2017 12:41 PM, Jianfeng Tan wrote: >>>>> In a running VM, operations (like device attach/detach) will >>>>> trigger the QEMU to resend set_mem_table to vhost-user backend. >>>>> >>>>> DPDK vhost-user handles this message rudely by unmap all existing >>>>> regions and map new ones. This might lead to segfault if there >>>>> is pmd thread just trying to touch those unmapped memory regions. >>>>> >>>>> But for most cases, except VM memory hotplug, >> >> FYI, Victor is working on implementing a lock-less protection mechanism >> to prevent crashes in such cases. It is intended first to protect >> log_base in case of multiqueue + live-migration, but would solve thi >> issue too. > > Bring this issue back for discussion. > > Reported by SPDK guys, even with per-queue lock, they could still run into crash as of memory hot plug or unplug. > In detail, you add the lock in an internal struct, vhost_queue which is, unfortunately, not visible to the external datapath, like vhost-scsi in SPDK. Yes, I agree current solution is not enough > The memory hot plug and unplug would be the main issue from SPDK side so far. For this specific issue, I think we can continue this patch to filter out the changed regions, and keep unchanged regions not remapped. > > But I know that the per-vq lock is not only to resolve the memory table issue, some other vhost-user messages could also lead to problems? If yes, shall we take a step back, to think about how to solve this kind of issues for backends, like vhost-scsi. Right, any message that can change the device or virtqueue states can be problematic. > Thoughts? In another thread, SPDK people proposed to destroy and re-create the device for every message. I think this is not acceptable. I proposed an alternative that I think would work: - external backends & applications implements the .vring_state_changed() callback. On disable it stops processing the rings and only return once all descriptor buffers are processed. On enable, they resume the rings processing. - In vhost lib, we implement vhost_vring_pause and vhost_vring_resume functions. In pause function, we save device state (enable or disable) in a variable, and if the ring is enabled at device level it calls .vring_state_changed() with disabled state. In resume, it checks the vring state in the device and call .vring_state_changed() with enable state if it was enabled in device state. So, I think that would work but I hadn't had a clear reply from SPDK people proving it wouldn't. They asked we revert Victor's patch, but I don't see the need as it does not hurt SPDK (but doesn't protect anything for them I agree), while it really fixes real issues with internal Net backend. What do you think of my proposal? Do you see other alternative? Thanks, Maxime > Thanks, > Jianfeng > >> >>>>>> QEMU still sends the >>>>> set_mem_table message even the memory regions are not changed as >>>>> QEMU vhost-user filters out those not backed by file (fd > 0). >>>>> >>>>> To fix this case, we add a check in the handler to see if the >>>>> memory regions are really changed; if not, we just keep old memory >>>>> regions. >>>>> >>>>> Fixes: 8f972312b8f4 ("vhost: support vhost-user") >>>>> >>>>> CC: stable@dpdk.org >>>>> >>>>> CC: Yuanhan Liu >>>>> CC: Maxime Coquelin >>>>> >>>>> Reported-by: Yang Zhang >>>>> Reported-by: Xin Long >>>>> Signed-off-by: Yi Yang >>>>> Signed-off-by: Jianfeng Tan >>>>> --- >>>>> lib/librte_vhost/vhost_user.c | 33 >> +++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 33 insertions(+) >>>> >>>> Reviewed-by: Maxime Coquelin >>> >>> Applied to dpdk-next-virtio. >>> >>> Thanks. >>> >>> --yliu >>> >> >> Maxime