From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id C3C3A2B96; Thu, 29 Mar 2018 14:59:49 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Mar 2018 05:59:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.48,376,1517904000"; d="scan'208";a="42320479" Received: from tanjianf-mobl.ccr.corp.intel.com (HELO [10.255.29.250]) ([10.255.29.250]) by fmsmga001.fm.intel.com with ESMTP; 29 Mar 2018 05:59:46 -0700 To: Maxime Coquelin , Victor Kaplansky References: <1510746068-143223-1-git-send-email-jianfeng.tan@intel.com> <20171205141954.GF9111@yliu-dev> <1c5d3010-c7a9-a2b1-909e-dd43cde24af1@redhat.com> Cc: "dev@dpdk.org" , "stable@dpdk.org" , "Yang, Yi Y" , "Harris, James R" , "Yang, Ziye" , "Liu, Changpeng" , "Wodkowski, PawelX" , "Stojaczyk, DariuszX" , Yuanhan Liu From: "Tan, Jianfeng" Message-ID: <1aac21ef-e566-857f-3ccd-a9e5be7b3b37@intel.com> Date: Thu, 29 Mar 2018 20:59:45 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <1c5d3010-c7a9-a2b1-909e-dd43cde24af1@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Mar 2018 12:59:50 -0000 On 3/29/2018 3:35 PM, Maxime Coquelin wrote: > > > On 03/29/2018 09:01 AM, Tan, Jianfeng wrote: >> Hi Maxime and Victor, >> >>> -----Original Message----- >>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >>> Sent: Tuesday, December 5, 2017 10:28 PM >>> To: Yuanhan Liu; Tan, Jianfeng; Victor Kaplansky >>> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Yi Y >>> Subject: Re: [PATCH] vhost: fix segfault as handle set_mem_table >>> message >>> >>> >>> >>> On 12/05/2017 03:19 PM, Yuanhan Liu wrote: >>>> On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote: >>>>> >>>>> >>>>> On 11/15/2017 12:41 PM, Jianfeng Tan wrote: >>>>>> In a running VM, operations (like device attach/detach) will >>>>>> trigger the QEMU to resend set_mem_table to vhost-user backend. >>>>>> >>>>>> DPDK vhost-user handles this message rudely by unmap all existing >>>>>> regions and map new ones. This might lead to segfault if there >>>>>> is pmd thread just trying to touch those unmapped memory regions. >>>>>> >>>>>> But for most cases, except VM memory hotplug, >>> >>> FYI, Victor is working on implementing a lock-less protection mechanism >>> to prevent crashes in such cases. It is intended first to protect >>> log_base in case of multiqueue + live-migration, but would solve thi >>> issue too. >> >> Bring this issue back for discussion. >> >> Reported by SPDK guys, even with per-queue lock, they could still run >> into crash as of memory hot plug or unplug. >> In detail, you add the lock in an internal struct, vhost_queue which >> is, unfortunately, not visible to the external datapath, like >> vhost-scsi in SPDK. > > Yes, I agree current solution is not enough > >> The memory hot plug and unplug would be the main issue from SPDK side >> so far. For this specific issue, I think we can continue this patch >> to filter out the changed regions, and keep unchanged regions not >> remapped. How do you think that we move forward on this specific memory issue? I think it can be parallel with the general mechanism below. >> >> But I know that the per-vq lock is not only to resolve the memory >> table issue, some other vhost-user messages could also lead to >> problems? If yes, shall we take a step back, to think about how to >> solve this kind of issues for backends, like vhost-scsi. > > Right, any message that can change the device or virtqueue states can be > problematic. > >> Thoughts? > > In another thread, Apologize for starting another thread. > SPDK people proposed to destroy and re-create the > device for every message. I think this is not acceptable. It sounds a little overkill and error-prone in my first impression. > > I proposed an alternative that I think would work: > - external backends & applications implements the .vring_state_changed() > callback. On disable it stops processing the rings and only return > once all descriptor buffers are processed. On enable, they resume the > rings processing. OK, this gives a chance for external backends & applications to sync (lock or event) with polling threads, just like how destroy_device() works. > - In vhost lib, we implement vhost_vring_pause and vhost_vring_resume > functions. In pause function, we save device state (enable or > disable) in a variable, and if the ring is enabled at device level it > calls .vring_state_changed() with disabled state. In resume, it checks > the vring state in the device and call .vring_state_changed() with > enable state if it was enabled in device state. > > So, I think that would work but I hadn't had a clear reply from SPDK > people proving it wouldn't. > > They asked we revert Victor's patch, but I don't see the need as it does > not hurt SPDK (but doesn't protect anything for them I agree), while it > really fixes real issues with internal Net backend. I agree with you. As SPDK has no chance to call the lock, it can not be affected. I think what people (including myself) are really against is adding lock in DPDK PMD. But we did not find a better way so far. > > What do you think of my proposal? Do you see other alternative? Sounds a feasible way. Let's wait and see how SPDK guys think. Thanks, Jianfeng > > Thanks, > Maxime > >> Thanks, >> Jianfeng