To: "Hu, Jiayu" <jiayu.hu@intel.com>, dev@dpdk.org
Cc: "Xia, Chenbo", "Wang, Yinan", "Pai G, Sunil", "Jiang, Cheng1"
References: <1615985773-406787-1-git-send-email-jiayu.hu@intel.com>
 <1617368642-131298-1-git-send-email-jiayu.hu@intel.com>
 <1617368642-131298-4-git-send-email-jiayu.hu@intel.com>
 <0b974e3cd6534947ba6568d93d9431ab@intel.com>
 <9eb47884f1254cc98b9f1c60dfe9714c@intel.com>
 <4fc5b419864e4b829623099cd31de9aa@intel.com>
 <07eb8178-9523-6db3-1056-e96b48e65139@redhat.com>
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Date: Mon, 19 Apr 2021 09:13:22 +0200
Subject: Re: [dpdk-dev] [PATCH v2 3/4] vhost: avoid deadlock on async register

On 4/19/21 6:10 AM, Hu, Jiayu wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Friday, April 16, 2021 4:34 PM
>> To: Hu, Jiayu; dev@dpdk.org
>> Cc: Xia, Chenbo; Wang, Yinan; Pai G, Sunil; Jiang, Cheng1
>> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
>>
>> On 4/16/21 10:18 AM, Hu, Jiayu wrote:
>>> Hi Maxime,
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin
>>>> Sent: Friday, April 16, 2021 2:35 PM
>>>> To: Hu, Jiayu; dev@dpdk.org
>>>> Cc: Xia, Chenbo; Wang, Yinan; Pai G, Sunil; Jiang, Cheng1
>>>> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
>>>>
>>>> Hi Jiayu,
>>>>
>>>> On 4/16/21 4:19 AM, Hu, Jiayu wrote:
>>>>> Hi Maxime,
>>>>>>>> On 4/14/21 3:40 AM, Hu, Jiayu wrote:
>>>>>>>>> Hi Maxime,
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Maxime Coquelin
>>>>>>>>>> Sent: Tuesday, April 13, 2021 7:33 PM
>>>>>>>>>> To: Hu, Jiayu; dev@dpdk.org
>>>>>>>>>> Cc: Xia, Chenbo; Wang, Yinan; Pai G, Sunil; Jiang, Cheng1
>>>>>>>>>> Subject: Re: [PATCH v2 3/4] vhost: avoid deadlock on async register
>>>>>>>>>>
>>>>>>>>>> On 4/2/21 3:04 PM, Jiayu Hu wrote:
>>>>>>>>>>> Users can register an async copy device in vring_state_changed(),
>>>>>>>>>>> when the vhost queue is enabled. However, a deadlock occurs inside
>>>>>>>>>>> rte_vhost_async_channel_register(), if
>>>>>>>>>>> VHOST_USER_F_PROTOCOL_FEATURES is not supported, as
>>>>>>>>>>> vhost_user_msg_handler() takes vq->access_lock before calling
>>>>>>>>>>> vhost_user_set_vring_kick().
>>>>>>>>>>>
>>>>>>>>>>> This patch avoids the async register deadlock by removing the call
>>>>>>>>>>> to vring_state_changed() in vhost_user_set_vring_kick(). It's safe,
>>>>>>>>>>> as vhost_user_msg_handler() will call vring_state_changed() anyway.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  lib/librte_vhost/vhost_user.c | 3 ---
>>>>>>>>>>>  1 file changed, 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>>>>>>>> index 44c0452..8f0eba6 100644
>>>>>>>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>>>>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>>>>>>>> @@ -1918,9 +1918,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>  	 */
>>>>>>>>>>>  	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
>>>>>>>>>>>  		vq->enabled = true;
>>>>>>>>>>> -		if (dev->notify_ops->vring_state_changed)
>>>>>>>>>>> -			dev->notify_ops->vring_state_changed(
>>>>>>>>>>> -					dev->vid, file.index, 1);
>>>>>>>>>>>  	}
>>>>>>>>>>>
>>>>>>>>>>>  	if (vq->ready) {
>>>>>>>>>>>
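The deadlock described in the commit message comes down to a
non-recursive lock being taken twice by the same thread. A minimal
standalone sketch of that pattern; the names are illustrative, not the
real DPDK symbols, and a pthread mutex stands in for the rte_spinlock
vq->access_lock:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t access_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for rte_vhost_async_channel_register(), which takes
 * vq->access_lock internally. */
static void async_channel_register(void)
{
	pthread_mutex_lock(&access_lock);	/* second acquisition: hangs */
	/* ... set up the async channel ... */
	pthread_mutex_unlock(&access_lock);
}

/* Application callback registering the async channel on enable. */
static int vring_state_changed_cb(int enable)
{
	if (enable)
		async_channel_register();
	return 0;
}

/* Pre-patch, vhost_user_set_vring_kick() notified the app directly
 * when VHOST_USER_F_PROTOCOL_FEATURES was not negotiated. */
static void set_vring_kick(void)
{
	vring_state_changed_cb(1);
}

int main(void)
{
	/* vhost_user_msg_handler() locks the queue pairs first... */
	pthread_mutex_lock(&access_lock);
	set_vring_kick();			/* ...and never returns */
	pthread_mutex_unlock(&access_lock);
	printf("not reached\n");
	return 0;
}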
>>>>>>>>>>
>>>>>>>>>> As replied earlier on v1, I agree the call to vring_state_changed()
>>>>>>>>>> here is not needed. But it might not be enough; there are other
>>>>>>>>>> cases where you could have issues.
>>>>>>>>>
>>>>>>>>> vhost_user_notify_queue_state() can be called in three cases:
>>>>>>>>> 1. When the vq ready status changes, vhost_user_msg_handler() calls
>>>>>>>>> it to notify the backend. But vhost_user_msg_handler() doesn't take
>>>>>>>>> the lock before calling it. So in this case, no deadlock occurs in
>>>>>>>>> async register.
>>>>>>>>>
>>>>>>>>> 2. If vq->ready is true, vhost_user_set_vring_call() calls it to
>>>>>>>>> notify the backend that the vq is not enabled. Although
>>>>>>>>> vhost_user_set_vring_call() is protected by the lock, async register
>>>>>>>>> is called only if the vq is enabled, so async register will not be
>>>>>>>>> called in this case.
>>>>>>>>>
>>>>>>>>> 3. If vq->ready is true, vhost_user_set_vring_kick() calls it to
>>>>>>>>> notify the backend that the vq is not enabled. Same as #2, async
>>>>>>>>> register is called only when the vq is enabled. Even if
>>>>>>>>> vhost_user_set_vring_kick() is protected by the lock, there is no
>>>>>>>>> deadlock in async register, as it will not be called in this case.
>>>>>>>>>
>>>>>>>>> In summary, I think there is no deadlock issue in async register if
>>>>>>>>> we remove the call to vring_state_changed() in
>>>>>>>>> vhost_user_set_vring_kick().
>>>>>>>>
>>>>>>>> But the unregister one could be called in theory, no? Otherwise it
>>>>>>>> would look unbalanced. At least on the disabled notification, the app
>>>>>>>> should make sure the DMA transfers to and from the vring are stopped
>>>>>>>> before it returns from the callback. Otherwise it could lead to
>>>>>>>> undefined behavior.
>>>>>>>
>>>>>>> Right, users need to call unregister, but we cannot remove the call to
>>>>>>> vhost_user_notify_queue_state() in cases #2 and #3, IMHO. So to avoid
>>>>>>> the deadlock, we recommend users call async unregister in
>>>>>>> destroy_device(), instead of on the vring disabled notification. Does
>>>>>>> that make sense to you?
>>>>>>
>>>>>> Calling async unregister in destroy_device() is fine by me. But I'm
>>>>>> more concerned about DMA transactions not being stopped when the ring
>>>>>> becomes disabled.
>>>>>
>>>>> If the ring becomes disabled, no more new pkts go into the async data
>>>>> path, as virtio_dev_rx_async_submit() returns if vq->enabled is false.
>>>>>
>>>>>> I cannot say if you are doing it right, because the vhost example does
>>>>>> not implement the vring_state_changed callback.
>>>>>> It is not a problem with the sync datapath, as we have the lock
>>>>>> protection plus the enabled variable that prevents processing the
>>>>>> rings when they get stopped.
>>>>>> But for the async path, if you have programmed DMA transfers, you need
>>>>>> to rely on vring_state_changed to block the control path while the
>>>>>> transfers are cancelled or completed.
>>>>>
>>>>> I am not sure I understand your concern correctly, but for the async
>>>>> data path, the enabled variable can prevent it from enqueueing new pkts
>>>>> when the ring is disabled, as virtio_dev_rx_async_submit() checks the
>>>>> enabled variable before processing the ring; in addition, the lock
>>>>> protection can stop the async data path, as
>>>>> virtio_dev_rx_async_submit() acquires the lock too. For example, in the
>>>>> case that the front-end updates the kickfd, set_vring_kick() will
>>>>> notify the backend that the vring is stopped by calling
>>>>> vhost_user_notify_queue_state(). In this case, the sync data path will
>>>>> stop as a result of the lock protection, and I think it's the same for
>>>>> the async data path, as it acquires the lock before processing the
>>>>> ring.
>>>>
>>>> What about memory hotplug happening while the DMA transfers are
>>>> on-going, for example?
>>>>
>>>> In this case, the lock is enough for the sync datapath, but it is not
>>>> for the async one.
>>>>
>>>> If a VHOST_USER_SET_MEM_TABLE request is received while
>>>> virtio_dev_rx_async_submit() is running, it will be blocked until
>>>> virtio_dev_rx_async_submit() returns, but the DMA transfers may not be
>>>> finished yet. When unblocked, the control thread will call
>>>> vhost_user_set_mem_table(), which will munmap the current memory
>>>> regions before mmaping the new ones while DMA transfers are on-going.
>>>>
>>>> To fix this, we need to call the vring_state_changed callback for all
>>>> the virtqueues with disabled state. In your app, you need to stop DMA
>>>> transfers when the disabled state notification happens, or block the
>>>> callback until the transfers are done.
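A sketch of where that library-side notification could go, using the
vhost_user.c internals already referenced in this thread (dev->nr_vring,
vq->ready, vhost_user_notify_queue_state()). This only shows the shape
of the idea; the real vhost_user_set_mem_table() does much more:

static int
vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
		int main_fd)
{
	struct virtio_net *dev = *pdev;
	uint32_t i;

	/* Notify the application that every ring is disabled *before*
	 * the old guest memory regions are unmapped, so that its
	 * vring_state_changed callback can block until all in-flight
	 * DMA transfers targeting those regions have completed. */
	for (i = 0; i < dev->nr_vring; i++) {
		struct vhost_virtqueue *vq = dev->virtqueue[i];

		if (vq->ready) {
			vq->ready = false;
			vhost_user_notify_queue_state(dev, i, 0);
		}
	}

	/* ... munmap the old regions and mmap the new ones only here ... */

	return RTE_VHOST_MSG_RESULT_OK;
}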
>>>
>>> Let me summarize the problem:
>>> When frontend memory is hot-plugged, the host application needs to stop
>>> DMA transfers for all virtqueues, which is done by emptying in-flight
>>> DMA copies and unregistering async inside vring_state_changed().
>>
>> I don't think unregistering the async channels is mandatory.
>> On the disable notification, the callback has to block until all
>> on-going DMA transfers are done. New transfers won't be scheduled,
>> since virtio_dev_rx_async_submit() will be blocked by the lock while
>> the memory regions update is being done.
>
> Yes, unregister is not a must, but the problem is that
> rte_vhost_poll_enqueue_completed() takes the lock too.
> In the VM memory hot-plug case, emptying in-flight pkts in
> vring_state_changed() may be OK, as it is called by
> vhost_user_msg_handler() (please correct me if I am wrong),

This is wrong. As I said in the previous email, we need a fix in
vhost_user_set_mem_table() to call the vring_state_changed callback
with disabled state before the shared memory gets unmapped.

This was not necessary for the sync datapath, as the lock already
protects that, but it is mandatory for the async path (and maybe SPDK
too, but I didn't check).

> which doesn't take the lock. But if it is called inside
> set_vring_kick()/_call(), a deadlock occurs in
> rte_vhost_poll_enqueue_completed(), even if the user app doesn't call
> the async unregister API.

It will be taken with a lock too in set_mem_table().

>>
>>> But the vhost library takes the lock before entering
>>> vring_state_changed(), and async unregister and
>>> rte_vhost_poll_enqueue_completed() both acquire the lock, so a
>>> deadlock occurs inside rte_vhost_poll_enqueue_completed() or async
>>> unregister. (please correct me if I am wrong)
>>>
>>> There may be two possible solutions:
>>> 1. remove the lock before calling vring_state_changed() in the vhost
>>> library;
>>
>> I don't think so. If you release the lock in vring_state_changed(), it
>> will enable the app to enqueue/dequeue descriptors in the middle of
>> handling the request.
>
> Sorry, I didn't express my opinion clearly. I mean remove the following
> code in set_vring_kick() and set_vring_call():
> -	if (vq->ready) {
> -		vq->ready = false;
> -		vhost_user_notify_queue_state(dev, file.index, 0);
> -	}
>
> IMHO, in both cases, the purpose of calling
> vhost_user_notify_queue_state() is to tell the backend that the vring
> is not ready to use as a result of the callfd/kickfd update; after
> that, vhost_user_msg_handler() calls it again to notify the backend
> that the vring is ready to process. But there is lock protection before
> entering set_vring_kick()/_call(), so data path threads will not
> process the vring while callfd/kickfd is being updated anyway. After
> updating callfd/kickfd, I think data path threads are already safe to
> start processing the vring, so there is no need for a ready
> notification in vhost_user_msg_handler().
>
> BTW, lock protection for vhost_user_notify_queue_state() is only in
> set_vring_kick()/_call(), but not in vhost_user_msg_handler().

My point in the previous reply was that the lock is mandatory in
set_vring_state and other places, so in order to have consistent
behaviour, the lock should be taken every time we call the callback,
so even in vhost_user_msg_handler().

> Thanks,
> Jiayu
>
>>> 2. provide a new callback, without acquiring the lock, for the DMA
>>> accelerated case. The callback is used to notify the backend that it
>>> needs to stop DMA transfers.
>>
>> No, in your app just wait for the DMA transfers to be finished before
>> returning from the callback.
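A sketch of what such an application callback could look like;
app_dma_pending() and app_dma_poll_completions() are hypothetical
stand-ins for the application's own DMA completion tracking, not DPDK
APIs:

#include <stdbool.h>
#include <stdint.h>
#include <rte_vhost.h>

/* Hypothetical application-side helpers tracking the DMA transfers
 * programmed for each ring; defined elsewhere in the app. */
extern bool app_dma_pending(int vid, uint16_t queue_id);
extern void app_dma_poll_completions(int vid, uint16_t queue_id);

static int
vring_state_changed(int vid, uint16_t queue_id, int enable)
{
	if (!enable) {
		/* Block the control path until every DMA transfer already
		 * programmed for this ring has completed. Do not call vhost
		 * APIs that take vq->access_lock from here (e.g.
		 * rte_vhost_poll_enqueue_completed()): the message handler
		 * already holds the lock, so that would deadlock. */
		while (app_dma_pending(vid, queue_id))
			app_dma_poll_completions(vid, queue_id);
	}
	return 0;
}

/* Hooked up with the other callbacks and passed to
 * rte_vhost_driver_callback_register(). */
static const struct vhost_device_ops vhost_ops = {
	.vring_state_changed = vring_state_changed,
	/* .new_device, .destroy_device, ... as in the example app */
};

The key point is that the drain loop relies on the application's own
bookkeeping rather than on vhost APIs that take the vq lock.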
>>
>> Please implement the vring_state_changed callback in the Vhost
>> example; it is now mandatory with the async datapath, and it will help
>> to have an example of what real applications should do.
>>
>> Thanks,
>> Maxime
>>
>>> Thanks,
>>> Jiayu
>>>