From: Maxime Coquelin
To: 王志克
Cc: Zhike Wang, dev@openvswitch.org, dev@dpdk.org, stable@dpdk.org, xiaolong.ye@intel.com, Ilya Maximets
Date: Wed, 6 May 2020 09:53:25 +0200
Message-ID: <6996f01c-84db-c721-3792-a49a61f31a8f@redhat.com>
In-Reply-To: <22c7f18d.24fb.171e7daa7e6.Coremail.wangzk320@163.com>
References: <1584007039-12437-1-git-send-email-wangzhike@jd.com> <178ab8fa.363e.170ebb22d94.Coremail.wangzk320@163.com> <22c7f18d.24fb.171e7daa7e6.Coremail.wangzk320@163.com>
Subject: Re: [dpdk-dev] [ovs-dev] [PATCH] vhost: return -EAGAIN during unregistering vhost if it is busy.

Hi Zhike,

On 5/6/20 4:39 AM, 王志克 wrote:
> NO, it is different issue.
>
> The current deadlock mentioned in this patch is caused by some
> blocking function (like ovsrcu_synchronize) in application (like OVS).
> In this patch, the application is needed to break the logical deadlock.

Ok, so we need either a non-blocking rte_vhost_driver_unregister() in
DPDK, or a non-blocking .destroy_device() in OVS.

Thanks for the clarification, I misread your original commit message.

The problem with your patch is that you break the DPDK ABI, as the current
unregistering API is supposed to be blocking. That means it could not be
backported to older stable branches.

What we can do on the DPDK side is to provide a new non-blocking API for
unregistering, or do

Any chance there is a way for OVS to prevent that?

Thanks,
Maxime

>
> At 2020-04-27 16:09:31, "Maxime Coquelin" wrote:
>>
>>On 3/18/20 4:31 AM, 王志克 wrote:
>>> Involve openvswitch group since this fix is highly coupled with OVS.
>>> welcome comment.
>>> At 2020-03-12 17:57:19, "Zhike Wang" wrote:
>>>> The vhost_user_read_cb() and rte_vhost_driver_unregister()
>>>> can be called at the same time by 2 threads, and may lead to deadlock.
>>>> Eg thread1 calls vhost_user_read_cb()->vhost_user_get_vring_base()->destroy_device(),
>>>> then thread2 calls rte_vhost_driver_unregister(), and will retry the fdset_try_del() in loop.
>>>>
>>>> Some application implements destroy_device() as a blocking function, eg
>>>> OVS calls ovsrcu_synchronize() insides destroy_device(). As a result,
>>>> thread1(eg vhost_events) is blocked to wait quiesce of thread2(eg ovs-vswitchd),
>>>> and thread2 is in a loop to wait thread1 to give up the use of the vhost fd,
>>>> then leads to deadlock.
>>>>
>>>> It is better to return -EAGAIN to application, who will decide how to handle
>>>> (eg OVS can call ovsrcu_quiesce() and then retry).
>>>>
>>>> Signed-off-by: Zhike Wang
>>>> ---
>>>>  lib/librte_vhost/rte_vhost.h | 4 +++-
>>>>  lib/librte_vhost/socket.c    | 8 ++++----
>>>>  2 files changed, 7 insertions(+), 5 deletions(-)
>>
>>
>>Isn't it fixed with below commit that landed into DPDK v20.02?
>>
>>commit 5efb18e85f7fdb436d3e56591656051c16802066
>>Author: Maxime Coquelin
>>Date:   Tue Jan 14 19:53:57 2020 +0100
>>
>>    vhost: fix deadlock on port deletion
>>
>>    If the vhost-user application (e.g. OVS) deletes the vhost-user
>>    port while Qemu sends a vhost-user request, a deadlock can
>>    happen if the request handler tries to acquire vhost-user's
>>    global mutex, which is also locked by the vhost-user port
>>    deletion API (rte_vhost_driver_unregister).
>>
>>    This patch prevents the deadlock by making
>>    rte_vhost_driver_unregister() to release the mutex and try
>>    again if a request is being handled to give a chance to
>>    the request handler to complete.
>>
>>    Fixes: 8b4b949144b8 ("vhost: fix dead lock on closing in server mode")
>>    Fixes: 5fbb3941da9f ("vhost: introduce driver features related APIs")
>>    Cc: stable@dpdk.org
>>
>>    Signed-off-by: Maxime Coquelin
>>    Reviewed-by: Tiwei Bie
>>    Acked-by: Eelco Chaudron
>>
>>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>>>> index c7b619a..276db11 100644
>>>> --- a/lib/librte_vhost/rte_vhost.h
>>>> +++ b/lib/librte_vhost/rte_vhost.h
>>>> @@ -389,7 +389,9 @@ void rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
>>>>   */
>>>>  int rte_vhost_driver_register(const char *path, uint64_t flags);
>>>>
>>>> -/* Unregister vhost driver. This is only meaningful to vhost user. */
>>>> +/* Unregister vhost driver. This is only meaningful to vhost user.
>>>> + * Return -EAGAIN if device is busy, and leave it to be handled by application.
>>>> + */
>>>>  int rte_vhost_driver_unregister(const char *path);
>>>>
>>>>  /**
>>>> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
>>>> index 7c80121..a75a3f6 100644
>>>> --- a/lib/librte_vhost/socket.c
>>>> +++ b/lib/librte_vhost/socket.c
>>>> @@ -1027,7 +1027,8 @@ struct vhost_user_reconnect_list {
>>>>  }
>>>>
>>>>  /**
>>>> - * Unregister the specified vhost socket
>>>> + * Unregister the specified vhost socket.
>>>> + * Return -EAGAIN if device is busy, and leave it to be handled by application.
>>>>   */
>>>>  int
>>>>  rte_vhost_driver_unregister(const char *path)
>>>> @@ -1039,7 +1040,6 @@ struct vhost_user_reconnect_list {
>>>>  	if (path == NULL)
>>>>  		return -1;
>>>>
>>>> -again:
>>>>  	pthread_mutex_lock(&vhost_user.mutex);
>>>>
>>>>  	for (i = 0; i < vhost_user.vsocket_cnt; i++) {
>>>> @@ -1063,7 +1063,7 @@ struct vhost_user_reconnect_list {
>>>>  					pthread_mutex_unlock(
>>>>  							&vsocket->conn_mutex);
>>>>  					pthread_mutex_unlock(&vhost_user.mutex);
>>>> -					goto again;
>>>> +					return -EAGAIN;
>>>>  				}
>>>>
>>>>  				VHOST_LOG_CONFIG(INFO,
>>>> @@ -1085,7 +1085,7 @@ struct vhost_user_reconnect_list {
>>>>  				if (fdset_try_del(&vhost_user.fdset,
>>>>  						vsocket->socket_fd) == -1) {
>>>>  					pthread_mutex_unlock(&vhost_user.mutex);
>>>> -					goto again;
>>>> +					return -EAGAIN;
>>>>  				}
>>>>
>>>>  				close(vsocket->socket_fd);
>>>> --
>>>> 1.8.3.1
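
For illustration only, the application-side flow that the quoted commit message suggests (get -EAGAIN back, quiesce, then retry) could look roughly like the sketch below. Everything in it is an assumption made for the example: fake_vhost_driver_unregister() stands in for an unregister call that returns -EAGAIN while the socket is busy, fake_quiesce() stands in for whatever the application does between attempts (the commit message mentions ovsrcu_quiesce() for OVS), and unregister_with_retry() and the retry bound are hypothetical names, not existing DPDK or OVS code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-blocking unregister call: it reports
 * -EAGAIN while the socket is still busy and 0 once it is free. */
static int busy_polls_left = 3;

static int
fake_vhost_driver_unregister(const char *path)
{
	if (path == NULL)
		return -1;
	if (busy_polls_left > 0) {
		busy_polls_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Hypothetical stand-in for the application's quiesce step; here it
 * only counts how often it was invoked. */
static int quiesce_calls;

static void
fake_quiesce(void)
{
	quiesce_calls++;
}

/* Call unregister until it stops reporting -EAGAIN or max_retries
 * retries have been spent, quiescing between attempts. Returns the
 * last result of the unregister call. */
static int
unregister_with_retry(const char *path, int max_retries)
{
	int tries = 0;
	int ret;

	assert(max_retries >= 0);

	for (;;) {
		ret = fake_vhost_driver_unregister(path);
		if (ret != -EAGAIN || tries == max_retries)
			return ret;
		/* Give the blocked request handler a chance to finish. */
		fake_quiesce();
		tries++;
	}
}
```

A caller would invoke unregister_with_retry(path, n) and treat a final -EAGAIN as "still busy, try again later". The retry bound is a choice made for this sketch so that the caller never spins forever; the quoted patch itself leaves the handling policy to the application.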