From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by dpdk.org (Postfix) with ESMTP id 685641B30C for ; Thu, 18 Jan 2018 15:03:36 +0100 (CET) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id A18BE20AC6; Thu, 18 Jan 2018 09:03:35 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Thu, 18 Jan 2018 09:03:35 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fridaylinux.org; h=cc:content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; bh=f4RK+FX/zGG9tH/AgrohH3WG1e8rzHUfQ5x77OpDbiM=; b=maTfnFJP qCbBha7jcmPZfE0kb4IxkZtdVIB6suZ0cGi1pt7EoZ/cylyiLPyRFnwES1Jq5a5j KZ219/9toqBFgEb397IAn/4MtU0oe9qfHFwa5a4WuaDREzI2HiXmtWeQWKNWUqNw PGSWKdC/swQ6oxfvwynT+9HH7JngAb8eheRqX06aNQ6ATTRQwMmU9+MowcW1zy89 6GWS+ziTJhXfmUC8O8kf6UWMJD3VnBtvq4ejlobUr7pI+QvE0glYfRryEVlqGwY5 eCLyMWYY1c2gYekEMKKWU6z/6MZQlCl+QhqTQfonFGQxs9TcosW2f9gpDPj37jRY 1Da0pdr0CGz6Bw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; bh=f4RK+FX/zGG9tH/AgrohH3WG1e8rz HUfQ5x77OpDbiM=; b=a9uL2neuAa74ag2nGCyuRgvMYE6zxLEgfHLgkhyiyb19h i+JM0Lh11TbF2ZZFG82iOARGtofNnT0PxDhlol9pibPYhF+ZRQCckXAl+JNnnbu5 KpzPzHC5g3CLk+bg8Z+r/mkeWVT8AnxbFYROaj2ygpDrOty254QIPCHfiKSUjGKi Xa1qLUH2I4EA7gfbF+7DZgTQlRFi4frQs/TPFIcb40f7HkS7eCzzpa0Y8smympCR WFzIRJ45Npl+4p+FPC1daE86MOCrsE4hMH9NZLvJUpStGFXFVz+9uCrIAMTaOGm5 o0qLBw299mqom+tBN2eJ6xOhEE2X20Gfgh5k2FdvQ== X-ME-Sender: Received: from yliu-mob (unknown [115.148.90.22]) by mail.messagingengine.com (Postfix) with ESMTPA id 464437E3D7; Thu, 18 Jan 2018 09:03:33 -0500 (EST) Date: Thu, 18 Jan 2018 22:03:30 +0800 From: Yuanhan Liu To: zhike wang Cc: dev@dpdk.org Message-ID: <20180118140330.GI29540@yliu-mob> References: <1514887716-26343-1-git-send-email-wangzhike@jd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1514887716-26343-1-git-send-email-wangzhike@jd.com> User-Agent: Mutt/1.5.24 (2015-08-30) Subject: Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Jan 2018 14:03:36 -0000 Hi, Apologize for late review. On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote: > From: wang zhike > > v3: > * Fix duplicate variable name, which leads to unexpected memory write. > v2: > * Move fdset_del before conn destroy. > * Fix coding style. Note that we prefer to put the change logs after "---" below Signed-off-by, so that those change logs won't be tracked in the git log history. > This patch fixes below race condition: > 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex > ->fdset_del->loop to check fd.busy. > 2. another thread calls fdset_event_dispatch, and the busy flag is > changed AFTER handling on the fd, i.e, rcb(). However, the rcb, > such as vhost_user_read_cb() would try to retrieve the conn_mutex. > > So issue is that the 1st thread will loop check the flag while holding > the mutex, while the 2nd thread would be blocked by mutex and can not > change the flag. Then dead lock is observed. I then would change the title to "vhost: fix deadlock". I'm also keen to know how do you reproduce this issue with real-life APP (say ovs) and how easy it is for reproduce. > Signed-off-by: zhike wang Again, you need fix your git config file about your name. > --- > lib/librte_vhost/socket.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index 422da00..ea01327 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list { > struct vhost_user_socket *vsocket = vhost_user.vsockets[i]; > > if (!strcmp(vsocket->path, path)) { > + int del_fds[MAX_FDS]; > + int num_of_fds = 0, fd_index; > + I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds), fd_idx". > if (vsocket->is_server) { > fdset_del(&vhost_user.fdset, vsocket->socket_fd); > close(vsocket->socket_fd); > @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list { > vhost_user_remove_reconnect(vsocket); > } > > + /* fdset_del() must be called without conn_mutex. */ > + pthread_mutex_lock(&vsocket->conn_mutex); > + for (conn = TAILQ_FIRST(&vsocket->conn_list); > + conn != NULL; > + conn = next) { > + next = TAILQ_NEXT(conn, next); > + > + del_fds[num_of_fds++] = conn->connfd; > + } > + pthread_mutex_unlock(&vsocket->conn_mutex); > + > + for (fd_index = 0; fd_index < num_of_fds; fd_index++) > + fdset_del(&vhost_user.fdset, del_fds[fd_index]); > + > pthread_mutex_lock(&vsocket->conn_mutex); > for (conn = TAILQ_FIRST(&vsocket->conn_list); > conn != NULL; > conn = next) { > next = TAILQ_NEXT(conn, next); > > - fdset_del(&vhost_user.fdset, conn->connfd); If you log the fd here and invoke fdset_del() and close() after the loop, you then could avoid one extra loop as you did above. --yliu > RTE_LOG(INFO, VHOST_CONFIG, > "free connfd = %d for device '%s'\n", > conn->connfd, path); > -- > 1.8.3.1