From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <yliu@fridaylinux.org>
Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com
 [66.111.4.25]) by dpdk.org (Postfix) with ESMTP id 685641B30C
 for <dev@dpdk.org>; Thu, 18 Jan 2018 15:03:36 +0100 (CET)
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailout.nyi.internal (Postfix) with ESMTP id A18BE20AC6;
 Thu, 18 Jan 2018 09:03:35 -0500 (EST)
Received: from frontend1 ([10.202.2.160])
 by compute1.internal (MEProxy); Thu, 18 Jan 2018 09:03:35 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fridaylinux.org;
 h=cc:content-type:date:from:in-reply-to:message-id:mime-version
 :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=
 fm1; bh=f4RK+FX/zGG9tH/AgrohH3WG1e8rzHUfQ5x77OpDbiM=; b=maTfnFJP
 qCbBha7jcmPZfE0kb4IxkZtdVIB6suZ0cGi1pt7EoZ/cylyiLPyRFnwES1Jq5a5j
 KZ219/9toqBFgEb397IAn/4MtU0oe9qfHFwa5a4WuaDREzI2HiXmtWeQWKNWUqNw
 PGSWKdC/swQ6oxfvwynT+9HH7JngAb8eheRqX06aNQ6ATTRQwMmU9+MowcW1zy89
 6GWS+ziTJhXfmUC8O8kf6UWMJD3VnBtvq4ejlobUr7pI+QvE0glYfRryEVlqGwY5
 eCLyMWYY1c2gYekEMKKWU6z/6MZQlCl+QhqTQfonFGQxs9TcosW2f9gpDPj37jRY
 1Da0pdr0CGz6Bw==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:content-type:date:from:in-reply-to
 :message-id:mime-version:references:subject:to:x-me-sender
 :x-me-sender:x-sasl-enc; s=fm1; bh=f4RK+FX/zGG9tH/AgrohH3WG1e8rz
 HUfQ5x77OpDbiM=; b=a9uL2neuAa74ag2nGCyuRgvMYE6zxLEgfHLgkhyiyb19h
 i+JM0Lh11TbF2ZZFG82iOARGtofNnT0PxDhlol9pibPYhF+ZRQCckXAl+JNnnbu5
 KpzPzHC5g3CLk+bg8Z+r/mkeWVT8AnxbFYROaj2ygpDrOty254QIPCHfiKSUjGKi
 Xa1qLUH2I4EA7gfbF+7DZgTQlRFi4frQs/TPFIcb40f7HkS7eCzzpa0Y8smympCR
 WFzIRJ45Npl+4p+FPC1daE86MOCrsE4hMH9NZLvJUpStGFXFVz+9uCrIAMTaOGm5
 o0qLBw299mqom+tBN2eJ6xOhEE2X20Gfgh5k2FdvQ==
X-ME-Sender: <xms:N6lgWqsmk43gKWNugt1LX9PiT3XoTHiC6XMg2ZSN5UjXTIZxg5EPvQ>
Received: from yliu-mob (unknown [115.148.90.22])
 by mail.messagingengine.com (Postfix) with ESMTPA id 464437E3D7;
 Thu, 18 Jan 2018 09:03:33 -0500 (EST)
Date: Thu, 18 Jan 2018 22:03:30 +0800
From: Yuanhan Liu <yliu@fridaylinux.org>
To: zhike wang <wangzhike@jd.com>
Cc: dev@dpdk.org
Message-ID: <20180118140330.GI29540@yliu-mob>
References: <1514887716-26343-1-git-send-email-wangzhike@jd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1514887716-26343-1-git-send-email-wangzhike@jd.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Subject: Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of
 conn_mutex
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Jan 2018 14:03:36 -0000

Hi,

Apologize for late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzhike@jd.com>
> 
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.

> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
>    ->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
>    changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
>    such as vhost_user_read_cb() would try to retrieve the conn_mutex.
> 
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.

I then would change the title to "vhost: fix deadlock".

I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.

> Signed-off-by: zhike wang <wangzhike@jd.com>

Again, you need fix your git config file about your name.

> ---
>  lib/librte_vhost/socket.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>  		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>  
>  		if (!strcmp(vsocket->path, path)) {
> +			int del_fds[MAX_FDS];
> +			int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".

>  			if (vsocket->is_server) {
>  				fdset_del(&vhost_user.fdset, vsocket->socket_fd);
>  				close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>  				vhost_user_remove_reconnect(vsocket);
>  			}
>  
> +			/* fdset_del() must be called without conn_mutex. */
> +			pthread_mutex_lock(&vsocket->conn_mutex);
> +			for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +			     conn != NULL;
> +			     conn = next) {
> +				next = TAILQ_NEXT(conn, next);
> +
> +				del_fds[num_of_fds++] = conn->connfd;
> +			}
> +			pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> +			for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> +				fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>  			pthread_mutex_lock(&vsocket->conn_mutex);
>  			for (conn = TAILQ_FIRST(&vsocket->conn_list);
>  			     conn != NULL;
>  			     conn = next) {
>  				next = TAILQ_NEXT(conn, next);
>  
> -				fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.

	--yliu
>  				RTE_LOG(INFO, VHOST_CONFIG,
>  					"free connfd = %d for device '%s'\n",
>  					conn->connfd, path);
> -- 
> 1.8.3.1