From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 21 Jul 2016 19:40:16 +0800
From: Yuanhan Liu
To: Ilya Maximets
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Thomas Monjalon
Message-ID: <20160721114016.GF28708@yliu-dev.sh.intel.com>
References: <1469089275-15209-1-git-send-email-i.maximets@samsung.com>
 <20160721093714.GD28708@yliu-dev.sh.intel.com>
 <579099BC.9050603@samsung.com>
 <20160721101311.GE28708@yliu-dev.sh.intel.com>
 <5790A5D4.1090703@samsung.com>
 <5790AEB3.2010708@samsung.com>
In-Reply-To: <5790AEB3.2010708@samsung.com>
Subject: Re: [dpdk-dev] [PATCH] vhost: fix connect hang in client mode
List-Id: patches and discussions about DPDK

On Thu, Jul 21, 2016 at 02:14:59PM +0300, Ilya Maximets wrote:
> > Hmm, how about this fixup:
> > ------------------------------------------------------------------------------
> > diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
> > index 8626d13..b0f45e6 100644
> > --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> > +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> > @@ -537,18 +537,7 @@ vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
> >  		errno = EINVAL;
> >
> >  	ret = connect(fd, un, sz);
> > -	if (ret == -1 && errno != EINPROGRESS)
> > -		return -1;
> > -	if (ret == 0)
> > -		goto connected;
> > -
> > -	FD_ZERO(&fdset);
> > -	FD_SET(fd, &fdset);
> > -
> > -	ret = select(fd + 1, NULL, &fdset, NULL, &tv);
> > -	if (!ret)
> > -		errno = ETIMEDOUT;
> > -	if (ret != 1)
> > +	if (ret < 0 && errno != EISCONN)
> >  		return -1;
> >
> >  	ret = getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len);
> > @@ -558,7 +547,6 @@ vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
> >  		return -1;
> >  	}
> >
> > -connected:
> >  	flags = fcntl(fd, F_GETFL, 0);
> >  	if (flags < 0) {
> >  		RTE_LOG(ERR, VHOST_CONFIG,
> > ------------------------------------------------------------------------------
> > ?
> >
> > We will not check the EINPROGRESS, but subsequent 'connect()' will return
> > EISCONN if connection already established. getsockopt() is kept just in
> > case. Subsequent 'connect()' will happen on the next iteration of
> > reconnection cycle (1 second sleep).
>
> I've sent v2 with this changes.

Thanks. But still, it doesn't look clean to me. I was thinking the
following might be cleaner?

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.
index f0f92f8..c0ef290 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -532,6 +532,10 @@ vhost_user_client_reconnect(void *arg __rte_unused)
 		     reconn != NULL; reconn = next) {
 			next = TAILQ_NEXT(reconn, next);
 
+			if (reconn->conn_inprogress) {
+				/* do connect check here */
+			}
+
 			if (connect(reconn->fd, (struct sockaddr *)&reconn->un,
 				    sizeof(reconn->un)) < 0)
 				continue;
@@ -605,6 +609,7 @@ vhost_user_create_client(struct vhost_user_socket *vsocket)
 	reconn->un = un;
 	reconn->fd = fd;
 	reconn->vsocket = vsocket;
+	reconn->conn_inprogress = errno == EINPROGRESS;
 	pthread_mutex_lock(&reconn_list.mutex);
 	TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next);
 	pthread_mutex_unlock(&reconn_list.mutex);

It's just a rough diff; hopefully it shows my idea clearly. And of
course, we should not call connect() anymore when conn_inprogress is set.

What do you think of it?

	--yliu
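
For illustration only, the "do connect check here" placeholder could look roughly like the sketch below. It is not code from either patch: the helper name check_connect_progress() and the conn_state enum are hypothetical, and it assumes a zero-timeout poll() for writability followed by getsockopt(SO_ERROR) is an acceptable way to test a pending non-blocking connect() without blocking the reconnect thread.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical result codes, not part of the vhost library. */
enum conn_state {
	CONN_FAILED = -1,	/* connect finished with an error (see errno) */
	CONN_PENDING = 0,	/* still in progress, look again next cycle */
	CONN_READY = 1		/* connection established */
};

/*
 * Hypothetical helper: test a socket whose connect() returned
 * EINPROGRESS, without blocking and without calling connect() again.
 */
static enum conn_state
check_connect_progress(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int so_error = 0;
	socklen_t len = sizeof(so_error);
	int ret;

	/* A timeout of 0 makes poll() return immediately. */
	ret = poll(&pfd, 1, 0);
	if (ret < 0)
		return errno == EINTR ? CONN_PENDING : CONN_FAILED;
	if (ret == 0)
		return CONN_PENDING;

	/* The descriptor is not open at all. */
	if (pfd.revents & POLLNVAL) {
		errno = EBADF;
		return CONN_FAILED;
	}

	/* The socket reported an event: SO_ERROR holds the outcome. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
		return CONN_FAILED;
	if (so_error != 0) {
		errno = so_error;
		return CONN_FAILED;
	}

	return CONN_READY;
}
```

In the reconnect loop, an entry with conn_inprogress set would call such a helper instead of connect(): on CONN_PENDING it would simply continue to the next entry, on CONN_READY it would proceed as for a successful connect(), and on CONN_FAILED it would presumably need a fresh socket before retrying, though that part is a guess about how the final patch would handle it.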