Date: Thu, 21 Jul 2016 18:13:11 +0800
From: Yuanhan Liu
To: Ilya Maximets
Cc: dev@dpdk.org, Huawei Xie, Dyasly Sergey, Heetae Ahn, Thomas Monjalon
Subject: Re: [dpdk-dev] [PATCH] vhost: fix connect hang in client mode
Message-ID: <20160721101311.GE28708@yliu-dev.sh.intel.com>
In-Reply-To: <579099BC.9050603@samsung.com>
References: <1469089275-15209-1-git-send-email-i.maximets@samsung.com> <20160721093714.GD28708@yliu-dev.sh.intel.com> <579099BC.9050603@samsung.com>

On Thu, Jul 21, 2016 at 12:45:32PM +0300, Ilya Maximets wrote:
> On 21.07.2016 12:37, Yuanhan Liu wrote:
> > On Thu, Jul 21, 2016 at 11:21:15AM +0300, Ilya Maximets wrote:
> >> If something abnormal happens to QEMU, 'connect()' can block the calling
> >> thread (e.g. the main thread of OVS) forever or for a really long time.
> >> This can break the whole application or block the reconnection thread.
> >>
> >> Example with OVS:
> >>
> >> ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
> >> (gdb) bt
> >> #0 connect () from /lib64/libpthread.so.0
> >> #1 vhost_user_create_client (vsocket=0xa816e0)
> >> #2 rte_vhost_driver_register
> >> #3 netdev_dpdk_vhost_user_construct
> >> #4 netdev_open (name=0xa664b0 "vhost1")
> >> [...]
> >> #11 main
> >>
> >> Fix that by putting client sockets into non-blocking mode while connecting.
> >>
> >> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
> >
> > Thanks for spotting and fixing yet another bug!
> >
> >>
> >> +static int
> >> +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
> >
> > I don't quite understand why this is needed: isn't connect() with the
> > O_NONBLOCK flag set enough?
>
> There is a small issue with a non-blocking connect() call. Establishing
> the connection may have started even though connect() returned -1 with
> errno set to EINPROGRESS. In that case we must wait on the fd until it
> becomes writable, and then check the current status of the connection
> using getsockopt().
>
> I'm not sure we can actually hit this situation, but it's documented,
> and I think we should handle it.
>
> See 'man connect' for details.

I see. Thanks.

But basically, I don't like introducing yet another fdset here. I'm
wondering whether we could leverage the existing fdset code to achieve
that. This might need some work, though.

So how about making it simple and stupid at this stage: sleep a while
(maybe 1ms, or maybe 1s) when that happens, and give up if the connection
is still not established?

	--yliu
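The non-blocking connect() sequence Ilya describes in the thread (start the connect, wait for the fd to become writable on EINPROGRESS, then read the outcome with getsockopt(SO_ERROR)) can be sketched in plain POSIX C. This is a minimal sketch under stated assumptions, not the code from the patch: `connect_with_timeout()` and `make_unix_addr()` are hypothetical names, and the wait uses poll() with a bounded timeout instead of DPDK's fdset machinery. It handles only the EINPROGRESS case; any other connect() error is returned to the caller unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: fill a sockaddr_un for a vhost-user style socket path. */
static int
make_unix_addr(struct sockaddr_un *un, const char *path)
{
	size_t n = strlen(path);

	if (n >= sizeof(un->sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(un, 0, sizeof(*un));
	un->sun_family = AF_UNIX;
	memcpy(un->sun_path, path, n + 1);	/* copies the trailing NUL */
	return 0;
}

/*
 * Hypothetical helper: connect() without blocking for more than timeout_ms.
 * Returns 0 once connected, or -1 with errno set (ETIMEDOUT if the attempt
 * was still in progress when the timeout expired). The socket's original
 * file status flags are restored before returning.
 */
static int
connect_with_timeout(int fd, const struct sockaddr *addr, socklen_t len,
		     int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int flags, ret, so_error = 0, saved_errno;
	socklen_t optlen = sizeof(so_error);

	/* A negative poll() timeout means "wait forever": the hang to avoid. */
	assert(timeout_ms >= 0);

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	ret = connect(fd, addr, len);
	if (ret < 0 && errno == EINPROGRESS) {
		/* Attempt started: wait until the fd becomes writable.
		 * Each EINTR restarts the full timeout; fine for a sketch. */
		do {
			ret = poll(&pfd, 1, timeout_ms);
		} while (ret < 0 && errno == EINTR);

		if (ret == 0) {
			errno = ETIMEDOUT;
			ret = -1;
		} else if (ret > 0) {
			/* Writable: SO_ERROR holds the final connect() result. */
			if (getsockopt(fd, SOL_SOCKET, SO_ERROR,
				       &so_error, &optlen) < 0) {
				ret = -1;
			} else if (so_error != 0) {
				errno = so_error;
				ret = -1;
			} else {
				ret = 0;
			}
		}
		/* Otherwise poll() failed and ret/errno already describe it. */
	}

	/* Restore the original flags without clobbering a connect() errno. */
	saved_errno = errno;
	if (fcntl(fd, F_SETFL, flags) < 0 && ret == 0)
		return -1;
	errno = saved_errno;
	return ret;
}
```

A caller in a reconnect loop would treat a -1 return (for example with errno == ETIMEDOUT or ECONNREFUSED) as "not connected yet" and retry later; that retry policy is an assumption here, not something the thread settles. The bounded timeout gives the "give up if not established" behaviour without sleeping in a loop.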