From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from sessmg22.ericsson.net (sessmg22.ericsson.net [193.180.251.58]) by dpdk.org (Postfix) with ESMTP id A860B28F2 for ; Tue, 5 Apr 2016 10:40:26 +0200 (CEST) X-AuditID: c1b4fb3a-f79d86d000005b69-02-570379f9c376 Received: from ESESSHC021.ericsson.se (Unknown_Domain [153.88.183.81]) by sessmg22.ericsson.net (Symantec Mail Security) with SMTP id 71.74.23401.9F973075; Tue, 5 Apr 2016 10:40:26 +0200 (CEST) Received: from [147.214.49.111] (153.88.183.153) by smtp.internal.ericsson.com (153.88.183.83) with Microsoft SMTP Server id 14.3.248.2; Tue, 5 Apr 2016 10:40:25 +0200 To: "Xie, Huawei" , "dev@dpdk.org" , Thomas Monjalon , "Ananyev, Konstantin" , Yuanhan Liu References: <1458292380-9258-1-git-send-email-patrik.r.andersson@ericsson.com> From: Patrik Andersson R Organization: Ericsson AB Message-ID: <570379F9.6020306@ericsson.com> Date: Tue, 5 Apr 2016 10:40:25 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrKLMWRmVeSWpSXmKPExsUyM2J7oO6vSuZwg0nv2S3efdrOZNE+8yyT xfs/i1gsvmyazmZxfcIFVgdWj4v9dxg9fi1YyuqxeM9LJo95JwMDWKK4bFJSczLLUov07RK4 MrY8+cNWsNOy4tqBc6wNjE91uxg5OSQETCSmHGlnhbDFJC7cW8/WxcjFISRwhFFi+p41UM5q Rol/DyYydzFycAgLeEm8umgM0iAicJNR4u+qGhBbSGAGo8Sfq94gNpuAlcS8bcuYQGx+AUmJ DQ27mUFsXgFtiRk3t7OCjGERUJFYdkYNJCwqECHxZO5JRogSQYmTM5+wgNicAiESy6a3gG1l FrCXeLC1DCTMLCAvsf3tHGaIrToSr868ZZvAKDgLSfcshI5ZSDoWMDKvYhQtTi0uzk03MtJL LcpMLi7Oz9PLSy3ZxAgM7INbflvtYDz43PEQowAHoxIPr4IMc7gQa2JZcWXuIUYJDmYlEd5V xUAh3pTEyqrUovz4otKc1OJDjNIcLErivDmR/8KEBNITS1KzU1MLUotgskwcnFINjD05v07q FzCe72lyuZiQrcwX/Yg5Kur/nnUim1OftS/YLzp3t/qFKSuCdTbos724uEq1UcvrYwDbSamp t+6t2bT3eQUnp9eR/ZXajd+zw++21oRKdHyPEuDRe+Tx13L/kWnbNWfzNt2ffV19Ie/Sq1nG p5nCO+N4lkxanCS8ZkeQvsvftaEbmJVYijMSDbWYi4oTAa8gWcVoAgAA Subject: Re: [dpdk-dev] [RFC] vhost user: add error handling for fd > 1023 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Apr 2016 08:40:26 -0000 Hi, thank you for the response on this. The described fault situation arises due to the fact that there is a bug in an OpenStack component, Neutron or Nova, that fails to release ports on VM deletion. This typically leads to an accumulation of 1-2 file descriptors per unreleased port. It could also arise when allocating a large number (~500?) of vhost user ports and connecting them all to VMs. Unfortunately I don't have a DPDK design test environment, thus I have not tried to reproduce the issue with DPDK only. But, I assume that by creating enough vhost user devices it can be triggered. File descriptors would be "consumed" by calls to rte_vhost_driver_register() and user_set_mem_table() both, presumably. The key point, I think, is that more than one file descriptor is used per vhost user device. This means that there is no real relation between the number of devices and the number of file descriptors in use. As for using select() or epoll(), I don't know the strength of the portability argument. It is possible that epoll() would exhibit some better properties, like simpler code in the polling loop, but I have yet seen too little of the total picture to venture an opinion. On 03/30/2016 11:05 AM, Xie, Huawei wrote: > On 3/18/2016 5:15 PM, Patrik Andersson wrote: >> Protect against DPDK crash when allocation of listen fd >= 1023. >> For events on fd:s >1023, the current implementation will trigger >> an abort due to access outside of allocated bit mask. >> >> Corrections would include: >> >> * Match fdset_add() signature in fd_man.c to fd_man.h >> * Handling of return codes from fdset_add() >> * Addition of check of fd number in fdset_add_fd() >> >> --- >> >> The rationale behind the suggested code change is that, >> fdset_event_dispatch() could attempt access outside of the FD_SET >> bitmask if there is an event on a file descriptor that in turn >> looks up a virtio file descriptor with a value > 1023. >> Such an attempt will lead to an abort() and a restart of any >> vswitch using DPDK. >> >> A discussion topic exist in the ovs-discuss mailing list that can >> provide a little more background: >> >> http://openvswitch.org/pipermail/discuss/2016-February/020243.html > Thanks for catching this. Could you give more details on how to > accumulating fds? > The buggy code is based on the fact that FD_SETSIZE limits the number of > file descriptors, which might be true on Windows. However from the > manual, it says clearly it limits the value of file descriptors. > The use of select is for portability. I have been wondering if it is > truly that important. Use poll could simplify the code a bit, for > example we need add timeout to select so that another thread could > insert/remove a fd into/from the monitored list. > > Any comments on using poll/epoll? > >> Signed-off-by: Patrik Andersson >> --- >> lib/librte_vhost/vhost_user/fd_man.c | 11 ++++++----- >> lib/librte_vhost/vhost_user/vhost-net-user.c | 22 ++++++++++++++++++++-- >> 2 files changed, 26 insertions(+), 7 deletions(-) >> >> diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c >> index 087aaed..c691339 100644 >> --- a/lib/librte_vhost/vhost_user/fd_man.c >> +++ b/lib/librte_vhost/vhost_user/fd_man.c >> @@ -71,20 +71,22 @@ fdset_find_free_slot(struct fdset *pfdset) >> return fdset_find_fd(pfdset, -1); >> } >> >> -static void >> +static int >> fdset_add_fd(struct fdset *pfdset, int idx, int fd, >> fd_cb rcb, fd_cb wcb, void *dat) >> { >> struct fdentry *pfdentry; >> >> - if (pfdset == NULL || idx >= MAX_FDS) > seems we had better change the definition of MAX_FDS and > MAX_VHOST_SERVER to FD_SETSIZE or add some build time check. I'm not sure how build time checks would help, in my build the MAX_FDS == FD_SETSIZE == 1024. Here it is not actually possible to support 1024 vhost user devices, because of the file descriptor limitation. In my opinion the problem is that the assumption: number of vhost user device == number of file descriptors does not hold. What the actual relation might be hard to determine with any certainty. Use of epoll() instead of select() might relax checking to the number of vhost user devices only. In addition the return value of the fdset_add_fd() should be checked (note: in the corresponding include file the function signature is "static int", not "static void" as in the code). > >> - return; >> + if (pfdset == NULL || idx >= MAX_FDS || fd >= FD_SETSIZE) >> + return -1; >> >> pfdentry = &pfdset->fd[idx]; >> pfdentry->fd = fd; >> pfdentry->rcb = rcb; >> pfdentry->wcb = wcb; >> pfdentry->dat = dat; >> + >> + return 0; >> } >> >> /** >> @@ -150,12 +152,11 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) >> >> /* Find a free slot in the list. */ >> i = fdset_find_free_slot(pfdset); >> - if (i == -1) { >> + if (i == -1 || fdset_add_fd(pfdset, i, fd, rcb, wcb, dat) < 0) { >> pthread_mutex_unlock(&pfdset->fd_mutex); >> return -2; >> } >> >> - fdset_add_fd(pfdset, i, fd, rcb, wcb, dat); >> pfdset->num++; >> >> pthread_mutex_unlock(&pfdset->fd_mutex); >> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c >> index df2bd64..778851d 100644 >> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c >> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c >> @@ -288,6 +288,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove) >> int fh; >> struct vhost_device_ctx vdev_ctx = { (pid_t)0, 0 }; >> unsigned int size; >> + int ret; >> >> conn_fd = accept(fd, NULL, NULL); >> RTE_LOG(INFO, VHOST_CONFIG, >> @@ -317,8 +318,15 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove) >> >> ctx->vserver = vserver; >> ctx->fh = fh; >> - fdset_add(&g_vhost_server.fdset, >> + ret = fdset_add(&g_vhost_server.fdset, >> conn_fd, vserver_message_handler, NULL, ctx); >> + if (ret < 0) { >> + free(ctx); >> + close(conn_fd); >> + RTE_LOG(ERR, VHOST_CONFIG, >> + "failed to add fd %d into vhost server fdset\n", >> + conn_fd); >> + } >> } >> >> /* callback when there is message on the connfd */ >> @@ -453,6 +461,7 @@ int >> rte_vhost_driver_register(const char *path) >> { >> struct vhost_server *vserver; >> + int ret; >> >> pthread_mutex_lock(&g_vhost_server.server_mutex); >> >> @@ -478,8 +487,17 @@ rte_vhost_driver_register(const char *path) >> >> vserver->path = strdup(path); >> >> - fdset_add(&g_vhost_server.fdset, vserver->listenfd, >> + ret = fdset_add(&g_vhost_server.fdset, vserver->listenfd, >> vserver_new_vq_conn, NULL, vserver); >> + if (ret < 0) { >> + pthread_mutex_unlock(&g_vhost_server.server_mutex); >> + RTE_LOG(ERR, VHOST_CONFIG, >> + "failed to add listen fd %d to vhost server fdset\n", >> + vserver->listenfd); >> + free(vserver->path); >> + free(vserver); >> + return -1; >> + } >> >> g_vhost_server.server[g_vhost_server.vserver_cnt++] = vserver; >> pthread_mutex_unlock(&g_vhost_server.server_mutex);