From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 5C4B8424CA; Tue, 11 Jun 2024 13:29:52 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E1F7C4067C; Tue, 11 Jun 2024 13:29:51 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mails.dpdk.org (Postfix) with ESMTP id 9891E40263 for ; Tue, 11 Jun 2024 13:29:49 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718105389; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:autocrypt:autocrypt; bh=AIYccIiWJq+6oUebbT4mh0u1cNixqgMcE17aA41SuGE=; b=Qs+GM1wTgraXZhtqXkH0ZxOUDuz6IpHyY2IgwkZwBRWsdefNu//cI7fnIomPeMPRHqpyQ9 CNbUye9En3MgIDLjFDNnTEqaqsc9lRdzt/eqcQl8tcsHReHNZtDgdx1LisZr/YVEMeiU8B +PVrd2oWn912dNzu8NrjhWbnqwOis7I= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-681-rOwquzXnP562vXDSmmmrWA-1; Tue, 11 Jun 2024 07:29:47 -0400 X-MC-Unique: rOwquzXnP562vXDSmmmrWA-1 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D7A8719560B2; Tue, 11 Jun 2024 11:29:46 +0000 (UTC) Received: from [10.39.208.15] (unknown [10.39.208.15]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E1E5619560B0; Tue, 11 Jun 2024 11:29:44 +0000 (UTC) Message-ID: Date: Tue, 11 Jun 2024 13:29:41 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 5/5] vhost: manage FD with epoll To: Chenbo Xia Cc: "dev@dpdk.org" , David Marchand References: <20240409114845.1336403-1-maxime.coquelin@redhat.com> <20240409114845.1336403-6-maxime.coquelin@redhat.com> From: Maxime Coquelin Autocrypt: addr=maxime.coquelin@redhat.com; keydata= xsFNBFOEQQIBEADjNLYZZqghYuWv1nlLisptPJp+TSxE/KuP7x47e1Gr5/oMDJ1OKNG8rlNg kLgBQUki3voWhUbMb69ybqdMUHOl21DGCj0BTU3lXwapYXOAnsh8q6RRM+deUpasyT+Jvf3a gU35dgZcomRh5HPmKMU4KfeA38cVUebsFec1HuJAWzOb/UdtQkYyZR4rbzw8SbsOemtMtwOx YdXodneQD7KuRU9IhJKiEfipwqk2pufm2VSGl570l5ANyWMA/XADNhcEXhpkZ1Iwj3TWO7XR uH4xfvPl8nBsLo/EbEI7fbuUULcAnHfowQslPUm6/yaGv6cT5160SPXT1t8U9QDO6aTSo59N jH519JS8oeKZB1n1eLDslCfBpIpWkW8ZElGkOGWAN0vmpLfdyiqBNNyS3eGAfMkJ6b1A24un /TKc6j2QxM0QK4yZGfAxDxtvDv9LFXec8ENJYsbiR6WHRHq7wXl/n8guyh5AuBNQ3LIK44x0 KjGXP1FJkUhUuruGyZsMrDLBRHYi+hhDAgRjqHgoXi5XGETA1PAiNBNnQwMf5aubt+mE2Q5r qLNTgwSo2dpTU3+mJ3y3KlsIfoaxYI7XNsPRXGnZi4hbxmeb2NSXgdCXhX3nELUNYm4ArKBP LugOIT/zRwk0H0+RVwL2zHdMO1Tht1UOFGfOZpvuBF60jhMzbQARAQABzSxNYXhpbWUgQ29x dWVsaW4gPG1heGltZS5jb3F1ZWxpbkByZWRoYXQuY29tPsLBeAQTAQIAIgUCV3u/5QIbAwYL CQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQyjiNKEaHD4ma2g/+P+Hg9WkONPaY1J4AR7Uf kBneosS4NO3CRy0x4WYmUSLYMLx1I3VH6SVjqZ6uBoYy6Fs6TbF6SHNc7QbB6Qjo3neqnQR1 71Ua1MFvIob8vUEl3jAR/+oaE1UJKrxjWztpppQTukIk4oJOmXbL0nj3d8dA2QgHdTyttZ1H xzZJWWz6vqxCrUqHU7RSH9iWg9R2iuTzii4/vk1oi4Qz7y/q8ONOq6ffOy/t5xSZOMtZCspu Mll2Szzpc/trFO0pLH4LZZfz/nXh2uuUbk8qRIJBIjZH3ZQfACffgfNefLe2PxMqJZ8mFJXc RQO0ONZvwoOoHL6CcnFZp2i0P5ddduzwPdGsPq1bnIXnZqJSl3dUfh3xG5ArkliZ/++zGF1O wvpGvpIuOgLqjyCNNRoR7cP7y8F24gWE/HqJBXs1qzdj/5Hr68NVPV1Tu/l2D1KMOcL5sOrz 2jLXauqDWn1Okk9hkXAP7+0Cmi6QwAPuBT3i6t2e8UdtMtCE4sLesWS/XohnSFFscZR6Vaf3 gKdWiJ/fW64L6b9gjkWtHd4jAJBAIAx1JM6xcA1xMbAFsD8gA2oDBWogHGYcScY/4riDNKXi lw92d6IEHnSf6y7KJCKq8F+Jrj2BwRJiFKTJ6ChbOpyyR6nGTckzsLgday2KxBIyuh4w+hMq TGDSp2rmWGJjASrOwU0EVPSbkwEQAMkaNc084Qvql+XW+wcUIY+Dn9A2D1gMr2BVwdSfVDN7 0ZYxo9PvSkzh6eQmnZNQtl8WSHl3VG3IEDQzsMQ2ftZn2sxjcCadexrQQv3Lu60Tgj7YVYRM H+fLYt9W5YuWduJ+FPLbjIKynBf6JCRMWr75QAOhhhaI0tsie3eDsKQBA0w7WCuPiZiheJaL 4MDe9hcH4rM3ybnRW7K2dLszWNhHVoYSFlZGYh+MGpuODeQKDS035+4H2rEWgg+iaOwqD7bg CQXwTZ1kSrm8NxIRVD3MBtzp9SZdUHLfmBl/tLVwDSZvHZhhvJHC6Lj6VL4jPXF5K2+Nn/Su CQmEBisOmwnXZhhu8ulAZ7S2tcl94DCo60ReheDoPBU8PR2TLg8rS5f9w6mLYarvQWL7cDtT d2eX3Z6TggfNINr/RTFrrAd7NHl5h3OnlXj7PQ1f0kfufduOeCQddJN4gsQfxo/qvWVB7PaE 1WTIggPmWS+Xxijk7xG6x9McTdmGhYaPZBpAxewK8ypl5+yubVsE9yOOhKMVo9DoVCjh5To5 aph7CQWfQsV7cd9PfSJjI2lXI0dhEXhQ7lRCFpf3V3mD6CyrhpcJpV6XVGjxJvGUale7+IOp sQIbPKUHpB2F+ZUPWds9yyVxGwDxD8WLqKKy0WLIjkkSsOb9UBNzgRyzrEC9lgQ/ABEBAAHC wV8EGAECAAkFAlT0m5MCGwwACgkQyjiNKEaHD4nU8hAAtt0xFJAy0sOWqSmyxTc7FUcX+pbD KVyPlpl6urKKMk1XtVMUPuae/+UwvIt0urk1mXi6DnrAN50TmQqvdjcPTQ6uoZ8zjgGeASZg jj0/bJGhgUr9U7oG7Hh2F8vzpOqZrdd65MRkxmc7bWj1k81tOU2woR/Gy8xLzi0k0KUa8ueB iYOcZcIGTcs9CssVwQjYaXRoeT65LJnTxYZif2pfNxfINFzCGw42s3EtZFteczClKcVSJ1+L +QUY/J24x0/ocQX/M1PwtZbB4c/2Pg/t5FS+s6UB1Ce08xsJDcwyOPIH6O3tccZuriHgvqKP yKz/Ble76+NFlTK1mpUlfM7PVhD5XzrDUEHWRTeTJSvJ8TIPL4uyfzhjHhlkCU0mw7Pscyxn DE8G0UYMEaNgaZap8dcGMYH/96EfE5s/nTX0M6MXV0yots7U2BDb4soLCxLOJz4tAFDtNFtA wLBhXRSvWhdBJZiig/9CG3dXmKfi2H+wdUCSvEFHRpgo7GK8/Kh3vGhgKmnnxhl8ACBaGy9n fxjSxjSO6rj4/MeenmlJw1yebzkX8ZmaSi8BHe+n6jTGEFNrbiOdWpJgc5yHIZZnwXaW54QT UhhSjDL1rV2B4F28w30jYmlRmm2RdN7iCZfbyP3dvFQTzQ4ySquuPkIGcOOHrvZzxbRjzMx1 Mwqu3GQ= In-Reply-To: X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hi Chenbo, On 4/28/24 05:22, Chenbo Xia wrote: > Hi Maxime, > >> On Apr 9, 2024, at 19:48, Maxime Coquelin wrote: >> >> External email: Use caution opening links or attachments >> >> >> From: David Marchand >> >> Switch to epoll so that the concern over the poll() fd array >> is removed. >> Add a simple list of used entries and track the next free entry. >> >> epoll() is thread safe, we no more need a synchronization >> mechanism and so can remove the notification pipe. >> >> Signed-off-by: David Marchand >> Signed-off-by: Maxime Coquelin >> --- >> lib/vhost/fd_man.c | 399 ++++++++++++--------------------------------- >> lib/vhost/fd_man.h | 5 +- >> 2 files changed, 106 insertions(+), 298 deletions(-) >> >> diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c >> index 8b47c97d45..a4a2965da1 100644 >> --- a/lib/vhost/fd_man.c >> +++ b/lib/vhost/fd_man.c >> @@ -3,9 +3,9 @@ >> */ >> >> #include >> -#include >> #include >> #include >> +#include >> #include >> >> #include >> @@ -21,49 +21,34 @@ RTE_LOG_REGISTER_SUFFIX(vhost_fdset_logtype, fdset, INFO); >> #define VHOST_FDMAN_LOG(level, ...) \ >> RTE_LOG_LINE(level, VHOST_FDMAN, "" __VA_ARGS__) >> >> -#define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL) >> - >> struct fdentry { >> int fd; /* -1 indicates this entry is empty */ >> fd_cb rcb; /* callback when this fd is readable. */ >> fd_cb wcb; /* callback when this fd is writeable.*/ >> void *dat; /* fd context */ >> int busy; /* whether this entry is being used in cb. */ >> + LIST_ENTRY(fdentry) next; >> }; >> >> struct fdset { >> char name[RTE_THREAD_NAME_SIZE]; >> - struct pollfd rwfds[MAX_FDS]; >> + int epfd; >> struct fdentry fd[MAX_FDS]; >> + LIST_HEAD(, fdentry) fdlist; >> + int next_free_idx; >> rte_thread_t tid; >> pthread_mutex_t fd_mutex; >> - pthread_mutex_t fd_polling_mutex; >> - int num; /* current fd number of this fdset */ >> - >> - union pipefds { >> - struct { >> - int pipefd[2]; >> - }; >> - struct { >> - int readfd; >> - int writefd; >> - }; >> - } u; >> - >> - pthread_mutex_t sync_mutex; >> - pthread_cond_t sync_cond; >> - bool sync; >> + > > Not sure this blank line is intended or not :) It isn't, removing it in next revision. > >> bool destroy; >> }; >> >> -static int fdset_add_no_sync(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat); >> -static uint32_t fdset_event_dispatch(void *arg); >> - >> #define MAX_FDSETS 8 >> >> static struct fdset *fdsets[MAX_FDSETS]; >> pthread_mutex_t fdsets_mutex = PTHREAD_MUTEX_INITIALIZER; >> >> +static uint32_t fdset_event_dispatch(void *arg); >> + >> static struct fdset * >> fdset_lookup(const char *name) >> { >> @@ -96,166 +81,6 @@ fdset_insert(struct fdset *fdset) >> return -1; >> } >> >> -static void >> -fdset_pipe_read_cb(int readfd, void *dat, >> - int *remove __rte_unused) >> -{ >> - char charbuf[16]; >> - struct fdset *fdset = dat; >> - int r = read(readfd, charbuf, sizeof(charbuf)); >> - /* >> - * Just an optimization, we don't care if read() failed >> - * so ignore explicitly its return value to make the >> - * compiler happy >> - */ >> - RTE_SET_USED(r); >> - >> - pthread_mutex_lock(&fdset->sync_mutex); >> - fdset->sync = true; >> - pthread_cond_broadcast(&fdset->sync_cond); >> - pthread_mutex_unlock(&fdset->sync_mutex); >> -} >> - >> -static void >> -fdset_pipe_uninit(struct fdset *fdset) >> -{ >> - fdset_del(fdset, fdset->u.readfd); >> - close(fdset->u.readfd); >> - fdset->u.readfd = -1; >> - close(fdset->u.writefd); >> - fdset->u.writefd = -1; >> -} >> - >> -static int >> -fdset_pipe_init(struct fdset *fdset) >> -{ >> - int ret; >> - >> - pthread_mutex_init(&fdset->sync_mutex, NULL); >> - pthread_cond_init(&fdset->sync_cond, NULL); >> - >> - if (pipe(fdset->u.pipefd) < 0) { >> - VHOST_FDMAN_LOG(ERR, >> - "failed to create pipe for vhost fdset"); >> - return -1; >> - } >> - >> - ret = fdset_add_no_sync(fdset, fdset->u.readfd, >> - fdset_pipe_read_cb, NULL, fdset); >> - if (ret < 0) { >> - VHOST_FDMAN_LOG(ERR, >> - "failed to add pipe readfd %d into vhost server fdset", >> - fdset->u.readfd); >> - >> - fdset_pipe_uninit(fdset); >> - return -1; >> - } >> - >> - return 0; >> -} >> - >> -static void >> -fdset_sync(struct fdset *fdset) >> -{ >> - int ret; >> - >> - pthread_mutex_lock(&fdset->sync_mutex); >> - >> - fdset->sync = false; >> - ret = write(fdset->u.writefd, "1", 1); >> - if (ret < 0) { >> - VHOST_FDMAN_LOG(ERR, >> - "Failed to write to notification pipe: %s", >> - strerror(errno)); >> - goto out_unlock; >> - } >> - >> - while (!fdset->sync) >> - pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex); >> - >> -out_unlock: >> - pthread_mutex_unlock(&fdset->sync_mutex); >> -} >> - >> -static int >> -get_last_valid_idx(struct fdset *pfdset, int last_valid_idx) >> -{ >> - int i; >> - >> - for (i = last_valid_idx; i >= 0 && pfdset->fd[i].fd == -1; i--) >> - ; >> - >> - return i; >> -} >> - >> -static void >> -fdset_move(struct fdset *pfdset, int dst, int src) >> -{ >> - pfdset->fd[dst] = pfdset->fd[src]; >> - pfdset->rwfds[dst] = pfdset->rwfds[src]; >> -} >> - >> -static void >> -fdset_shrink_nolock(struct fdset *pfdset) >> -{ >> - int i; >> - int last_valid_idx = get_last_valid_idx(pfdset, pfdset->num - 1); >> - >> - for (i = 0; i < last_valid_idx; i++) { >> - if (pfdset->fd[i].fd != -1) >> - continue; >> - >> - fdset_move(pfdset, i, last_valid_idx); >> - last_valid_idx = get_last_valid_idx(pfdset, last_valid_idx - 1); >> - } >> - pfdset->num = last_valid_idx + 1; >> -} >> - >> -/* >> - * Find deleted fd entries and remove them >> - */ >> -static void >> -fdset_shrink(struct fdset *pfdset) >> -{ >> - pthread_mutex_lock(&pfdset->fd_mutex); >> - fdset_shrink_nolock(pfdset); >> - pthread_mutex_unlock(&pfdset->fd_mutex); >> -} >> - >> -/** >> - * Returns the index in the fdset for a given fd. >> - * @return >> - * index for the fd, or -1 if fd isn't in the fdset. >> - */ >> -static int >> -fdset_find_fd(struct fdset *pfdset, int fd) >> -{ >> - int i; >> - >> - for (i = 0; i < pfdset->num && pfdset->fd[i].fd != fd; i++) >> - ; >> - >> - return i == pfdset->num ? -1 : i; >> -} >> - >> -static void >> -fdset_add_fd(struct fdset *pfdset, int idx, int fd, >> - fd_cb rcb, fd_cb wcb, void *dat) >> -{ >> - struct fdentry *pfdentry = &pfdset->fd[idx]; >> - struct pollfd *pfd = &pfdset->rwfds[idx]; >> - >> - pfdentry->fd = fd; >> - pfdentry->rcb = rcb; >> - pfdentry->wcb = wcb; >> - pfdentry->dat = dat; >> - >> - pfd->fd = fd; >> - pfd->events = rcb ? POLLIN : 0; >> - pfd->events |= wcb ? POLLOUT : 0; >> - pfd->revents = 0; >> -} >> - >> struct fdset * >> fdset_init(const char *name) >> { >> @@ -284,16 +109,20 @@ fdset_init(const char *name) >> rte_strscpy(fdset->name, name, RTE_THREAD_NAME_SIZE); >> >> pthread_mutex_init(&fdset->fd_mutex, NULL); >> - pthread_mutex_init(&fdset->fd_polling_mutex, NULL); >> >> - for (i = 0; i < MAX_FDS; i++) { >> + for (i = 0; i < (int)RTE_DIM(fdset->fd); i++) { >> fdset->fd[i].fd = -1; >> fdset->fd[i].dat = NULL; >> } >> - fdset->num = 0; >> + LIST_INIT(&fdset->fdlist); >> >> - if (fdset_pipe_init(fdset)) { >> - VHOST_FDMAN_LOG(ERR, "Failed to init pipe for %s", name); >> + /* >> + * Any non-zero value would work (see man epoll_create), >> + * but pass MAX_FDS for consistency. >> + */ >> + fdset->epfd = epoll_create(MAX_FDS); >> + if (fdset->epfd < 0) { >> + VHOST_FDMAN_LOG(ERR, "failed to create epoll for %s fdset", name); > > failed -> Failed like other logs Ack. > >> goto err_free; >> } >> >> @@ -301,7 +130,7 @@ fdset_init(const char *name) >> fdset_event_dispatch, fdset)) { >> VHOST_FDMAN_LOG(ERR, "Failed to create %s event dispatch thread", >> fdset->name); >> - goto err_pipe; >> + goto err_epoll; >> } >> >> if (fdset_insert(fdset)) { >> @@ -315,10 +144,9 @@ fdset_init(const char *name) >> >> err_thread: >> fdset->destroy = true; >> - fdset_sync(fdset); >> rte_thread_join(fdset->tid, &val); >> -err_pipe: >> - fdset_pipe_uninit(fdset); >> +err_epoll: >> + close(fdset->epfd); >> err_free: >> rte_free(fdset); >> err_unlock: >> @@ -330,78 +158,99 @@ fdset_init(const char *name) >> /** >> * Register the fd in the fdset with read/write handler and context. >> */ >> -static int >> -fdset_add_no_sync(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) >> +int >> +fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) >> { >> - int i; >> + struct fdentry *pfdentry; >> + struct epoll_event ev; >> >> if (pfdset == NULL || fd == -1) >> return -1; >> >> pthread_mutex_lock(&pfdset->fd_mutex); >> - i = pfdset->num < MAX_FDS ? pfdset->num++ : -1; >> - if (i == -1) { >> - pthread_mutex_lock(&pfdset->fd_polling_mutex); >> - fdset_shrink_nolock(pfdset); >> - pthread_mutex_unlock(&pfdset->fd_polling_mutex); >> - i = pfdset->num < MAX_FDS ? pfdset->num++ : -1; >> - if (i == -1) { >> - pthread_mutex_unlock(&pfdset->fd_mutex); >> - return -2; >> - } >> + if (pfdset->next_free_idx >= (int)RTE_DIM(pfdset->fd)) { >> + pthread_mutex_unlock(&pfdset->fd_mutex); >> + return -2; >> } >> >> - fdset_add_fd(pfdset, i, fd, rcb, wcb, dat); >> + pfdentry = &pfdset->fd[pfdset->next_free_idx]; >> + pfdentry->fd = fd; >> + pfdentry->rcb = rcb; >> + pfdentry->wcb = wcb; >> + pfdentry->dat = dat; >> + >> + LIST_INSERT_HEAD(&pfdset->fdlist, pfdentry, next); >> + >> + /* Find next free slot */ >> + pfdset->next_free_idx++; >> + for (; pfdset->next_free_idx < (int)RTE_DIM(pfdset->fd); pfdset->next_free_idx++) { >> + if (pfdset->fd[pfdset->next_free_idx].fd != -1) >> + continue; >> + break; >> + } >> pthread_mutex_unlock(&pfdset->fd_mutex); >> >> + ev.events = EPOLLERR; >> + ev.events |= rcb ? EPOLLIN : 0; >> + ev.events |= wcb ? EPOLLOUT : 0; >> + ev.data.fd = fd; >> + >> + if (epoll_ctl(pfdset->epfd, EPOLL_CTL_ADD, fd, &ev) == -1) >> + VHOST_FDMAN_LOG(ERR, "could not add %d fd to %d epfd: %s", >> + fd, pfdset->epfd, strerror(errno)); > > Should not return 0 if this fails ? Right, in upcoming revision I'm improving the error handling with doing some refactoring for the pdset management. > >> + >> return 0; >> } >> >> -int >> -fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) >> +static struct fdentry * >> +fdset_find_entry_locked(struct fdset *pfdset, int fd) >> { >> - int ret; >> + struct fdentry *pfdentry; >> >> - ret = fdset_add_no_sync(pfdset, fd, rcb, wcb, dat); >> - if (ret < 0) >> - return ret; >> + LIST_FOREACH(pfdentry, &pfdset->fdlist, next) { >> + if (pfdentry->fd != fd) >> + continue; >> + return pfdentry; >> + } >> >> - fdset_sync(pfdset); >> + return NULL; >> +} >> >> - return 0; >> +static void >> +fdset_del_locked(struct fdset *pfdset, struct fdentry *pfdentry) >> +{ >> + int entry_idx; >> + >> + if (epoll_ctl(pfdset->epfd, EPOLL_CTL_DEL, pfdentry->fd, NULL) == -1) >> + VHOST_FDMAN_LOG(ERR, "could not remove %d fd from %d epfd: %s", >> + pfdentry->fd, pfdset->epfd, strerror(errno)); >> + >> + pfdentry->fd = -1; >> + pfdentry->rcb = pfdentry->wcb = NULL; >> + pfdentry->dat = NULL; >> + entry_idx = pfdentry - pfdset->fd; >> + if (entry_idx < pfdset->next_free_idx) >> + pfdset->next_free_idx = entry_idx; >> + LIST_REMOVE(pfdentry, next); >> } >> >> -/** >> - * Unregister the fd from the fdset. >> - * Returns context of a given fd or NULL. >> - */ >> -void * >> +void >> fdset_del(struct fdset *pfdset, int fd) >> { >> - int i; >> - void *dat = NULL; >> + struct fdentry *pfdentry; >> >> if (pfdset == NULL || fd == -1) >> - return NULL; >> + return; >> >> do { >> pthread_mutex_lock(&pfdset->fd_mutex); >> - >> - i = fdset_find_fd(pfdset, fd); >> - if (i != -1 && pfdset->fd[i].busy == 0) { >> - /* busy indicates r/wcb is executing! */ >> - dat = pfdset->fd[i].dat; >> - pfdset->fd[i].fd = -1; >> - pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL; >> - pfdset->fd[i].dat = NULL; >> - i = -1; >> + pfdentry = fdset_find_entry_locked(pfdset, fd); >> + if (pfdentry != NULL && pfdentry->busy == 0) { >> + fdset_del_locked(pfdset, pfdentry); >> + pfdentry = NULL; >> } >> pthread_mutex_unlock(&pfdset->fd_mutex); >> - } while (i != -1); >> - >> - fdset_sync(pfdset); >> - >> - return dat; >> + } while (pfdentry != NULL); >> } >> >> /** >> @@ -415,28 +264,22 @@ fdset_del(struct fdset *pfdset, int fd) >> int >> fdset_try_del(struct fdset *pfdset, int fd) >> { >> - int i; >> + struct fdentry *pfdentry; >> >> if (pfdset == NULL || fd == -1) >> return -2; >> >> pthread_mutex_lock(&pfdset->fd_mutex); >> - i = fdset_find_fd(pfdset, fd); >> - if (i != -1 && pfdset->fd[i].busy) { >> + pfdentry = fdset_find_entry_locked(pfdset, fd); >> + if (pfdentry != NULL && pfdentry->busy != 0) { >> pthread_mutex_unlock(&pfdset->fd_mutex); >> return -1; >> } >> >> - if (i != -1) { >> - pfdset->fd[i].fd = -1; >> - pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL; >> - pfdset->fd[i].dat = NULL; >> - } >> + if (pfdentry != NULL) >> + fdset_del_locked(pfdset, pfdentry); >> >> pthread_mutex_unlock(&pfdset->fd_mutex); >> - >> - fdset_sync(pfdset); >> - >> return 0; >> } >> >> @@ -453,53 +296,29 @@ static uint32_t >> fdset_event_dispatch(void *arg) >> { >> int i; >> - struct pollfd *pfd; >> - struct fdentry *pfdentry; >> fd_cb rcb, wcb; >> void *dat; >> int fd, numfds; >> int remove1, remove2; >> - int need_shrink; >> struct fdset *pfdset = arg; >> - int val; >> >> if (pfdset == NULL) >> return 0; >> >> while (1) { >> + struct epoll_event events[MAX_FDS]; >> + struct fdentry *pfdentry; >> >> - /* >> - * When poll is blocked, other threads might unregister >> - * listenfds from and register new listenfds into fdset. >> - * When poll returns, the entries for listenfds in the fdset >> - * might have been updated. It is ok if there is unwanted call >> - * for new listenfds. >> - */ >> - pthread_mutex_lock(&pfdset->fd_mutex); >> - numfds = pfdset->num; >> - pthread_mutex_unlock(&pfdset->fd_mutex); >> - >> - pthread_mutex_lock(&pfdset->fd_polling_mutex); >> - val = poll(pfdset->rwfds, numfds, 1000 /* millisecs */); >> - pthread_mutex_unlock(&pfdset->fd_polling_mutex); >> - if (val < 0) >> + numfds = epoll_wait(pfdset->epfd, events, RTE_DIM(events), 1000); >> + if (numfds < 0) >> continue; >> >> - need_shrink = 0; >> for (i = 0; i < numfds; i++) { >> pthread_mutex_lock(&pfdset->fd_mutex); >> >> - pfdentry = &pfdset->fd[i]; >> - fd = pfdentry->fd; >> - pfd = &pfdset->rwfds[i]; >> - >> - if (fd < 0) { >> - need_shrink = 1; >> - pthread_mutex_unlock(&pfdset->fd_mutex); >> - continue; >> - } >> - >> - if (!pfd->revents) { >> + fd = events[i].data.fd; >> + pfdentry = fdset_find_entry_locked(pfdset, fd); >> + if (pfdentry == NULL) { >> pthread_mutex_unlock(&pfdset->fd_mutex); >> continue; >> } >> @@ -513,9 +332,9 @@ fdset_event_dispatch(void *arg) >> >> pthread_mutex_unlock(&pfdset->fd_mutex); >> >> - if (rcb && pfd->revents & (POLLIN | FDPOLLERR)) >> + if (rcb && events[i].events & (EPOLLIN | EPOLLERR | EPOLLHUP)) >> rcb(fd, dat, &remove1); >> - if (wcb && pfd->revents & (POLLOUT | FDPOLLERR)) >> + if (wcb && events[i].events & (EPOLLOUT | EPOLLERR | EPOLLHUP)) >> wcb(fd, dat, &remove2); >> pfdentry->busy = 0; >> /* >> @@ -524,23 +343,13 @@ fdset_event_dispatch(void *arg) >> * directly. >> */ >> /* >> - * When we are to clean up the fd from fdset, >> - * because the fd is closed in the cb, >> - * the old fd val could be reused by when creates new >> - * listen fd in another thread, we couldn't call >> - * fdset_del. >> + * A concurrent fdset_del may have been waiting for the >> + * fdentry not to be busy, so we can't call >> + * fdset_del_locked(). >> */ >> - if (remove1 || remove2) { >> - pfdentry->fd = -1; >> - need_shrink = 1; >> - } >> + if (remove1 || remove2) >> + fdset_del(pfdset, fd); >> } >> - >> - if (need_shrink) >> - fdset_shrink(pfdset); >> - >> - if (pfdset->destroy) >> - break; > > I guess we want to keep the destroy logic Right, good catch! Adding it back. Thanks for the review, Maxime > Thanks, > Chenbo > >> } >> >> return 0; >> diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h >> index 079fa0155f..6398343a6a 100644 >> --- a/lib/vhost/fd_man.h >> +++ b/lib/vhost/fd_man.h >> @@ -6,7 +6,7 @@ >> #define _FD_MAN_H_ >> #include >> #include >> -#include >> +#include >> >> struct fdset; >> >> @@ -19,8 +19,7 @@ struct fdset *fdset_init(const char *name); >> int fdset_add(struct fdset *pfdset, int fd, >> fd_cb rcb, fd_cb wcb, void *dat); >> >> -void *fdset_del(struct fdset *pfdset, int fd); >> - >> +void fdset_del(struct fdset *pfdset, int fd); >> int fdset_try_del(struct fdset *pfdset, int fd); >> >> #endif >> -- >> 2.44.0 >> >