From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id AC8D643F31; Sun, 28 Apr 2024 05:22:17 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 3417840265; Sun, 28 Apr 2024 05:22:17 +0200 (CEST) Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2084.outbound.protection.outlook.com [40.107.236.84]) by mails.dpdk.org (Postfix) with ESMTP id 60D7940263 for ; Sun, 28 Apr 2024 05:22:15 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hFI+VPhqnN/47bBYT+dUYHF4mQ1SPRKJKVJUDa2DW2emADdQhy2KjzBm5NeQ+UpURdryQE04xGBJpSFzQIxFBwj/7sXJ0mvR8FbQROodIanfMS+8N+JjZDodN2Sye0fvQ8J6sT2wOayppmihc72++GKSkkOHiJkkMibAfioBkyIh3vVOyJ1yW0qvr4COW7kokLjLszgzW2y3OfC/w6PpbtMRvpkDHzY9aDZlqGumlAzO+VJhxKWwJp9UE22tD3tLwPAwevqeTraokackJUW1Jva/qoIRjExnVABQL9Qo8lVJQBdX/hcdyAswFYm3kXWKQaszv1BB9n3RkOks/YrEhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=VrhzUXDy7Q6TqaAUXKSEtg0DBrWeMTNDt4bQQUQGKmE=; b=hI0JLmUupRFeaWhyqOQqJiAfgTjiWbC9EtybjrulYFBCOa5FXR/bQ7FTN8epg5KGbxSZi23t0WOQBharw0mRePn6LvOuHnWLS6y6eKIa/aboox0EzY17j8ULWqg8cp1yGyVIWAhc0TXXavx2MkRvj1qjJ0+v6iTEtjWvtlS7qE4iXRvlC13cQkbYkET5fXu8rtgAwItM22x2gimvFpzWoaVxGuLHW6aci3qWxS5dBQHbjEuKecwBf2dc9n+euAYZQaW7oBQGBod710E8hBLo0tKGVn9TQ4x1G9QvoDo4B0FK4ANkAiurY+k4oCPHRPUrQPHPO+mJIuL+qCQYNi6d3A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=VrhzUXDy7Q6TqaAUXKSEtg0DBrWeMTNDt4bQQUQGKmE=; b=mWTKATgF8jwwDMvCVoWvidXP6ukxk00i26txjoFWcK6247Vl+bGsSWUGm4LZsLdV2T8GOwTAP4QlVAg5fqxYMIP+4sZVXUD9dr9HRsNr8Y8y0FwMRvC//n3/0QShOKRw7Y+GNamDtDL0lkKZ8yOFYDhOvh5w/TjyvoUpvU1Kr7Qg1WhSciqMGekdyZfkPLEGOcTqDaj4lDAQqB1MMVuivlOkbQrlduKmH1wDveP53TvPBHE4kaTHaoAdVWmCC6PWJh4grhtHHhX/YeoEYqibdGdZgv4pFDBhzG1yo35swmKNcyTOYxwPXlVKIiYm+ui2vx1XijJAyBzm9AcZaidVGw== Received: from MN0PR12MB6341.namprd12.prod.outlook.com (2603:10b6:208:3c2::13) by PH7PR12MB7355.namprd12.prod.outlook.com (2603:10b6:510:20e::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7519.32; Sun, 28 Apr 2024 03:22:10 +0000 Received: from MN0PR12MB6341.namprd12.prod.outlook.com ([fe80::4cbb:90c1:e4f:38d5]) by MN0PR12MB6341.namprd12.prod.outlook.com ([fe80::4cbb:90c1:e4f:38d5%4]) with mapi id 15.20.7519.023; Sun, 28 Apr 2024 03:22:09 +0000 From: Chenbo Xia To: Maxime Coquelin CC: "dev@dpdk.org" , David Marchand Subject: Re: [PATCH v3 5/5] vhost: manage FD with epoll Thread-Topic: [PATCH v3 5/5] vhost: manage FD with epoll Thread-Index: AQHainPuLFmTTr3JTEaONiKPCrQY/bF9IYuA Date: Sun, 28 Apr 2024 03:22:09 +0000 Message-ID: References: <20240409114845.1336403-1-maxime.coquelin@redhat.com> <20240409114845.1336403-6-maxime.coquelin@redhat.com> In-Reply-To: <20240409114845.1336403-6-maxime.coquelin@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=nvidia.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: MN0PR12MB6341:EE_|PH7PR12MB7355:EE_ x-ms-office365-filtering-correlation-id: c2df9f0e-0e3b-48db-9df6-08dc673263b0 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; ARA:13230031|1800799015|376005|366007|38070700009; x-microsoft-antispam-message-info: =?us-ascii?Q?eDsKkYIKmDbCB/2j43SAH8SgRGaMjzcEbDc8eV7R2E0LKYYfVcPbimRCIH5I?= =?us-ascii?Q?aH+/WNuIxPca7oAR5sfLKhSAeLep+olfSVufcaf2wPiDYRlt1GXtW23SnFlT?= =?us-ascii?Q?3ZG/2xu/EEzdo0V5RMW8y8Ao91q01oVfn8DdK9yRKLzKKC0pvx6QBbhQkmtg?= =?us-ascii?Q?9v8LBjVqtPh6r989dxM2X9rLGD+reBvgMmE13rKbNYBBBuYuEOVn0rsKw8sl?= =?us-ascii?Q?wzCinLdT0QaNFTz+JD6N/uAFwcUXVHHk6Q16b0jbUOYcmzgxqTUMNvOAOeG/?= =?us-ascii?Q?bSa197MYWPl/k9IBT8PM098deCn49UV3q+f4i4AkjQnldgE1FeRrmWrfNk8K?= =?us-ascii?Q?fq62N/SPVMIqn5kmNWztQHdXod9nag07QC79fy3KnZ3cgPmCBNXBeCp4HCpA?= =?us-ascii?Q?Fqg9K77skv1oIJIrSQxVBdGgnxOwx6gwcOwE5XXEnYaLvgcF0WMdx5GkbsFg?= =?us-ascii?Q?RwaaU5Q8PH0b1BoHC4x052D0jtvOXfDDMuFelqbAz/c5UATfxzuqZZPyU1Jv?= =?us-ascii?Q?bZJEy8d8g2APRN/T6lq7/dFBn2awS4DbK4Vr3DOlfB9d2kkWplJx/THq6WJ+?= =?us-ascii?Q?M1yctunJXVUIuinEZIQ7WiVd86uOz1HsVMKWGDGHMhDtvBWPxgAPDA5K0pQs?= =?us-ascii?Q?y7TEChb6m/WZkcpKJ4OeoaqhojZahtxr+9/IeieDzMI/3WaGkRnOWORj5nhZ?= =?us-ascii?Q?77pwOW5yUvp2vMbhtem5BLQioFYF1cn6gHN20U/MlW6rcUFGy5IbCrkGt+i3?= =?us-ascii?Q?0yo08Gk6irNPhud6UUuCqf5tlq+VDdIDSFAOhgDzGhV0gXrVCRoHOhTSnVIW?= =?us-ascii?Q?UcQc5MCqD+NTIzRCzwkpHvyQlc4qGZYAa6YG/0OONpFRJhBFL6eP94nNczIP?= =?us-ascii?Q?uT6n5OV2ekdtZOY2wfrDc5vCAzmcgUG2EohRC1jIwGq3xOI1rf99LFHJRSN/?= =?us-ascii?Q?r4u5ax8fvsVQtIqt52WZXJkZC3MFdf5c9tGX+6xfJqt5Rh5lQxY95P1LXF15?= =?us-ascii?Q?LwKF45tluyDaUyZ9pv4FNfFLBf60nQ0Rthqo39fgIOFamrkYqSK5j4ABkvxt?= =?us-ascii?Q?4uQZETtodF3Q/tecN42jYP5ARg86VY9btPOgfwJyopWmKvDsWhkQdDMNhj1G?= =?us-ascii?Q?qYy53wCMYx35cQZ0SLZ6AkAHxfRsQTCYFXci7GzCaRkf6MTX1N9MVsayP5Ar?= =?us-ascii?Q?1IfcejypMDEzIDdU+gjr1uvE6aKzDRuM9I0IDPXRNaOLQDLcJ5KeKOZ33dyy?= =?us-ascii?Q?JYYNvvxedk4zJx/Cp/OP5XtZvhnIg3fM1f1qnh5oq9mKklGV7wVhtFCa4igE?= =?us-ascii?Q?kQLsbOTV1yPZ8rIvZx3zBhGuewbBpu1ijLWELpxN6EVjdQ=3D=3D?= x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:MN0PR12MB6341.namprd12.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230031)(1800799015)(376005)(366007)(38070700009); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: =?us-ascii?Q?Ach6Q/+NCclDLys4ERnS8r9HRKkSZmxalYagrkntXzZIha3hYXHABnq5Jgyv?= =?us-ascii?Q?JfOdeHy8yhdxNZR9BaLW4ltd2WjK9zaMWAgpr/O4uXMHImFUPzkE2tOAeg71?= =?us-ascii?Q?VvNSGaDaOlb6kN6zKF+zdKJxCyqLrT80V4lGxqpXBfahHBKYJ1AGTF/7hvEU?= =?us-ascii?Q?H99fCMgv1N18UA7gvxQcK7SDjUYKOt4hAHAnTYhIYC/bmMOSL1vkNchGrCzy?= =?us-ascii?Q?XdVlhdtfXxzr8xzeORjZqxWLgZXDc0PS+Y82oYOif47PbeOJq48yUWtxE2zp?= =?us-ascii?Q?KrYSYjiKk52y4aYT/Ee+kOrvDPWRkvCbSy+SACy4T1u6kfoXwAGK5neO/rSp?= =?us-ascii?Q?tknGeKU7WJCjwZfHfkuKigCQizY8xL2sPZFAaqbbOhpXP3HrOo2x/03Hnxkc?= =?us-ascii?Q?ppsuB81RQxbz3HqHI3K59jEaWBXGdsuhBkK5Z+3CyPJ9T1UT2HWL/okSWUsP?= =?us-ascii?Q?xhJwHZktqk5HjJCBe6Ra4xJ2dXN86Z9v/DHH6VwB8yAYGKATxmLv2D0nnfXv?= =?us-ascii?Q?tgSBAiT7MnkZAigorv7/blZcFEW+slnrNEmXwtqzGSDHV3SUZ1rTtZe5TVDo?= =?us-ascii?Q?4IVd9qPQzZzs9+vPMwT7C807SCX2sjtP6czu2XA6zF+QLHEtxMmdut+CfXZz?= =?us-ascii?Q?9FH/frpDfG8MRW3cUg+H2698xPot4kTyhUfWGsNXqKFZ+A9DC3JFVNn3Omi2?= =?us-ascii?Q?fP12ehj1T1KH+zCwPKmUTsJxuo87lhSpmtWCWIldDu5Zt/qNH37e+AL44okg?= =?us-ascii?Q?u9wZI4RlY3NmsVPmiDBNG0uC73lJjCqiCAzqunHlANdJYurmhsP0qxESwHa9?= =?us-ascii?Q?YL7PDvLuMo4eT2X685AOUDYAMGMFsseTEMWNQWtv+Ym5c5KVm6ZjEOmFtqds?= =?us-ascii?Q?uaky8/SUhIU3iHdWkUgw3Ik7msyEtqzB7Il2+o9BQDY1svoQ336r12EIvIC7?= =?us-ascii?Q?OLYmb3CRq2yRzaoUrMF3LFlbJsvO3gDM0H2BnLtEb1n6RApJSK3U5Y3r4sPE?= =?us-ascii?Q?SOHPn4Tnwbro+GFxfQyV4q/fr+WTPKRaa/pysjkIowCBVwz1O23o8tJSw4vB?= =?us-ascii?Q?CCF3tj97hny4XVI+uhjXrU2TfcWST8Vpxv4g3M0q9Hk9D3mvne4K7MfGuSX6?= =?us-ascii?Q?D06z8ur+qFw0G44UgenGZK+NY9hZbcn3sKBrk8Fs0ahaW0y4RJC2x109S0jy?= =?us-ascii?Q?4SSc90Da0bq32oLW3M2efUaAvYEKrNsz3FHkFkr8x++w6kXRJA92iWLYspJ3?= =?us-ascii?Q?pSij+APEoGt4U13QC8YzOTnSVoBgLHIZJQ4RRxJe4NI1A0WLngFP4vowdw6N?= =?us-ascii?Q?GRgfhR/PjSMVt5ufqCwjY8muvSxNgqiBLr5ubrWgu4QGwoLBdKbhcMrAbuBQ?= =?us-ascii?Q?cwBfpbKdjt5J42dprfJyAgsQC4BWgGl6h8BME2AnP8ILq/3khCrds9Cuuj8d?= =?us-ascii?Q?N9uVqsR5BxhgnP/3w2XGbsdibp8ffvTT7p6UKgSJ6+uOhP4S+alFsufyKzZO?= =?us-ascii?Q?1txUqM12EX+6vip7xGz6IDOZUNOESYkdMVncUbz6pVAHyJWzpjhCpI+7edYG?= =?us-ascii?Q?S6prmpX5g8tp2FHe5ClUDfmiWuBFAhE5viTLrkq6nyTkMtvpis8+Fz76VMOH?= =?us-ascii?Q?Hg=3D=3D?= Content-Type: text/plain; charset="us-ascii" Content-ID: <537ACE670CC111439BE781F419FEB26A@namprd12.prod.outlook.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: MN0PR12MB6341.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: c2df9f0e-0e3b-48db-9df6-08dc673263b0 X-MS-Exchange-CrossTenant-originalarrivaltime: 28 Apr 2024 03:22:09.8281 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: PDuwEZ+c6yd4bdxnNjO/l/ejBiHBbtmpjl5DhnZjtxYqpJQ7UWf2/XrXt8plaKNrvB3oDGdVpZgC5LKdm0aNBw== X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH7PR12MB7355 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hi Maxime, > On Apr 9, 2024, at 19:48, Maxime Coquelin wr= ote: >=20 > External email: Use caution opening links or attachments >=20 >=20 > From: David Marchand >=20 > Switch to epoll so that the concern over the poll() fd array > is removed. > Add a simple list of used entries and track the next free entry. >=20 > epoll() is thread safe, we no more need a synchronization > mechanism and so can remove the notification pipe. >=20 > Signed-off-by: David Marchand > Signed-off-by: Maxime Coquelin > --- > lib/vhost/fd_man.c | 399 ++++++++++++--------------------------------- > lib/vhost/fd_man.h | 5 +- > 2 files changed, 106 insertions(+), 298 deletions(-) >=20 > diff --git a/lib/vhost/fd_man.c b/lib/vhost/fd_man.c > index 8b47c97d45..a4a2965da1 100644 > --- a/lib/vhost/fd_man.c > +++ b/lib/vhost/fd_man.c > @@ -3,9 +3,9 @@ > */ >=20 > #include > -#include > #include > #include > +#include > #include >=20 > #include > @@ -21,49 +21,34 @@ RTE_LOG_REGISTER_SUFFIX(vhost_fdset_logtype, fdset, I= NFO); > #define VHOST_FDMAN_LOG(level, ...) \ > RTE_LOG_LINE(level, VHOST_FDMAN, "" __VA_ARGS__) >=20 > -#define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL) > - > struct fdentry { > int fd; /* -1 indicates this entry is empty */ > fd_cb rcb; /* callback when this fd is readable. */ > fd_cb wcb; /* callback when this fd is writeable.*/ > void *dat; /* fd context */ > int busy; /* whether this entry is being used in cb. */ > + LIST_ENTRY(fdentry) next; > }; >=20 > struct fdset { > char name[RTE_THREAD_NAME_SIZE]; > - struct pollfd rwfds[MAX_FDS]; > + int epfd; > struct fdentry fd[MAX_FDS]; > + LIST_HEAD(, fdentry) fdlist; > + int next_free_idx; > rte_thread_t tid; > pthread_mutex_t fd_mutex; > - pthread_mutex_t fd_polling_mutex; > - int num; /* current fd number of this fdset */ > - > - union pipefds { > - struct { > - int pipefd[2]; > - }; > - struct { > - int readfd; > - int writefd; > - }; > - } u; > - > - pthread_mutex_t sync_mutex; > - pthread_cond_t sync_cond; > - bool sync; > + Not sure this blank line is intended or not :) > bool destroy; > }; >=20 > -static int fdset_add_no_sync(struct fdset *pfdset, int fd, fd_cb rcb, fd= _cb wcb, void *dat); > -static uint32_t fdset_event_dispatch(void *arg); > - > #define MAX_FDSETS 8 >=20 > static struct fdset *fdsets[MAX_FDSETS]; > pthread_mutex_t fdsets_mutex =3D PTHREAD_MUTEX_INITIALIZER; >=20 > +static uint32_t fdset_event_dispatch(void *arg); > + > static struct fdset * > fdset_lookup(const char *name) > { > @@ -96,166 +81,6 @@ fdset_insert(struct fdset *fdset) > return -1; > } >=20 > -static void > -fdset_pipe_read_cb(int readfd, void *dat, > - int *remove __rte_unused) > -{ > - char charbuf[16]; > - struct fdset *fdset =3D dat; > - int r =3D read(readfd, charbuf, sizeof(charbuf)); > - /* > - * Just an optimization, we don't care if read() failed > - * so ignore explicitly its return value to make the > - * compiler happy > - */ > - RTE_SET_USED(r); > - > - pthread_mutex_lock(&fdset->sync_mutex); > - fdset->sync =3D true; > - pthread_cond_broadcast(&fdset->sync_cond); > - pthread_mutex_unlock(&fdset->sync_mutex); > -} > - > -static void > -fdset_pipe_uninit(struct fdset *fdset) > -{ > - fdset_del(fdset, fdset->u.readfd); > - close(fdset->u.readfd); > - fdset->u.readfd =3D -1; > - close(fdset->u.writefd); > - fdset->u.writefd =3D -1; > -} > - > -static int > -fdset_pipe_init(struct fdset *fdset) > -{ > - int ret; > - > - pthread_mutex_init(&fdset->sync_mutex, NULL); > - pthread_cond_init(&fdset->sync_cond, NULL); > - > - if (pipe(fdset->u.pipefd) < 0) { > - VHOST_FDMAN_LOG(ERR, > - "failed to create pipe for vhost fdset"); > - return -1; > - } > - > - ret =3D fdset_add_no_sync(fdset, fdset->u.readfd, > - fdset_pipe_read_cb, NULL, fdset); > - if (ret < 0) { > - VHOST_FDMAN_LOG(ERR, > - "failed to add pipe readfd %d into vhost server f= dset", > - fdset->u.readfd); > - > - fdset_pipe_uninit(fdset); > - return -1; > - } > - > - return 0; > -} > - > -static void > -fdset_sync(struct fdset *fdset) > -{ > - int ret; > - > - pthread_mutex_lock(&fdset->sync_mutex); > - > - fdset->sync =3D false; > - ret =3D write(fdset->u.writefd, "1", 1); > - if (ret < 0) { > - VHOST_FDMAN_LOG(ERR, > - "Failed to write to notification pipe: %s", > - strerror(errno)); > - goto out_unlock; > - } > - > - while (!fdset->sync) > - pthread_cond_wait(&fdset->sync_cond, &fdset->sync_mutex); > - > -out_unlock: > - pthread_mutex_unlock(&fdset->sync_mutex); > -} > - > -static int > -get_last_valid_idx(struct fdset *pfdset, int last_valid_idx) > -{ > - int i; > - > - for (i =3D last_valid_idx; i >=3D 0 && pfdset->fd[i].fd =3D=3D -1= ; i--) > - ; > - > - return i; > -} > - > -static void > -fdset_move(struct fdset *pfdset, int dst, int src) > -{ > - pfdset->fd[dst] =3D pfdset->fd[src]; > - pfdset->rwfds[dst] =3D pfdset->rwfds[src]; > -} > - > -static void > -fdset_shrink_nolock(struct fdset *pfdset) > -{ > - int i; > - int last_valid_idx =3D get_last_valid_idx(pfdset, pfdset->num - 1= ); > - > - for (i =3D 0; i < last_valid_idx; i++) { > - if (pfdset->fd[i].fd !=3D -1) > - continue; > - > - fdset_move(pfdset, i, last_valid_idx); > - last_valid_idx =3D get_last_valid_idx(pfdset, last_valid_= idx - 1); > - } > - pfdset->num =3D last_valid_idx + 1; > -} > - > -/* > - * Find deleted fd entries and remove them > - */ > -static void > -fdset_shrink(struct fdset *pfdset) > -{ > - pthread_mutex_lock(&pfdset->fd_mutex); > - fdset_shrink_nolock(pfdset); > - pthread_mutex_unlock(&pfdset->fd_mutex); > -} > - > -/** > - * Returns the index in the fdset for a given fd. > - * @return > - * index for the fd, or -1 if fd isn't in the fdset. > - */ > -static int > -fdset_find_fd(struct fdset *pfdset, int fd) > -{ > - int i; > - > - for (i =3D 0; i < pfdset->num && pfdset->fd[i].fd !=3D fd; i++) > - ; > - > - return i =3D=3D pfdset->num ? -1 : i; > -} > - > -static void > -fdset_add_fd(struct fdset *pfdset, int idx, int fd, > - fd_cb rcb, fd_cb wcb, void *dat) > -{ > - struct fdentry *pfdentry =3D &pfdset->fd[idx]; > - struct pollfd *pfd =3D &pfdset->rwfds[idx]; > - > - pfdentry->fd =3D fd; > - pfdentry->rcb =3D rcb; > - pfdentry->wcb =3D wcb; > - pfdentry->dat =3D dat; > - > - pfd->fd =3D fd; > - pfd->events =3D rcb ? POLLIN : 0; > - pfd->events |=3D wcb ? POLLOUT : 0; > - pfd->revents =3D 0; > -} > - > struct fdset * > fdset_init(const char *name) > { > @@ -284,16 +109,20 @@ fdset_init(const char *name) > rte_strscpy(fdset->name, name, RTE_THREAD_NAME_SIZE); >=20 > pthread_mutex_init(&fdset->fd_mutex, NULL); > - pthread_mutex_init(&fdset->fd_polling_mutex, NULL); >=20 > - for (i =3D 0; i < MAX_FDS; i++) { > + for (i =3D 0; i < (int)RTE_DIM(fdset->fd); i++) { > fdset->fd[i].fd =3D -1; > fdset->fd[i].dat =3D NULL; > } > - fdset->num =3D 0; > + LIST_INIT(&fdset->fdlist); >=20 > - if (fdset_pipe_init(fdset)) { > - VHOST_FDMAN_LOG(ERR, "Failed to init pipe for %s", name); > + /* > + * Any non-zero value would work (see man epoll_create), > + * but pass MAX_FDS for consistency. > + */ > + fdset->epfd =3D epoll_create(MAX_FDS); > + if (fdset->epfd < 0) { > + VHOST_FDMAN_LOG(ERR, "failed to create epoll for %s fdset= ", name); failed -> Failed like other logs > goto err_free; > } >=20 > @@ -301,7 +130,7 @@ fdset_init(const char *name) > fdset_event_dispatch, fdset)) { > VHOST_FDMAN_LOG(ERR, "Failed to create %s event dispatch t= hread", > fdset->name); > - goto err_pipe; > + goto err_epoll; > } >=20 > if (fdset_insert(fdset)) { > @@ -315,10 +144,9 @@ fdset_init(const char *name) >=20 > err_thread: > fdset->destroy =3D true; > - fdset_sync(fdset); > rte_thread_join(fdset->tid, &val); > -err_pipe: > - fdset_pipe_uninit(fdset); > +err_epoll: > + close(fdset->epfd); > err_free: > rte_free(fdset); > err_unlock: > @@ -330,78 +158,99 @@ fdset_init(const char *name) > /** > * Register the fd in the fdset with read/write handler and context. > */ > -static int > -fdset_add_no_sync(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, vo= id *dat) > +int > +fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) > { > - int i; > + struct fdentry *pfdentry; > + struct epoll_event ev; >=20 > if (pfdset =3D=3D NULL || fd =3D=3D -1) > return -1; >=20 > pthread_mutex_lock(&pfdset->fd_mutex); > - i =3D pfdset->num < MAX_FDS ? pfdset->num++ : -1; > - if (i =3D=3D -1) { > - pthread_mutex_lock(&pfdset->fd_polling_mutex); > - fdset_shrink_nolock(pfdset); > - pthread_mutex_unlock(&pfdset->fd_polling_mutex); > - i =3D pfdset->num < MAX_FDS ? pfdset->num++ : -1; > - if (i =3D=3D -1) { > - pthread_mutex_unlock(&pfdset->fd_mutex); > - return -2; > - } > + if (pfdset->next_free_idx >=3D (int)RTE_DIM(pfdset->fd)) { > + pthread_mutex_unlock(&pfdset->fd_mutex); > + return -2; > } >=20 > - fdset_add_fd(pfdset, i, fd, rcb, wcb, dat); > + pfdentry =3D &pfdset->fd[pfdset->next_free_idx]; > + pfdentry->fd =3D fd; > + pfdentry->rcb =3D rcb; > + pfdentry->wcb =3D wcb; > + pfdentry->dat =3D dat; > + > + LIST_INSERT_HEAD(&pfdset->fdlist, pfdentry, next); > + > + /* Find next free slot */ > + pfdset->next_free_idx++; > + for (; pfdset->next_free_idx < (int)RTE_DIM(pfdset->fd); pfdset->= next_free_idx++) { > + if (pfdset->fd[pfdset->next_free_idx].fd !=3D -1) > + continue; > + break; > + } > pthread_mutex_unlock(&pfdset->fd_mutex); >=20 > + ev.events =3D EPOLLERR; > + ev.events |=3D rcb ? EPOLLIN : 0; > + ev.events |=3D wcb ? EPOLLOUT : 0; > + ev.data.fd =3D fd; > + > + if (epoll_ctl(pfdset->epfd, EPOLL_CTL_ADD, fd, &ev) =3D=3D -1) > + VHOST_FDMAN_LOG(ERR, "could not add %d fd to %d epfd: %s"= , > + fd, pfdset->epfd, strerror(errno)); Should not return 0 if this fails ? > + > return 0; > } >=20 > -int > -fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat) > +static struct fdentry * > +fdset_find_entry_locked(struct fdset *pfdset, int fd) > { > - int ret; > + struct fdentry *pfdentry; >=20 > - ret =3D fdset_add_no_sync(pfdset, fd, rcb, wcb, dat); > - if (ret < 0) > - return ret; > + LIST_FOREACH(pfdentry, &pfdset->fdlist, next) { > + if (pfdentry->fd !=3D fd) > + continue; > + return pfdentry; > + } >=20 > - fdset_sync(pfdset); > + return NULL; > +} >=20 > - return 0; > +static void > +fdset_del_locked(struct fdset *pfdset, struct fdentry *pfdentry) > +{ > + int entry_idx; > + > + if (epoll_ctl(pfdset->epfd, EPOLL_CTL_DEL, pfdentry->fd, NULL) = =3D=3D -1) > + VHOST_FDMAN_LOG(ERR, "could not remove %d fd from %d epfd= : %s", > + pfdentry->fd, pfdset->epfd, strerror(errno)); > + > + pfdentry->fd =3D -1; > + pfdentry->rcb =3D pfdentry->wcb =3D NULL; > + pfdentry->dat =3D NULL; > + entry_idx =3D pfdentry - pfdset->fd; > + if (entry_idx < pfdset->next_free_idx) > + pfdset->next_free_idx =3D entry_idx; > + LIST_REMOVE(pfdentry, next); > } >=20 > -/** > - * Unregister the fd from the fdset. > - * Returns context of a given fd or NULL. > - */ > -void * > +void > fdset_del(struct fdset *pfdset, int fd) > { > - int i; > - void *dat =3D NULL; > + struct fdentry *pfdentry; >=20 > if (pfdset =3D=3D NULL || fd =3D=3D -1) > - return NULL; > + return; >=20 > do { > pthread_mutex_lock(&pfdset->fd_mutex); > - > - i =3D fdset_find_fd(pfdset, fd); > - if (i !=3D -1 && pfdset->fd[i].busy =3D=3D 0) { > - /* busy indicates r/wcb is executing! */ > - dat =3D pfdset->fd[i].dat; > - pfdset->fd[i].fd =3D -1; > - pfdset->fd[i].rcb =3D pfdset->fd[i].wcb =3D NULL; > - pfdset->fd[i].dat =3D NULL; > - i =3D -1; > + pfdentry =3D fdset_find_entry_locked(pfdset, fd); > + if (pfdentry !=3D NULL && pfdentry->busy =3D=3D 0) { > + fdset_del_locked(pfdset, pfdentry); > + pfdentry =3D NULL; > } > pthread_mutex_unlock(&pfdset->fd_mutex); > - } while (i !=3D -1); > - > - fdset_sync(pfdset); > - > - return dat; > + } while (pfdentry !=3D NULL); > } >=20 > /** > @@ -415,28 +264,22 @@ fdset_del(struct fdset *pfdset, int fd) > int > fdset_try_del(struct fdset *pfdset, int fd) > { > - int i; > + struct fdentry *pfdentry; >=20 > if (pfdset =3D=3D NULL || fd =3D=3D -1) > return -2; >=20 > pthread_mutex_lock(&pfdset->fd_mutex); > - i =3D fdset_find_fd(pfdset, fd); > - if (i !=3D -1 && pfdset->fd[i].busy) { > + pfdentry =3D fdset_find_entry_locked(pfdset, fd); > + if (pfdentry !=3D NULL && pfdentry->busy !=3D 0) { > pthread_mutex_unlock(&pfdset->fd_mutex); > return -1; > } >=20 > - if (i !=3D -1) { > - pfdset->fd[i].fd =3D -1; > - pfdset->fd[i].rcb =3D pfdset->fd[i].wcb =3D NULL; > - pfdset->fd[i].dat =3D NULL; > - } > + if (pfdentry !=3D NULL) > + fdset_del_locked(pfdset, pfdentry); >=20 > pthread_mutex_unlock(&pfdset->fd_mutex); > - > - fdset_sync(pfdset); > - > return 0; > } >=20 > @@ -453,53 +296,29 @@ static uint32_t > fdset_event_dispatch(void *arg) > { > int i; > - struct pollfd *pfd; > - struct fdentry *pfdentry; > fd_cb rcb, wcb; > void *dat; > int fd, numfds; > int remove1, remove2; > - int need_shrink; > struct fdset *pfdset =3D arg; > - int val; >=20 > if (pfdset =3D=3D NULL) > return 0; >=20 > while (1) { > + struct epoll_event events[MAX_FDS]; > + struct fdentry *pfdentry; >=20 > - /* > - * When poll is blocked, other threads might unregister > - * listenfds from and register new listenfds into fdset. > - * When poll returns, the entries for listenfds in the fd= set > - * might have been updated. It is ok if there is unwanted= call > - * for new listenfds. > - */ > - pthread_mutex_lock(&pfdset->fd_mutex); > - numfds =3D pfdset->num; > - pthread_mutex_unlock(&pfdset->fd_mutex); > - > - pthread_mutex_lock(&pfdset->fd_polling_mutex); > - val =3D poll(pfdset->rwfds, numfds, 1000 /* millisecs */)= ; > - pthread_mutex_unlock(&pfdset->fd_polling_mutex); > - if (val < 0) > + numfds =3D epoll_wait(pfdset->epfd, events, RTE_DIM(event= s), 1000); > + if (numfds < 0) > continue; >=20 > - need_shrink =3D 0; > for (i =3D 0; i < numfds; i++) { > pthread_mutex_lock(&pfdset->fd_mutex); >=20 > - pfdentry =3D &pfdset->fd[i]; > - fd =3D pfdentry->fd; > - pfd =3D &pfdset->rwfds[i]; > - > - if (fd < 0) { > - need_shrink =3D 1; > - pthread_mutex_unlock(&pfdset->fd_mutex); > - continue; > - } > - > - if (!pfd->revents) { > + fd =3D events[i].data.fd; > + pfdentry =3D fdset_find_entry_locked(pfdset, fd); > + if (pfdentry =3D=3D NULL) { > pthread_mutex_unlock(&pfdset->fd_mutex); > continue; > } > @@ -513,9 +332,9 @@ fdset_event_dispatch(void *arg) >=20 > pthread_mutex_unlock(&pfdset->fd_mutex); >=20 > - if (rcb && pfd->revents & (POLLIN | FDPOLLERR)) > + if (rcb && events[i].events & (EPOLLIN | EPOLLERR= | EPOLLHUP)) > rcb(fd, dat, &remove1); > - if (wcb && pfd->revents & (POLLOUT | FDPOLLERR)) > + if (wcb && events[i].events & (EPOLLOUT | EPOLLER= R | EPOLLHUP)) > wcb(fd, dat, &remove2); > pfdentry->busy =3D 0; > /* > @@ -524,23 +343,13 @@ fdset_event_dispatch(void *arg) > * directly. > */ > /* > - * When we are to clean up the fd from fdset, > - * because the fd is closed in the cb, > - * the old fd val could be reused by when creates= new > - * listen fd in another thread, we couldn't call > - * fdset_del. > + * A concurrent fdset_del may have been waiting f= or the > + * fdentry not to be busy, so we can't call > + * fdset_del_locked(). > */ > - if (remove1 || remove2) { > - pfdentry->fd =3D -1; > - need_shrink =3D 1; > - } > + if (remove1 || remove2) > + fdset_del(pfdset, fd); > } > - > - if (need_shrink) > - fdset_shrink(pfdset); > - > - if (pfdset->destroy) > - break; I guess we want to keep the destroy logic Thanks, Chenbo > } >=20 > return 0; > diff --git a/lib/vhost/fd_man.h b/lib/vhost/fd_man.h > index 079fa0155f..6398343a6a 100644 > --- a/lib/vhost/fd_man.h > +++ b/lib/vhost/fd_man.h > @@ -6,7 +6,7 @@ > #define _FD_MAN_H_ > #include > #include > -#include > +#include >=20 > struct fdset; >=20 > @@ -19,8 +19,7 @@ struct fdset *fdset_init(const char *name); > int fdset_add(struct fdset *pfdset, int fd, > fd_cb rcb, fd_cb wcb, void *dat); >=20 > -void *fdset_del(struct fdset *pfdset, int fd); > - > +void fdset_del(struct fdset *pfdset, int fd); > int fdset_try_del(struct fdset *pfdset, int fd); >=20 > #endif > -- > 2.44.0 >=20