From: Eelco Chaudron
To: maxime.coquelin@redhat.com, chenbo.xia@intel.com, david.marchand@redhat.com
Cc: dev@dpdk.org
Subject: [PATCH v3 4/4] vhost: add device op to offload the interrupt kick
Date: Wed, 17 May 2023 11:09:13 +0200
Message-Id: <168431455219.558450.14986601389394385835.stgit@ebuild.local>
In-Reply-To: <168431450017.558450.16680518469610688737.stgit@ebuild.local>
References: <168431450017.558450.16680518469610688737.stgit@ebuild.local>

This patch adds an operation callback that is invoked every time the
library wants to call eventfd_write(). The eventfd_write() call results
in a system call, which could block the PMD thread.

The callback decides whether the library should perform the
eventfd_write() immediately (by returning false), or whether the
application will call the newly introduced function,
rte_vhost_notify_guest(), at a later time (by returning true).

This can be used by third-party applications, like OVS, to avoid making
system calls from the PMD threads.

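As a rough illustration of the intended usage (a sketch only, not part
of this patch): an application could register a guest_notify callback
that queues the request on the PMD thread and returns true, then drain
the queue from a non-datapath thread with rte_vhost_notify_guest(). The
names app_guest_notify(), app_flush_kicks() and the kick ring are
hypothetical. rte_vhost_notify_guest() is replaced here by a counting
stand-in so the sketch builds without DPDK, and the ring is
single-threaded for brevity; a real application would need a proper
single-producer/single-consumer ring with memory ordering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the rte_vhost_notify_guest() added by this patch, so the
 * sketch builds without DPDK. It only counts how often it was called.
 */
static unsigned int notified_calls;

static void
rte_vhost_notify_guest(int vid, uint16_t queue_id)
{
	(void)vid;
	(void)queue_id;
	notified_calls++;
}

#define KICK_RING_SIZE 8 /* hypothetical size, must be a power of two */

struct kick {
	int vid;
	uint16_t queue_id;
};

static struct kick kick_ring[KICK_RING_SIZE];
static unsigned int kick_head; /* advanced by the PMD thread */
static unsigned int kick_tail; /* advanced by the housekeeping thread */

/* Hypothetical guest_notify callback: runs on the PMD thread. */
static bool
app_guest_notify(int vid, uint16_t queue_id)
{
	if (kick_head - kick_tail == KICK_RING_SIZE)
		return false; /* ring full: let the library kick directly */

	kick_ring[kick_head % KICK_RING_SIZE] =
		(struct kick){ .vid = vid, .queue_id = queue_id };
	kick_head++;
	return true; /* offloaded: the kick will be delivered later */
}

/* Hypothetical drain function: runs outside the datapath. */
static unsigned int
app_flush_kicks(void)
{
	unsigned int flushed = 0;

	while (kick_tail != kick_head) {
		const struct kick *k = &kick_ring[kick_tail % KICK_RING_SIZE];

		rte_vhost_notify_guest(k->vid, k->queue_id);
		kick_tail++;
		flushed++;
	}
	return flushed;
}
```

Returning false when the ring is full lets the library fall back to
calling eventfd_write() itself, so no notification is dropped.
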
Signed-off-by: Eelco Chaudron
---
 lib/vhost/meson.build |  2 ++
 lib/vhost/rte_vhost.h | 23 +++++++++++++++++-
 lib/vhost/socket.c    | 63 ++++++++++++++++++++++++++++++++++++++++++++++---
 lib/vhost/version.map |  9 +++++++
 lib/vhost/vhost.c     | 38 ++++++++++++++++++++++++++++++
 lib/vhost/vhost.h     | 58 ++++++++++++++++++++++++++++++++-------------
 6 files changed, 171 insertions(+), 22 deletions(-)

diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build
index 0d1abf6283..05679447db 100644
--- a/lib/vhost/meson.build
+++ b/lib/vhost/meson.build
@@ -38,3 +38,5 @@ driver_sdk_headers = files(
         'vdpa_driver.h',
 )
 deps += ['ethdev', 'cryptodev', 'hash', 'pci', 'dmadev']
+
+use_function_versioning = true
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index 58a5d4be92..7a10bc36cf 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -298,7 +298,13 @@ struct rte_vhost_device_ops {
 	 */
 	void (*guest_notified)(int vid);
 
-	void *reserved[1]; /**< Reserved for future extension */
+	/**
+	 * If this callback is registered, notification to the guest can
+	 * be handled by the front-end calling rte_vhost_notify_guest().
+	 * If it's not handled, 'false' should be returned. This can be used
+	 * to remove the "slow" eventfd_write() syscall from the datapath.
+	 */
+	bool (*guest_notify)(int vid, uint16_t queue_id);
 };
 
 /**
@@ -433,6 +439,21 @@ void rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
 int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id,
 		int enable);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Inject the offloaded interrupt into the vhost device's queue. For more
+ * details see the 'guest_notify' vhost device operation.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index
+ */
+__rte_experimental
+void rte_vhost_notify_guest(int vid, uint16_t queue_id);
+
 /**
  * Register vhost driver. path could be different for multiple
  * instance support.
diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
index 669c322e12..f2c02075fe 100644
--- a/lib/vhost/socket.c
+++ b/lib/vhost/socket.c
@@ -15,6 +15,7 @@
 #include
 #include
 
+#include
 #include
 
 #include "fd_man.h"
@@ -59,6 +60,7 @@ struct vhost_user_socket {
 	struct rte_vdpa_device *vdpa_dev;
 
 	struct rte_vhost_device_ops const *notify_ops;
+	struct rte_vhost_device_ops *malloc_notify_ops;
 };
 
 struct vhost_user_connection {
@@ -846,6 +848,11 @@ vhost_user_socket_mem_free(struct vhost_user_socket *vsocket)
 		vsocket->path = NULL;
 	}
 
+	if (vsocket && vsocket->malloc_notify_ops) {
+		free(vsocket->malloc_notify_ops);
+		vsocket->malloc_notify_ops = NULL;
+	}
+
 	if (vsocket) {
 		free(vsocket);
 		vsocket = NULL;
@@ -1099,21 +1106,69 @@ rte_vhost_driver_unregister(const char *path)
 /*
  * Register ops so that we can add/remove device to data core.
  */
-int
-rte_vhost_driver_callback_register(const char *path,
-	struct rte_vhost_device_ops const * const ops)
+static int
+vhost_driver_callback_register(const char *path,
+		struct rte_vhost_device_ops const * const ops,
+		struct rte_vhost_device_ops *malloc_ops)
 {
 	struct vhost_user_socket *vsocket;
 
 	pthread_mutex_lock(&vhost_user.mutex);
 	vsocket = find_vhost_user_socket(path);
-	if (vsocket)
+	if (vsocket) {
 		vsocket->notify_ops = ops;
+		free(vsocket->malloc_notify_ops);
+		vsocket->malloc_notify_ops = malloc_ops;
+	}
 	pthread_mutex_unlock(&vhost_user.mutex);
 
 	return vsocket ? 0 : -1;
 }
 
+int __vsym
+rte_vhost_driver_callback_register_v24(const char *path,
+		struct rte_vhost_device_ops const * const ops)
+{
+	return vhost_driver_callback_register(path, ops, NULL);
+}
+
+int __vsym
+rte_vhost_driver_callback_register_v23(const char *path,
+		struct rte_vhost_device_ops const * const ops)
+{
+	int ret;
+
+	/*
+	 * Although the ops structure is a const structure, we do need to
+	 * override the guest_notify operation. This is because with the
+	 * previous APIs it was "reserved" and if any garbage value was passed,
+	 * it could crash the application.
+	 */
+	if (ops && ops->guest_notify) {
+		struct rte_vhost_device_ops *new_ops;
+
+		new_ops = malloc(sizeof(*new_ops));
+		if (new_ops == NULL)
+			return -1;
+
+		memcpy(new_ops, ops, sizeof(*new_ops));
+		new_ops->guest_notify = NULL;
+
+		ret = vhost_driver_callback_register(path, new_ops, new_ops);
+	} else {
+		ret = vhost_driver_callback_register(path, ops, NULL);
+	}
+
+	return ret;
+}
+
+/* Mark the v23 function as the old version, and v24 as the default version. */
+VERSION_SYMBOL(rte_vhost_driver_callback_register, _v23, 23);
+BIND_DEFAULT_SYMBOL(rte_vhost_driver_callback_register, _v24, 24);
+MAP_STATIC_SYMBOL(int rte_vhost_driver_callback_register(const char *path,
+		struct rte_vhost_device_ops const * const ops),
+		rte_vhost_driver_callback_register_v24);
+
 struct rte_vhost_device_ops const *
 vhost_driver_callback_get(const char *path)
 {
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index d322a4a888..7bcbfd12cf 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -64,6 +64,12 @@ DPDK_23 {
 	local: *;
 };
 
+DPDK_24 {
+	global:
+
+	rte_vhost_driver_callback_register;
+} DPDK_23;
+
 EXPERIMENTAL {
 	global:
 
@@ -98,6 +104,9 @@ EXPERIMENTAL {
 	# added in 22.11
 	rte_vhost_async_dma_unconfigure;
 	rte_vhost_vring_call_nonblock;
+
+	# added in 23.07
+	rte_vhost_notify_guest;
 };
 
 INTERNAL {
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 8ff6434c93..79e88f986e 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -44,6 +44,10 @@ static const struct vhost_vq_stats_name_off vhost_vq_stat_strings[] = {
 	{"size_1024_1518_packets", offsetof(struct vhost_virtqueue, stats.size_bins[6])},
 	{"size_1519_max_packets", offsetof(struct vhost_virtqueue, stats.size_bins[7])},
 	{"guest_notifications", offsetof(struct vhost_virtqueue, stats.guest_notifications)},
+	{"guest_notifications_offloaded", offsetof(struct vhost_virtqueue,
+		stats.guest_notifications_offloaded)},
+	{"guest_notifications_error", offsetof(struct vhost_virtqueue,
+		stats.guest_notifications_error)},
 	{"iotlb_hits", offsetof(struct vhost_virtqueue, stats.iotlb_hits)},
 	{"iotlb_misses", offsetof(struct vhost_virtqueue, stats.iotlb_misses)},
 	{"inflight_submitted", offsetof(struct vhost_virtqueue, stats.inflight_submitted)},
@@ -1467,6 +1471,40 @@ rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
 	return ret;
 }
 
+void
+rte_vhost_notify_guest(int vid, uint16_t queue_id)
+{
+	struct virtio_net *dev = get_device(vid);
+	struct vhost_virtqueue *vq;
+
+	if (!dev || queue_id >= VHOST_MAX_VRING)
+		return;
+
+	vq = dev->virtqueue[queue_id];
+	if (!vq)
+		return;
+
+	rte_rwlock_read_lock(&vq->access_lock);
+
+	if (vq->callfd >= 0) {
+		int ret = eventfd_write(vq->callfd, (eventfd_t)1);
+
+		if (ret) {
+			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+				__atomic_fetch_add(&vq->stats.guest_notifications_error,
+					1, __ATOMIC_RELAXED);
+		} else {
+			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+				__atomic_fetch_add(&vq->stats.guest_notifications,
+					1, __ATOMIC_RELAXED);
+			if (dev->notify_ops->guest_notified)
+				dev->notify_ops->guest_notified(dev->vid);
+		}
+	}
+
+	rte_rwlock_read_unlock(&vq->access_lock);
+}
+
 void
 rte_vhost_log_write(int vid, uint64_t addr, uint64_t len)
 {
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 23a4e2b1a7..8ad53e9bb5 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -141,6 +141,8 @@ struct virtqueue_stats {
 	uint64_t inflight_completed;
 	/* Counters below are atomic, and should be incremented as such. */
 	uint64_t guest_notifications;
+	uint64_t guest_notifications_offloaded;
+	uint64_t guest_notifications_error;
 };
 
 /**
@@ -884,6 +886,34 @@ vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
 	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
 }
 
+static __rte_always_inline void
+vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	int ret;
+
+	if (dev->notify_ops->guest_notify &&
+	    dev->notify_ops->guest_notify(dev->vid, vq->index)) {
+		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+			__atomic_fetch_add(&vq->stats.guest_notifications_offloaded,
+				1, __ATOMIC_RELAXED);
+		return;
+	}
+
+	ret = eventfd_write(vq->callfd, (eventfd_t) 1);
+	if (ret) {
+		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+			__atomic_fetch_add(&vq->stats.guest_notifications_error,
+				1, __ATOMIC_RELAXED);
+		return;
+	}
+
+	if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
+		__atomic_fetch_add(&vq->stats.guest_notifications,
+			1, __ATOMIC_RELAXED);
+	if (dev->notify_ops->guest_notified)
+		dev->notify_ops->guest_notified(dev->vid);
+}
+
 static __rte_always_inline void
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
@@ -906,23 +936,13 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		if ((vhost_need_event(vhost_used_event(vq), new, old) ||
 					unlikely(!signalled_used_valid)) &&
 				vq->callfd >= 0) {
-			eventfd_write(vq->callfd, (eventfd_t) 1);
-			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-				__atomic_fetch_add(&vq->stats.guest_notifications,
-					1, __ATOMIC_RELAXED);
-			if (dev->notify_ops->guest_notified)
-				dev->notify_ops->guest_notified(dev->vid);
+			vhost_vring_inject_irq(dev, vq);
 		}
 	} else {
 		/* Kick the guest if necessary. */
 		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
 				&& (vq->callfd >= 0)) {
-			eventfd_write(vq->callfd, (eventfd_t)1);
-			if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-				__atomic_fetch_add(&vq->stats.guest_notifications,
-					1, __ATOMIC_RELAXED);
-			if (dev->notify_ops->guest_notified)
-				dev->notify_ops->guest_notified(dev->vid);
+			vhost_vring_inject_irq(dev, vq);
 		}
 	}
 }
@@ -974,11 +994,8 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	if (vhost_need_event(off, new, old))
 		kick = true;
 kick:
-	if (kick && vq->callfd >= 0) {
-		eventfd_write(vq->callfd, (eventfd_t)1);
-		if (dev->notify_ops->guest_notified)
-			dev->notify_ops->guest_notified(dev->vid);
-	}
+	if (kick && vq->callfd >= 0)
+		vhost_vring_inject_irq(dev, vq);
 }
 
 static __rte_always_inline void
@@ -1017,4 +1034,11 @@ mbuf_is_consumed(struct rte_mbuf *m)
 
 uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
 void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
+
+/* Versioned functions */
+int rte_vhost_driver_callback_register_v23(const char *path,
+	struct rte_vhost_device_ops const * const ops);
+int rte_vhost_driver_callback_register_v24(const char *path,
+	struct rte_vhost_device_ops const * const ops);
+
 #endif /* _VHOST_NET_CDEV_H_ */
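
For reviewers, the decision order implemented by vhost_vring_inject_irq()
and the three counters it feeds can be modelled outside DPDK. The
following is a simplified sketch under stated assumptions: struct
kick_stats, inject_irq() and the four helper callbacks are hypothetical
names, kick_fn stands in for eventfd_write(), and the
VIRTIO_DEV_STATS_ENABLED check, the atomic increments and the
guest_notified callback are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three per-virtqueue counters. */
struct kick_stats {
	uint64_t notifications; /* kick delivered by the library itself */
	uint64_t offloaded;     /* kick taken over by the application */
	uint64_t errors;        /* direct kick attempted but failed */
};

typedef bool (*guest_notify_cb)(int vid, uint16_t queue_id);
typedef int (*kick_fn)(int callfd); /* stand-in for eventfd_write() */

/* Same ordering as the patch: offload first, then direct kick. */
static void
inject_irq(struct kick_stats *st, guest_notify_cb notify, kick_fn kick,
	   int vid, uint16_t queue_id, int callfd)
{
	if (notify != NULL && notify(vid, queue_id)) {
		st->offloaded++;
		return;
	}

	if (kick(callfd) != 0) {
		st->errors++;
		return;
	}

	st->notifications++;
}

/* Helper callbacks used only to exercise the three paths. */
static bool
accept_offload(int vid, uint16_t queue_id)
{
	(void)vid;
	(void)queue_id;
	return true;
}

static bool
decline_offload(int vid, uint16_t queue_id)
{
	(void)vid;
	(void)queue_id;
	return false;
}

static int
kick_ok(int callfd)
{
	(void)callfd;
	return 0;
}

static int
kick_fail(int callfd)
{
	(void)callfd;
	return -1;
}
```

Each call increments exactly one counter, so in this model the sum of the
three counters equals the number of kicks the datapath requested.
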