From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id B3C2742AC8; Wed, 10 May 2023 13:45:14 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 42FD7406B6; Wed, 10 May 2023 13:45:14 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mails.dpdk.org (Postfix) with ESMTP id 4ED5740697 for ; Wed, 10 May 2023 13:45:12 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1683719111; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wuttcaprsTuRNPV1YyyhWJ/pojB66LWTil7txo8qXb4=; b=VSzKVz4E/2/CWf6TFLeMhRvF00fq+d5FHGVoxbuI7kq7E3vZTi0jUxtRQJzXxJjkmWTQhb tQR9p1EQeylY1CfBfptYasCAG/xAFNk9hQHh+VGLt4+HMe3DdDV9K3xkmuAdD6DneUnZHF uTLVw6kFpicnGVbZoFa/GXd71FjboHg= Received: from mail-pl1-f200.google.com (mail-pl1-f200.google.com [209.85.214.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-581-tXJV-ISPM1y7zC9_P5KH3g-1; Wed, 10 May 2023 07:45:10 -0400 X-MC-Unique: tXJV-ISPM1y7zC9_P5KH3g-1 Received: by mail-pl1-f200.google.com with SMTP id d9443c01a7336-1aae2223aaeso40516535ad.2 for ; Wed, 10 May 2023 04:45:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683719110; x=1686311110; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wuttcaprsTuRNPV1YyyhWJ/pojB66LWTil7txo8qXb4=; b=fHj0iyed+3WzcDbzcYsMHATpS3Ay33g+gC4GVoDYXDZBzp2THIfmYOOhbKKs4Ay8rB vHKXoR2RVpXlSjEJ60ZpAqsftbb2IVgRS4gMDzd5bSFGwXzUZoeUzJ31QwXo2/LZIXJz LQt/IIAIDj6PAw9Dd0gcnqLFjNlsszgQch2g9WxYIeZxB29vmduQM+Dm/ieZUTlZqHDv 8ZkpZEhToHs2+KGcp4UUrXxkKZ9fzN7o8Xd5rriEW6JpoTAPeXthR3Y+8gxvG3q+6g5c am4BbZio0nCPVHVtLvwNct+Xt4Aqyay2+VCDmqS3IS+vCMZ9wqHvUIKRKLWrNjRHASbW Bs3g== X-Gm-Message-State: AC+VfDwdv7GbwEPNBF2ffAlvVqcQS7rKZqBIDBbbDoRysPQYaaxrrLaP HmnjI4aywZHbPkYCqj6tniF13aXD5DycNrTnnowHVZpsAR1bVxyPA4SqAvGSjOiaL43ZF5uXse9 AwxbDD3aldSfeYZ+I5Iw= X-Received: by 2002:a17:903:22cd:b0:1aa:ee36:4095 with SMTP id y13-20020a17090322cd00b001aaee364095mr22564462plg.43.1683719109742; Wed, 10 May 2023 04:45:09 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6OLEuIoq2Jo0g+C1vyIU7KLwGw/RcbZuVYY3N5hNIZN1yMw0DdXyJ6yBKVkI5Jdy5ySoFzX12MTm6uaqkVH8M= X-Received: by 2002:a17:903:22cd:b0:1aa:ee36:4095 with SMTP id y13-20020a17090322cd00b001aaee364095mr22564442plg.43.1683719109316; Wed, 10 May 2023 04:45:09 -0700 (PDT) MIME-Version: 1.0 References: <168069838578.833254.4856666346839028593.stgit@ebuild.local> <168069850278.833254.17836553051894685658.stgit@ebuild.local> In-Reply-To: <168069850278.833254.17836553051894685658.stgit@ebuild.local> From: David Marchand Date: Wed, 10 May 2023 13:44:58 +0200 Message-ID: Subject: Re: [PATCH v2 3/3] vhost: add device op to offload the interrupt kick To: Eelco Chaudron Cc: maxime.coquelin@redhat.com, chenbo.xia@intel.com, dev@dpdk.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Wed, Apr 5, 2023 at 2:42=E2=80=AFPM Eelco Chaudron = wrote: > > This patch adds an operation callback which gets called every time the > library wants to call eventfd_write(). This eventfd_write() call could > result in a system call, which could potentially block the PMD thread. > > The callback function can decide whether it's ok to handle the > eventfd_write() now or have the newly introduced function, > rte_vhost_notify_guest(), called at a later time. > > This can be used by 3rd party applications, like OVS, to avoid system > calls being called as part of the PMD threads. > > Signed-off-by: Eelco Chaudron > --- > lib/vhost/meson.build | 2 + > lib/vhost/rte_vhost.h | 23 +++++++++++++++- > lib/vhost/socket.c | 72 +++++++++++++++++++++++++++++++++++++++++++= +++--- > lib/vhost/version.map | 9 ++++++ > lib/vhost/vhost.c | 38 ++++++++++++++++++++++++++ > lib/vhost/vhost.h | 65 +++++++++++++++++++++++++++++++------------= - > 6 files changed, 184 insertions(+), 25 deletions(-) > > diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build > index 197a51d936..e93ba6b078 100644 > --- a/lib/vhost/meson.build > +++ b/lib/vhost/meson.build > @@ -39,3 +39,5 @@ driver_sdk_headers =3D files( > 'vdpa_driver.h', > ) > deps +=3D ['ethdev', 'cryptodev', 'hash', 'pci', 'dmadev'] > + > +use_function_versioning =3D true > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h > index 58a5d4be92..7a10bc36cf 100644 > --- a/lib/vhost/rte_vhost.h > +++ b/lib/vhost/rte_vhost.h > @@ -298,7 +298,13 @@ struct rte_vhost_device_ops { > */ > void (*guest_notified)(int vid); > > - void *reserved[1]; /**< Reserved for future extension */ > + /** > + * If this callback is registered, notification to the guest can > + * be handled by the front-end calling rte_vhost_notify_guest(). > + * If it's not handled, 'false' should be returned. This can be u= sed > + * to remove the "slow" eventfd_write() syscall from the datapath= . > + */ > + bool (*guest_notify)(int vid, uint16_t queue_id); > }; > > /** > @@ -433,6 +439,21 @@ void rte_vhost_log_used_vring(int vid, uint16_t vrin= g_idx, > > int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int = enable); > > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change, or be removed, without prior no= tice. > + * > + * Inject the offloaded interrupt into the vhost device's queue. For mor= e > + * details see the 'guest_notify' vhost device operation. > + * > + * @param vid > + * vhost device ID > + * @param queue_id > + * virtio queue index > + */ > +__rte_experimental > +void rte_vhost_notify_guest(int vid, uint16_t queue_id); > + > /** > * Register vhost driver. path could be different for multiple > * instance support. > diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c > index 669c322e12..787d6bacf8 100644 > --- a/lib/vhost/socket.c > +++ b/lib/vhost/socket.c > @@ -15,6 +15,7 @@ > #include > #include > > +#include > #include > > #include "fd_man.h" > @@ -43,6 +44,7 @@ struct vhost_user_socket { > bool async_copy; > bool net_compliant_ol_flags; > bool stats_enabled; > + bool alloc_notify_ops; > > /* > * The "supported_features" indicates the feature bits the > @@ -846,6 +848,14 @@ vhost_user_socket_mem_free(struct vhost_user_socket = *vsocket) > vsocket->path =3D NULL; > } > > + if (vsocket && vsocket->alloc_notify_ops) { > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wcast-qual" > + free((struct rte_vhost_device_ops *)vsocket->notify_ops); > +#pragma GCC diagnostic pop > + vsocket->notify_ops =3D NULL; > + } Rather than select the behavior based on a boolean (and here force the compiler to close its eyes), I would instead add a non const pointer to ops (let's say alloc_notify_ops) in vhost_user_socket. The code can then unconditionnally call free(vsocket->alloc_notify_ops); > + > if (vsocket) { > free(vsocket); > vsocket =3D NULL; > @@ -1099,21 +1109,75 @@ rte_vhost_driver_unregister(const char *path) > /* > * Register ops so that we can add/remove device to data core. > */ > -int > -rte_vhost_driver_callback_register(const char *path, > - struct rte_vhost_device_ops const * const ops) > +static int > +rte_vhost_driver_callback_register__(const char *path, No need for a rte_ prefix on static symbol. And we can simply call this internal helper vhost_driver_callback_register(= ). > + struct rte_vhost_device_ops const * const ops, bool ops_allocated= ) > { > struct vhost_user_socket *vsocket; > > pthread_mutex_lock(&vhost_user.mutex); > vsocket =3D find_vhost_user_socket(path); > - if (vsocket) > + if (vsocket) { > + if (vsocket->alloc_notify_ops) { > + vsocket->alloc_notify_ops =3D false; > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wcast-qual" > + free((struct rte_vhost_device_ops *)vsocket->noti= fy_ops); > +#pragma GCC diagnostic pop > + } > vsocket->notify_ops =3D ops; > + if (ops_allocated) > + vsocket->alloc_notify_ops =3D true; > + } > pthread_mutex_unlock(&vhost_user.mutex); > > return vsocket ? 0 : -1; > } > > +int __vsym > +rte_vhost_driver_callback_register_v24(const char *path, > + struct rte_vhost_device_ops const * const ops) > +{ > + return rte_vhost_driver_callback_register__(path, ops, false); > +} > + > +int __vsym > +rte_vhost_driver_callback_register_v23(const char *path, > + struct rte_vhost_device_ops const * const ops) > +{ > + int ret; > + > + /* > + * Although the ops structure is a const structure, we do need to > + * override the guest_notify operation. This is because with the > + * previous APIs it was "reserved" and if any garbage value was p= assed, > + * it could crash the application. > + */ > + if (ops && !ops->guest_notify) { Hum, as described in the comment above, I don't think we should look at ops->guest_notify value at all. Checking ops !=3D NULL should be enough. > + struct rte_vhost_device_ops *new_ops; > + > + new_ops =3D malloc(sizeof(*new_ops)); Strictly speaking, we lose the numa affinity of "ops" by calling malloc. I am unclear of the impact though. > + if (new_ops =3D=3D NULL) > + return -1; > + > + memcpy(new_ops, ops, sizeof(*new_ops)); > + new_ops->guest_notify =3D NULL; > + > + ret =3D rte_vhost_driver_callback_register__(path, ops, t= rue); > + } else { > + ret =3D rte_vhost_driver_callback_register__(path, ops, f= alse); > + } > + > + return ret; > +} > + > +/* Mark the v23 function as the old version, and v24 as the default vers= ion. */ > +VERSION_SYMBOL(rte_vhost_driver_callback_register, _v23, 23); > +BIND_DEFAULT_SYMBOL(rte_vhost_driver_callback_register, _v24, 24); > +MAP_STATIC_SYMBOL(int rte_vhost_driver_callback_register(const char *pat= h, > + struct rte_vhost_device_ops const * const ops), > + rte_vhost_driver_callback_register_v24); > + > struct rte_vhost_device_ops const * > vhost_driver_callback_get(const char *path) > { > diff --git a/lib/vhost/version.map b/lib/vhost/version.map > index d322a4a888..7bcbfd12cf 100644 > --- a/lib/vhost/version.map > +++ b/lib/vhost/version.map > @@ -64,6 +64,12 @@ DPDK_23 { > local: *; > }; > > +DPDK_24 { > + global: > + > + rte_vhost_driver_callback_register; > +} DPDK_23; > + > EXPERIMENTAL { > global: > > @@ -98,6 +104,9 @@ EXPERIMENTAL { > # added in 22.11 > rte_vhost_async_dma_unconfigure; > rte_vhost_vring_call_nonblock; > + > + # added in 23.07 > + rte_vhost_notify_guest; > }; > > INTERNAL { > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > index 8ff6434c93..79e88f986e 100644 > --- a/lib/vhost/vhost.c > +++ b/lib/vhost/vhost.c > @@ -44,6 +44,10 @@ static const struct vhost_vq_stats_name_off vhost_vq_s= tat_strings[] =3D { > {"size_1024_1518_packets", offsetof(struct vhost_virtqueue, stats= .size_bins[6])}, > {"size_1519_max_packets", offsetof(struct vhost_virtqueue, stats= .size_bins[7])}, > {"guest_notifications", offsetof(struct vhost_virtqueue, stats= .guest_notifications)}, > + {"guest_notifications_offloaded", offsetof(struct vhost_virtqueue= , > + stats.guest_notifications_offloaded)}, > + {"guest_notifications_error", offsetof(struct vhost_virtqueue, > + stats.guest_notifications_error)}, > {"iotlb_hits", offsetof(struct vhost_virtqueue, stats= .iotlb_hits)}, > {"iotlb_misses", offsetof(struct vhost_virtqueue, stats= .iotlb_misses)}, > {"inflight_submitted", offsetof(struct vhost_virtqueue, stats= .inflight_submitted)}, > @@ -1467,6 +1471,40 @@ rte_vhost_enable_guest_notification(int vid, uint1= 6_t queue_id, int enable) > return ret; > } > > +void > +rte_vhost_notify_guest(int vid, uint16_t queue_id) > +{ > + struct virtio_net *dev =3D get_device(vid); > + struct vhost_virtqueue *vq; > + > + if (!dev || queue_id >=3D VHOST_MAX_VRING) > + return; > + > + vq =3D dev->virtqueue[queue_id]; > + if (!vq) > + return; > + > + rte_rwlock_read_lock(&vq->access_lock); > + > + if (vq->callfd >=3D 0) { > + int ret =3D eventfd_write(vq->callfd, (eventfd_t)1); > + > + if (ret) { > + if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > + __atomic_fetch_add(&vq->stats.guest_notif= ications_error, > + 1, __ATOMIC_RELAXED); > + } else { > + if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > + __atomic_fetch_add(&vq->stats.guest_notif= ications, > + 1, __ATOMIC_RELAXED); > + if (dev->notify_ops->guest_notified) > + dev->notify_ops->guest_notified(dev->vid)= ; > + } > + } > + > + rte_rwlock_read_unlock(&vq->access_lock); > +} > + > void > rte_vhost_log_write(int vid, uint64_t addr, uint64_t len) > { > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h > index 37609c7c8d..0ab75b78b5 100644 > --- a/lib/vhost/vhost.h > +++ b/lib/vhost/vhost.h > @@ -141,6 +141,8 @@ struct virtqueue_stats { > uint64_t inflight_completed; > /* Counters below are atomic, and should be incremented as such. = */ > uint64_t guest_notifications; > + uint64_t guest_notifications_offloaded; > + uint64_t guest_notifications_error; > }; > > /** > @@ -884,6 +886,35 @@ vhost_need_event(uint16_t event_idx, uint16_t new_id= x, uint16_t old) > return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx -= old); > } > > +static __rte_always_inline void > +vhost_vring_inject_irq(struct virtio_net *dev, struct vhost_virtqueue *v= q) > +{ > + int ret; > + > + if (dev->notify_ops->guest_notify && > + dev->notify_ops->guest_notify(dev->vid, vq->index)) { > + if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > + __atomic_fetch_add(&vq->stats.guest_notifications= _offloaded, > + 1, __ATOMIC_RELAXED); > + return; > + } > + > + ret =3D eventfd_write(vq->callfd, (eventfd_t) 1); > + if (ret) { > + if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > + __atomic_fetch_add(&vq->stats.guest_notifications= _error, > + 1, __ATOMIC_RELAXED); > + return; > + } > + > + if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > + __atomic_fetch_add(&vq->stats.guest_notifications, > + 1, __ATOMIC_RELAXED); > + if (dev->notify_ops->guest_notified) > + dev->notify_ops->guest_notified(dev->vid); > +} > + > + One empty line is enough. > static __rte_always_inline void > vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *v= q) > { > @@ -903,26 +934,16 @@ vhost_vring_call_split(struct virtio_net *dev, stru= ct vhost_virtqueue *vq) > "%s: used_event_idx=3D%d, old=3D%d, new=3D%d\n", > __func__, vhost_used_event(vq), old, new); > > - if ((vhost_need_event(vhost_used_event(vq), new, old) && > - (vq->callfd >=3D 0)) || > - unlikely(!signalled_used_valid)) { > - eventfd_write(vq->callfd, (eventfd_t) 1); > - if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > - __atomic_fetch_add(&vq->stats.guest_notif= ications, > - 1, __ATOMIC_RELAXED); > - if (dev->notify_ops->guest_notified) > - dev->notify_ops->guest_notified(dev->vid)= ; > + if ((vhost_need_event(vhost_used_event(vq), new, old) || > + unlikely(!signalled_used_valid)) && A bit hard to read (even before your change). After the change, indentation does not comply with dpdk coding style. http://doc.dpdk.org/guides/contributing/coding_style.html#c-indentation But putting indentation aside, is this change equivalent? - if ((vhost_need_event(vhost_used_event(vq), new, old) && - (vq->callfd >=3D 0)) || - unlikely(!signalled_used_valid)) { + if ((vhost_need_event(vhost_used_event(vq), new, old) || + unlikely(!signalled_used_valid)) && + vq->callfd >=3D 0) { > + vhost_vring_inject_irq(dev, vq); > } > } else { > /* Kick the guest if necessary. */ > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT) > && (vq->callfd >=3D 0)) { > - eventfd_write(vq->callfd, (eventfd_t)1); > - if (dev->flags & VIRTIO_DEV_STATS_ENABLED) > - __atomic_fetch_add(&vq->stats.guest_notif= ications, > - 1, __ATOMIC_RELAXED); > - if (dev->notify_ops->guest_notified) > - dev->notify_ops->guest_notified(dev->vid)= ; > + vhost_vring_inject_irq(dev, vq); > } > } > } > @@ -974,11 +995,8 @@ vhost_vring_call_packed(struct virtio_net *dev, stru= ct vhost_virtqueue *vq) > if (vhost_need_event(off, new, old)) > kick =3D true; > kick: > - if (kick) { > - eventfd_write(vq->callfd, (eventfd_t)1); > - if (dev->notify_ops->guest_notified) > - dev->notify_ops->guest_notified(dev->vid); > - } > + if (kick && vq->callfd >=3D 0) > + vhost_vring_inject_irq(dev, vq); > } > > static __rte_always_inline void > @@ -1017,4 +1035,11 @@ mbuf_is_consumed(struct rte_mbuf *m) > > uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr); > void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignmen= t); > + > +/* Versioned functions */ > +int rte_vhost_driver_callback_register_v23(const char *path, > + struct rte_vhost_device_ops const * const ops); > +int rte_vhost_driver_callback_register_v24(const char *path, > + struct rte_vhost_device_ops const * const ops); > + > #endif /* _VHOST_NET_CDEV_H_ */ > --=20 David Marchand