From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 0B5C64284B;
	Mon, 27 Mar 2023 16:10:37 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 93A2C40EE1;
	Mon, 27 Mar 2023 16:10:36 +0200 (CEST)
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by mails.dpdk.org (Postfix) with ESMTP id DE42340ED8
 for <dev@dpdk.org>; Mon, 27 Mar 2023 16:10:34 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1679926234;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=Pk0zuN4XHgNPFXk0X6FeoDT+Nit9qL+MhEZTFUSfZmY=;
 b=fU/1udTJqOS04RTxxhVMS5DGeKFuB1kBLs5tByDn2ZOqWtNsr++HCTtGvrNmPzyNxWD6ul
 EitD4lk49PG0WrS4tjuyMguwRwiiCqnqQNyWiVgC/MPMsi/ZA7WbH1HzqkOzhQRdeSQKOG
 z+a0jGtuaXR8oU/ClSxIdR4zXmWUbKg=
Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com
 [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-57-wCn5o03PO--c5WMfX18DkQ-1; Mon, 27 Mar 2023 10:10:32 -0400
X-MC-Unique: wCn5o03PO--c5WMfX18DkQ-1
Received: by mail-ed1-f72.google.com with SMTP id
 m18-20020a50d7d2000000b00501dfd867a4so12936774edj.20
 for <dev@dpdk.org>; Mon, 27 Mar 2023 07:10:32 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1679926231;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=l/ckQ9bJQHGzLHn+gqdPqCttkoKMzlE38GXKUrSQzdA=;
 b=cA5C7xFwYn5IWI1YJ80OmRTIAP/uTHTlqo32XIWUJ7KK8FwecDbYaA4k0Jf/8m/eV0
 1Yotsqk4sMbc7KCyFohZLo+0DKn2KQ9ZH9ezBLp36MULoBMFyVlPSTmZjQKrfaFWpkme
 XQpVo3A5ymnBG5sXgiEC8haz+LnrD6rgaoTU2vh441XwwyibU1BERe7Br/YtzQyfyutl
 NLH12As/4d8ROKi2kLE9hcsTi8NBrCz71lxQYivMiJmZKzWx+UvK0Sc5i23d4RkB/9r+
 5yAhM2q5Eqg5NKWAcdx0bSR339b/Dj1jgb31cIOt9r1Sjxf4oyFxrIpH+BDHPN+oRA2B
 MGUA==
X-Gm-Message-State: AAQBX9f9PSpWGp7LOxOl3aGknRtsJadaJ5fAh8K8ks3jD5GZSfvWjiX2
 e2b0HBmtOCASzhasO/I3zr8Qq2ZEPRFogdnyfEDew7PFwmvziEwnE91nd9RKIbx6U+B8QHMU6XJ
 pLes=
X-Received: by 2002:a17:906:5850:b0:935:3028:ff58 with SMTP id
 h16-20020a170906585000b009353028ff58mr12080372ejs.55.1679926231749; 
 Mon, 27 Mar 2023 07:10:31 -0700 (PDT)
X-Google-Smtp-Source: AKy350atwA40UyeEUoWZSV1aacX6oQdlhj3jletiW4DXcZMQLymxxv5+eNXoxloTE8qays/4NFJ38A==
X-Received: by 2002:a17:906:5850:b0:935:3028:ff58 with SMTP id
 h16-20020a170906585000b009353028ff58mr12080334ejs.55.1679926231277; 
 Mon, 27 Mar 2023 07:10:31 -0700 (PDT)
Received: from [10.39.192.159] (5920ab7b.static.cust.trined.nl.
 [89.32.171.123]) by smtp.gmail.com with ESMTPSA id
 bb14-20020a1709070a0e00b0093f322187f0sm3399833ejc.189.2023.03.27.07.10.30
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 27 Mar 2023 07:10:30 -0700 (PDT)
From: Eelco Chaudron <echaudro@redhat.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: chenbo.xia@intel.com, dev@dpdk.org,
 David Marchand <david.marchand@redhat.com>
Subject: Re: [PATCH] vhost: add device op to offload the interrupt kick
Date: Mon, 27 Mar 2023 16:10:29 +0200
X-Mailer: MailMate (1.14r5950)
Message-ID: <223DB816-D925-4014-AF9F-2D8E6F0B5F79@redhat.com>
In-Reply-To: <09508737-d567-444f-9efa-87b06726e336@redhat.com>
References: <167992139724.45323.17979512439014217881.stgit@ebuild.local>
 <09508737-d567-444f-9efa-87b06726e336@redhat.com>
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org



On 27 Mar 2023, at 15:21, Maxime Coquelin wrote:

> Hi Eelco,
>
> On 3/27/23 14:51, Eelco Chaudron wrote:
>> This patch adds an operation callback which gets called every time the
>> library wants to call eventfd_write(). This eventfd_write() call could
>> result in a system call, which could potentially block the PMD thread.
>>
>> The callback function can decide whether it's ok to handle the
>> eventfd_write() now or have the newly introduced function,
>> rte_vhost_notify_guest(), called at a later time.
>>
>> This can be used by 3rd party applications, like OVS, to avoid system
>> calls being called as part of the PMD threads.
>
> That's a good idea, please find some comments inline:

Thanks for the review, see inline.
>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>>   lib/vhost/rte_vhost.h |   10 +++++++++-
>>   lib/vhost/vhost.c     |   21 +++++++++++++++++++++
>>   lib/vhost/vhost.h     |   43 ++++++++++++++++++++++++++++-------------=
--
>>   3 files changed, 58 insertions(+), 16 deletions(-)
>>
>> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
>> index 58a5d4be92..af7a394d0f 100644
>> --- a/lib/vhost/rte_vhost.h
>> +++ b/lib/vhost/rte_vhost.h
>> @@ -298,7 +298,13 @@ struct rte_vhost_device_ops {
>>   =09 */
>>   =09void (*guest_notified)(int vid);
>>  -=09void *reserved[1]; /**< Reserved for future extension */
>> +=09/**
>> +=09 * If this callback is registered, notification to the guest can
>> +=09 * be handled by the front-end calling rte_vhost_notify_guest().
>> +=09 * If it's not handled, 'false' should be returned. This can be used
>> +=09 * to remove the "slow" eventfd_write() syscall from the datapath.
>> +=09 */
>> +=09bool (*guest_notify)(int vid, uint16_t queue_id);
>>   };
>>    /**
>> @@ -433,6 +439,8 @@ void rte_vhost_log_used_vring(int vid, uint16_t vrin=
g_idx,
>>    int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, i=
nt enable);
>>  +void rte_vhost_notify_guest(int vid, uint16_t queue_id);
>
> The new API needs to be tagged as experimental, and also documented. (I
> see rte_vhost_enable_guest_notification is not properly documented, so
> not a good example!)

I used exactly that as an example, so thought it should be good ;)
Will add this in the next revision...

>> +
>>   /**
>>    * Register vhost driver. path could be different for multiple
>>    * instance support.
>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
>> index ef37943817..ee090d78ef 100644
>> --- a/lib/vhost/vhost.c
>> +++ b/lib/vhost/vhost.c
>> @@ -1467,6 +1467,27 @@ rte_vhost_enable_guest_notification(int vid, uint=
16_t queue_id, int enable)
>>   =09return ret;
>>   }
>>  +void
>> +rte_vhost_notify_guest(int vid, uint16_t queue_id)
>> +{
>> +=09struct virtio_net *dev =3D get_device(vid);
>> +=09struct vhost_virtqueue *vq;
>> +
>> +=09if (!dev ||  queue_id >=3D VHOST_MAX_VRING)
>> +=09=09return;
>> +
>> +=09vq =3D dev->virtqueue[queue_id];
>> +=09if (!vq)
>> +=09=09return;
>> +
>> +=09rte_spinlock_lock(&vq->access_lock);
>> +
>> +=09if (vq->callfd >=3D 0)
>> +=09=09eventfd_write(vq->callfd, (eventfd_t)1);
>
> Maybe we should return an error of callfd is invalid or eventfd_write()
> failed.

See below

>> +
>> +=09rte_spinlock_unlock(&vq->access_lock);
>> +}
>> +
>>   void
>>   rte_vhost_log_write(int vid, uint64_t addr, uint64_t len)
>>   {
>> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
>> index 8fdab13c70..39ad8260a1 100644
>> --- a/lib/vhost/vhost.h
>> +++ b/lib/vhost/vhost.h
>> @@ -883,6 +883,30 @@ vhost_need_event(uint16_t event_idx, uint16_t new_i=
dx, uint16_t old)
>>   =09return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - o=
ld);
>>   }
>>  +static __rte_always_inline void
>> +vhost_vring_kick_guest(struct virtio_net *dev, struct vhost_virtqueue *=
vq)
>> +{
>> +=09if (dev->notify_ops->guest_notify) {
>> +=09=09uint16_t qid;
>> +=09=09for (qid =3D 0;  qid < dev->nr_vring; qid++) {
>> +=09=09=09if (dev->virtqueue[qid] =3D=3D vq) {
>> +=09=09=09=09if (dev->notify_ops->guest_notify(dev->vid,
>> +=09=09=09=09=09=09=09=09  qid))
>> +=09=09=09=09=09goto done;
>> +=09=09=09=09break;
>> +=09=09=09}
>> +=09=09}
>
> Since v22.11, you no more need to iterate through the port's virtqueues,
> David introduced index field in vhost_virtuqueue in v22.11 (57e414e3ec29
> ("vhost: keep a reference to virtqueue index")).

Thanks will change this, I did most of this a while back and did not re-che=
ck.

>> +=09}
>> +=09eventfd_write(vq->callfd, (eventfd_t) 1);
>> +
>> +done:
>> +=09if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>> +=09=09vq->stats.guest_notifications++;
>> +=09if (dev->notify_ops->guest_notified)
>> +=09=09dev->notify_ops->guest_notified(dev->vid);
>> +}
>
> FYI, I have done almost the same refactoring in the VDUSE series I will
> soon send:
>
> https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/7089a8a6d1e89b9db9=
0547eb5b4fef886f24fd5b
>
> My version also introduces a new counter in case the notification
> failed. Maybe we could also have a counter in case the notification has
> been "offloaded" to the application?

What would be the best approach!? I can wait to send a new rev until your p=
atch is out.

>> +
>> +
>>   static __rte_always_inline void
>>   vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue =
*vq)
>>   {
>> @@ -905,21 +929,13 @@ vhost_vring_call_split(struct virtio_net *dev, str=
uct vhost_virtqueue *vq)
>>   =09=09if ((vhost_need_event(vhost_used_event(vq), new, old) &&
>>   =09=09=09=09=09(vq->callfd >=3D 0)) ||
>>   =09=09=09=09unlikely(!signalled_used_valid)) {
>> -=09=09=09eventfd_write(vq->callfd, (eventfd_t) 1);
>> -=09=09=09if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>> -=09=09=09=09vq->stats.guest_notifications++;
>> -=09=09=09if (dev->notify_ops->guest_notified)
>> -=09=09=09=09dev->notify_ops->guest_notified(dev->vid);
>> +=09=09=09vhost_vring_kick_guest(dev, vq);
>>   =09=09}
>>   =09} else {
>>   =09=09/* Kick the guest if necessary. */
>>   =09=09if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
>>   =09=09=09=09&& (vq->callfd >=3D 0)) {
>> -=09=09=09eventfd_write(vq->callfd, (eventfd_t)1);
>> -=09=09=09if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
>> -=09=09=09=09vq->stats.guest_notifications++;
>> -=09=09=09if (dev->notify_ops->guest_notified)
>> -=09=09=09=09dev->notify_ops->guest_notified(dev->vid);
>> +=09=09=09vhost_vring_kick_guest(dev, vq);
>>   =09=09}
>>   =09}
>>   }
>> @@ -971,11 +987,8 @@ vhost_vring_call_packed(struct virtio_net *dev, str=
uct vhost_virtqueue *vq)
>>   =09if (vhost_need_event(off, new, old))
>>   =09=09kick =3D true;
>>   kick:
>> -=09if (kick) {
>> -=09=09eventfd_write(vq->callfd, (eventfd_t)1);
>> -=09=09if (dev->notify_ops->guest_notified)
>> -=09=09=09dev->notify_ops->guest_notified(dev->vid);
>> -=09}
>> +=09if (kick)
>> +=09=09vhost_vring_kick_guest(dev, vq);
>>   }
>>    static __rte_always_inline void
>>