From: "Li,Lin(ACG Cloud)" <lilin24@baidu.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
Li Lin <chuckylinchuckylin@gmail.com>,
"tiwei.bie@intel.com" <tiwei.bie@intel.com>,
"zhihong.wang@intel.com" <zhihong.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"dariusz.stojaczyk@intel.com" <dariusz.stojaczyk@intel.com>,
"changpeng.liu@intel.com" <changpeng.liu@intel.com>,
"james.r.harris@intel.com" <james.r.harris@intel.com>,
"Xie,Yongji" <xieyongji@baidu.com>, "Ni,Xun" <nixun@baidu.com>,
"Zhang,Yu(ACG Cloud)" <zhangyu31@baidu.com>
Subject: [dpdk-dev] 答复: [PATCH v4] vhost: support inflight share memory protocol feature
Date: Wed, 22 May 2019 11:04:43 +0000 [thread overview]
Message-ID: <cb8f2b80285e4ce88a615f461bc69c82@baidu.com> (raw)
In-Reply-To: <8ba5c1ff-08ea-6cb2-c201-b972e79da4fc@redhat.com>
> -----邮件原件-----
> 发件人: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> 发送时间: 2019年5月17日 23:47
> 收件人: Li Lin <chuckylinchuckylin@gmail.com>; tiwei.bie@intel.com;
> zhihong.wang@intel.com
> 抄送: dev@dpdk.org; dariusz.stojaczyk@intel.com; changpeng.liu@intel.com;
> james.r.harris@intel.com; Xie,Yongji <xieyongji@baidu.com>; Li,Lin(ACG Cloud)
> <lilin24@baidu.com>; Ni,Xun <nixun@baidu.com>; Zhang,Yu(ACG Cloud)
> <zhangyu31@baidu.com>
> 主题: Re: [PATCH v4] vhost: support inflight share memory protocol feature
>
>
>
> On 5/5/19 11:02 AM, Li Lin wrote:
> > From: Li Lin <lilin24@baidu.com>
> >
> > This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
> and
> > VHOST_USER_SET_INFLIGHT_FD to support transferring a shared buffer
> > between qemu and backend.
> >
> > Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the shared buffer
> > from backend. Then qemu should send it back through
> > VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.
> >
> > This shared buffer is used to process inflight I/O when backend
> > reconnect.
> >
> > Signed-off-by: Li Lin <lilin24@baidu.com>
> > Signed-off-by: Ni Xun <nixun@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > ---
> > v2 - add set&clr inflight desc functions
> > v3 - simplified some functions implementation
> > v4 - rework some data structures according to qemu doc
> >
> > lib/librte_vhost/rte_vhost.h | 83 ++++++++++
> > lib/librte_vhost/rte_vhost_version.map | 3 +
> > lib/librte_vhost/vhost.c | 105 ++++++++++++
> > lib/librte_vhost/vhost.h | 12 ++
> > lib/librte_vhost/vhost_user.c | 292
> +++++++++++++++++++++++++++++++++
> > lib/librte_vhost/vhost_user.h | 13 +-
> > 6 files changed, 507 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_vhost/rte_vhost.h
> > b/lib/librte_vhost/rte_vhost.h index d2c0c93f4..d4d709717 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -71,6 +71,10 @@ extern "C" {
> > #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
> > #endif
> >
> > +#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
> > +#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 #endif
> > +
> > /** Indicate whether protocol features negotiation is supported. */
> > #ifndef VHOST_USER_F_PROTOCOL_FEATURES
> > #define VHOST_USER_F_PROTOCOL_FEATURES 30
> > @@ -98,12 +102,41 @@ struct rte_vhost_memory {
> > struct rte_vhost_mem_region regions[];
> > };
> >
> > +struct inflight_desc {
> > + uint8_t inflight;
> > + uint8_t padding[5];
> > + uint16_t next;
> > + uint64_t counter;
> > +};
> > +
> > +struct inflight_info {
> > + uint64_t features;
> > + uint16_t version;
> > + uint16_t desc_num;
> > + uint16_t last_inflight_io;
> > + uint16_t used_idx;
> > + struct inflight_desc desc[0];
> > +};
> > +
> > +struct resubmit_desc {
> > + uint16_t index;
> > + uint64_t counter;
> > +};
> > +
> > +struct resubmit_info {
> > + struct resubmit_desc *resubmit_list;
> > + uint16_t resubmit_num;
> > +};
> > +
> > struct rte_vhost_vring {
> > struct vring_desc *desc;
> > struct vring_avail *avail;
> > struct vring_used *used;
> > uint64_t log_guest_addr;
> >
> > + struct inflight_info *inflight;
> > + struct resubmit_info *resubmit;
> > +
> > /** Deprecated, use rte_vhost_vring_call() instead. */
> > int callfd;
> >
> > @@ -632,6 +665,56 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t
> vring_idx,
> > int rte_vhost_vring_call(int vid, uint16_t vring_idx);
> >
> > /**
> > + * set inflight flag for a desc.
> > + *
> > + * @param vid
> > + * vhost device ID
> > + * @param vring_idx
> > + * vring index
> > + * @param idx
> > + * inflight entry index
> > + * @return
> > + * 0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_set_inflight_desc(int vid, uint16_t vring_idx,
> > + uint16_t idx);
> > +
> > +/**
> > + * clear inflight flag for a desc.
> > + *
> > + * @param vid
> > + * vhost device ID
> > + * @param vring_idx
> > + * vring index
> > + * @param last_used_idx
> > + * next free used_idx
> > + * @param idx
> > + * inflight entry index
> > + * @return
> > + * 0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_clr_inflight_desc(int vid, uint16_t vring_idx,
> > + uint16_t last_used_idx, uint16_t idx);
> > +
> > +/**
> > + * set last inflight io index.
> > + *
> > + * @param vid
> > + * vhost device ID
> > + * @param vring_idx
> > + * vring index
> > + * @param idx
> > + * inflight entry index
> > + * @return
> > + * 0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_set_last_inflight_io(int vid, uint16_t vring_idx,
> > + uint16_t idx);
> > +
> > +/**
> > * Get vhost RX queue avail count.
> > *
> > * @param vid
> > diff --git a/lib/librte_vhost/rte_vhost_version.map
> > b/lib/librte_vhost/rte_vhost_version.map
> > index 5f1d4a75c..a992b3935 100644
> > --- a/lib/librte_vhost/rte_vhost_version.map
> > +++ b/lib/librte_vhost/rte_vhost_version.map
> > @@ -87,4 +87,7 @@ EXPERIMENTAL {
> > rte_vdpa_relay_vring_used;
> > rte_vhost_extern_callback_register;
> > rte_vhost_driver_set_protocol_features;
> > + rte_vhost_set_inflight_desc;
> > + rte_vhost_clr_inflight_desc;
> > + rte_vhost_set_last_inflight_io;
> > };
> > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index
> > 163f4595e..03eab3551 100644
> > --- a/lib/librte_vhost/vhost.c
> > +++ b/lib/librte_vhost/vhost.c
> > @@ -76,6 +76,16 @@ cleanup_vq(struct vhost_virtqueue *vq, int destroy)
> > close(vq->callfd);
> > if (vq->kickfd >= 0)
> > close(vq->kickfd);
> > + if (vq->inflight)
> > + vq->inflight = NULL;
> > + if (vq->resubmit->resubmit_list) {
> > + free(vq->resubmit->resubmit_list);
> > + vq->resubmit->resubmit_list = NULL;
> > + }
> > + if (vq->resubmit) {
> > + free(vq->resubmit);
> > + vq->resubmit = NULL;
> > + }
> > }
> >
> > /*
> > @@ -589,6 +599,9 @@ rte_vhost_get_vhost_vring(int vid, uint16_t
> vring_idx,
> > vring->kickfd = vq->kickfd;
> > vring->size = vq->size;
> >
> > + vring->inflight = vq->inflight;
> > + vring->resubmit = vq->resubmit;
> > +
> > return 0;
> > }
> >
> > @@ -617,6 +630,98 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx)
> > return 0;
> > }
> >
> > +int
> > +rte_vhost_set_inflight_desc(int vid, uint16_t vring_idx, uint16_t
> > +idx) {
> > + struct virtio_net *dev;
> > + struct vhost_virtqueue *vq;
> > +
> > + dev = get_device(vid);
> > + if (!dev)
> > + return -1;
> > +
> > + if (!(dev->features &
> > + (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
> > + return 0;
>
> Shouldn't -1 be returned here also?
> I think this function should be called only if the feature negotiation succeeded.
>
> Returning an error here would help the calling application to know something
> went wrong.
>
> Same comment applies for the clear function.
If the feature is not set,set & clr function returns directly.
They are similar to empty functions.
The return value is - 1, which represents an error in function execution
and unexpected behavior.
> > +
> > + if (vring_idx >= VHOST_MAX_VRING)
> > + return -1;
> > +
> > + vq = dev->virtqueue[vring_idx];
> > + if (!vq)
> > + return -1;
> > +
> > + if (unlikely(!vq->inflight))
> > + return -1;
> > +
> > + vq->inflight->desc[idx].counter = vq->counter++;
> > + vq->inflight->desc[idx].inflight = 1;
> > + return 0;
> > +}
> > +
> > +int
> > +rte_vhost_clr_inflight_desc(int vid, uint16_t vring_idx,
> > + uint16_t last_used_idx, uint16_t idx) {
> > + struct virtio_net *dev;
> > + struct vhost_virtqueue *vq;
> > +
> > + dev = get_device(vid);
> > + if (!dev)
> > + return -1;
> > +
> > + if (!(dev->features &
> > + (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
> > + return 0;
> > +
> > + if (vring_idx >= VHOST_MAX_VRING)
> > + return -1;
> > +
> > + vq = dev->virtqueue[vring_idx];
> > + if (!vq)
> > + return -1;
> > +
> > + if (unlikely(!vq->inflight))
> > + return -1;
> > +
> > + rte_compiler_barrier();
> > +
> > + vq->inflight->desc[idx].inflight = 0;
> > +
> > + rte_compiler_barrier();
> > +
> > + vq->inflight->used_idx = last_used_idx;
> > + return 0;
> > +}
> > +
> > +int
> > +rte_vhost_set_last_inflight_io(int vid, uint16_t vring_idx, uint16_t
> > +idx) {
> > + struct virtio_net *dev;
> > + struct vhost_virtqueue *vq;
> > +
> > + dev = get_device(vid);
> > + if (!dev)
> > + return -1;
> > +
> > + if (!(dev->features &
> > + (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
> > + return 0;
> > +
> > + if (vring_idx >= VHOST_MAX_VRING)
> > + return -1;
> > +
> > + vq = dev->virtqueue[vring_idx];
> > + if (!vq)
> > + return -1;
> > +
> > + if (unlikely(!vq->inflight))
> > + return -1;
> > +
> > + vq->inflight->last_inflight_io = idx;
> > + return 0;
> > +}
>
> Are above functions supposed to be called in the hot path?
> If so, you might want to use unlikely for the error checks everywhere.
You're right. I'll modify it in the next version.
>
> > +
> > uint16_t
> > rte_vhost_avail_entries(int vid, uint16_t queue_id)
> > {
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index
> > e9138dfab..3dace2ec3 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -128,6 +128,11 @@ struct vhost_virtqueue {
> > /* Physical address of used ring, for logging */
> > uint64_t log_guest_addr;
> >
> > + /* Inflight share memory info */
> > + struct inflight_info *inflight;
> > + struct resubmit_info *resubmit;
> > + uint64_t counter;
>
> counter and resubmit are too generic names.
I'll modify the names in the next version.
>
> > +
> > uint16_t nr_zmbuf;
> > uint16_t zmbuf_size;
> > uint16_t last_zmbuf_idx;
> > @@ -286,6 +291,12 @@ struct guest_page {
> > uint64_t size;
> > };
> >
> > +struct inflight_mem_info {
> > + int fd;
> > + void *addr;
> > + uint64_t size;
> > +};
> > +
> > /**
> > * Device structure contains all configuration information relating
> > * to the device.
> > @@ -303,6 +314,7 @@ struct virtio_net {
> > uint32_t nr_vring;
> > int dequeue_zero_copy;
> > struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> > + struct inflight_mem_info inflight_info;
> > #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> > char ifname[IF_NAME_SZ];
> > uint64_t log_size;
>
> Do you have some code example using these new APIs?
> It would help for reviewing the patch.
This patch is only valid for vhost blk.
The use of new two messages and the implementation of packed queue are
described in docs/vhost-user.txt.
I have submitted SPDK-related patches to use new API.
>
> Thanks,
> Maxime
>
next prev parent reply other threads:[~2019-05-22 11:04 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-25 7:56 [dpdk-dev] [PATCH] [resend] " Li Lin
2019-04-25 7:56 ` Li Lin
2019-04-25 10:56 ` lin li
2019-04-25 10:56 ` lin li
2019-04-25 11:10 ` [dpdk-dev] [PATCH v2] " Li Lin
2019-04-25 11:10 ` Li Lin
2019-04-26 9:40 ` [dpdk-dev] [PATCH v3] " Li Lin
2019-04-26 9:40 ` Li Lin
2019-04-28 11:17 ` Tiwei Bie
2019-04-28 11:17 ` Tiwei Bie
2019-04-29 4:07 ` lin li
2019-04-29 4:07 ` lin li
2019-04-29 10:54 ` Tiwei Bie
2019-04-29 10:54 ` Tiwei Bie
2019-04-30 8:37 ` lin li
2019-04-30 8:37 ` lin li
2019-05-05 9:02 ` [dpdk-dev] [PATCH v4] " Li Lin
2019-05-05 9:02 ` Li Lin
2019-05-17 15:47 ` Maxime Coquelin
2019-05-20 2:13 ` Tiwei Bie
2019-05-21 8:29 ` lin li
2019-05-21 8:46 ` Tiwei Bie
2019-05-22 10:15 ` [dpdk-dev] 答复: " Li,Lin(ACG Cloud)
2019-05-22 11:04 ` Li,Lin(ACG Cloud) [this message]
2019-06-12 8:07 ` Maxime Coquelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cb8f2b80285e4ce88a615f461bc69c82@baidu.com \
--to=lilin24@baidu.com \
--cc=changpeng.liu@intel.com \
--cc=chuckylinchuckylin@gmail.com \
--cc=dariusz.stojaczyk@intel.com \
--cc=dev@dpdk.org \
--cc=james.r.harris@intel.com \
--cc=maxime.coquelin@redhat.com \
--cc=nixun@baidu.com \
--cc=tiwei.bie@intel.com \
--cc=xieyongji@baidu.com \
--cc=zhangyu31@baidu.com \
--cc=zhihong.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).