DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Li,Lin(ACG Cloud)" <lilin24@baidu.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
	Li Lin <chuckylinchuckylin@gmail.com>,
	"tiwei.bie@intel.com" <tiwei.bie@intel.com>,
	"zhihong.wang@intel.com" <zhihong.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"dariusz.stojaczyk@intel.com" <dariusz.stojaczyk@intel.com>,
	"changpeng.liu@intel.com" <changpeng.liu@intel.com>,
	"james.r.harris@intel.com" <james.r.harris@intel.com>,
	"Xie,Yongji" <xieyongji@baidu.com>, "Ni,Xun" <nixun@baidu.com>,
	"Zhang,Yu(ACG Cloud)" <zhangyu31@baidu.com>
Subject: [dpdk-dev] 答复: [PATCH v4] vhost: support inflight share memory protocol feature
Date: Wed, 22 May 2019 11:04:43 +0000	[thread overview]
Message-ID: <cb8f2b80285e4ce88a615f461bc69c82@baidu.com> (raw)
In-Reply-To: <8ba5c1ff-08ea-6cb2-c201-b972e79da4fc@redhat.com>



> -----邮件原件-----
> 发件人: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> 发送时间: 2019年5月17日 23:47
> 收件人: Li Lin <chuckylinchuckylin@gmail.com>; tiwei.bie@intel.com;
> zhihong.wang@intel.com
> 抄送: dev@dpdk.org; dariusz.stojaczyk@intel.com; changpeng.liu@intel.com;
> james.r.harris@intel.com; Xie,Yongji <xieyongji@baidu.com>; Li,Lin(ACG Cloud)
> <lilin24@baidu.com>; Ni,Xun <nixun@baidu.com>; Zhang,Yu(ACG Cloud)
> <zhangyu31@baidu.com>
> 主题: Re: [PATCH v4] vhost: support inflight share memory protocol feature
> 
> 
> 
> On 5/5/19 11:02 AM, Li Lin wrote:
> > From: Li Lin <lilin24@baidu.com>
> >
> > This patch introduces two new messages VHOST_USER_GET_INFLIGHT_FD
> and
> > VHOST_USER_SET_INFLIGHT_FD to support transferring a shared buffer
> > between qemu and backend.
> >
> > Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the shared buffer
> > from backend. Then qemu should send it back through
> > VHOST_USER_SET_INFLIGHT_FD each time we start vhost-user.
> >
> > This shared buffer is used to process inflight I/O when backend
> > reconnect.
> >
> > Signed-off-by: Li Lin <lilin24@baidu.com>
> > Signed-off-by: Ni Xun <nixun@baidu.com>
> > Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> > ---
> > v2 - add set&clr inflight desc functions
> > v3 - simplified some functions implementation
> > v4 - rework some data structures according to qemu doc
> >
> >   lib/librte_vhost/rte_vhost.h           |  83 ++++++++++
> >   lib/librte_vhost/rte_vhost_version.map |   3 +
> >   lib/librte_vhost/vhost.c               | 105 ++++++++++++
> >   lib/librte_vhost/vhost.h               |  12 ++
> >   lib/librte_vhost/vhost_user.c          | 292
> +++++++++++++++++++++++++++++++++
> >   lib/librte_vhost/vhost_user.h          |  13 +-
> >   6 files changed, 507 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_vhost/rte_vhost.h
> > b/lib/librte_vhost/rte_vhost.h index d2c0c93f4..d4d709717 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -71,6 +71,10 @@ extern "C" {
> >   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
> >   #endif
> >
> > +#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
> > +#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 #endif
> > +
> >   /** Indicate whether protocol features negotiation is supported. */
> >   #ifndef VHOST_USER_F_PROTOCOL_FEATURES
> >   #define VHOST_USER_F_PROTOCOL_FEATURES	30
> > @@ -98,12 +102,41 @@ struct rte_vhost_memory {
> >   	struct rte_vhost_mem_region regions[];
> >   };
> >
> > +struct inflight_desc {
> > +	uint8_t inflight;
> > +	uint8_t padding[5];
> > +	uint16_t next;
> > +	uint64_t counter;
> > +};
> > +
> > +struct inflight_info {
> > +	uint64_t features;
> > +	uint16_t version;
> > +	uint16_t desc_num;
> > +	uint16_t last_inflight_io;
> > +	uint16_t used_idx;
> > +	struct inflight_desc desc[0];
> > +};
> > +
> > +struct resubmit_desc {
> > +	uint16_t index;
> > +	uint64_t counter;
> > +};
> > +
> > +struct resubmit_info {
> > +	struct resubmit_desc *resubmit_list;
> > +	uint16_t resubmit_num;
> > +};
> > +
> >   struct rte_vhost_vring {
> >   	struct vring_desc	*desc;
> >   	struct vring_avail	*avail;
> >   	struct vring_used	*used;
> >   	uint64_t		log_guest_addr;
> >
> > +	struct inflight_info	*inflight;
> > +	struct resubmit_info	*resubmit;
> > +
> >   	/** Deprecated, use rte_vhost_vring_call() instead. */
> >   	int			callfd;
> >
> > @@ -632,6 +665,56 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t
> vring_idx,
> >   int rte_vhost_vring_call(int vid, uint16_t vring_idx);
> >
> >   /**
> > + * set inflight flag for a desc.
> > + *
> > + * @param vid
> > + *  vhost device ID
> > + * @param vring_idx
> > + *  vring index
> > + * @param idx
> > + *  inflight entry index
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_set_inflight_desc(int vid, uint16_t vring_idx,
> > +		uint16_t idx);
> > +
> > +/**
> > + * clear inflight flag for a desc.
> > + *
> > + * @param vid
> > + *  vhost device ID
> > + * @param vring_idx
> > + *  vring index
> > + * @param last_used_idx
> > + *  next free used_idx
> > + * @param idx
> > + *  inflight entry index
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_clr_inflight_desc(int vid, uint16_t vring_idx,
> > +		uint16_t last_used_idx, uint16_t idx);
> > +
> > +/**
> > + * set last inflight io index.
> > + *
> > + * @param vid
> > + *  vhost device ID
> > + * @param vring_idx
> > + *  vring index
> > + * @param idx
> > + *  inflight entry index
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vhost_set_last_inflight_io(int vid, uint16_t vring_idx,
> > +		uint16_t idx);
> > +
> > +/**
> >    * Get vhost RX queue avail count.
> >    *
> >    * @param vid
> > diff --git a/lib/librte_vhost/rte_vhost_version.map
> > b/lib/librte_vhost/rte_vhost_version.map
> > index 5f1d4a75c..a992b3935 100644
> > --- a/lib/librte_vhost/rte_vhost_version.map
> > +++ b/lib/librte_vhost/rte_vhost_version.map
> > @@ -87,4 +87,7 @@ EXPERIMENTAL {
> >   	rte_vdpa_relay_vring_used;
> >   	rte_vhost_extern_callback_register;
> >   	rte_vhost_driver_set_protocol_features;
> > +	rte_vhost_set_inflight_desc;
> > +	rte_vhost_clr_inflight_desc;
> > +	rte_vhost_set_last_inflight_io;
> >   };
> > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index
> > 163f4595e..03eab3551 100644
> > --- a/lib/librte_vhost/vhost.c
> > +++ b/lib/librte_vhost/vhost.c
> > @@ -76,6 +76,16 @@ cleanup_vq(struct vhost_virtqueue *vq, int destroy)
> >   		close(vq->callfd);
> >   	if (vq->kickfd >= 0)
> >   		close(vq->kickfd);
> > +	if (vq->inflight)
> > +		vq->inflight = NULL;
> > +	if (vq->resubmit->resubmit_list) {
> > +		free(vq->resubmit->resubmit_list);
> > +		vq->resubmit->resubmit_list = NULL;
> > +	}
> > +	if (vq->resubmit) {
> > +		free(vq->resubmit);
> > +		vq->resubmit = NULL;
> > +	}
> >   }
> >
> >   /*
> > @@ -589,6 +599,9 @@ rte_vhost_get_vhost_vring(int vid, uint16_t
> vring_idx,
> >   	vring->kickfd  = vq->kickfd;
> >   	vring->size    = vq->size;
> >
> > +	vring->inflight = vq->inflight;
> > +	vring->resubmit = vq->resubmit;
> > +
> >   	return 0;
> >   }
> >
> > @@ -617,6 +630,98 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx)
> >   	return 0;
> >   }
> >
> > +int
> > +rte_vhost_set_inflight_desc(int vid, uint16_t vring_idx, uint16_t
> > +idx) {
> > +	struct virtio_net *dev;
> > +	struct vhost_virtqueue *vq;
> > +
> > +	dev = get_device(vid);
> > +	if (!dev)
> > +		return -1;
> > +
> > +	if (!(dev->features &
> > +		(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
> > +		return 0;
> 
> Shouldn't -1 be returned here also?
> I think this function should be called only if the feature negotiation succeeded.
> 
> Returning an error here would help the calling application to know something
> went wrong.
> 
> Same comment applies for the clear function.

If the feature is not set,set & clr function returns directly.
They are similar to empty functions.
The return value is - 1, which represents an error in function execution
and unexpected behavior.

> > +
> > +	if (vring_idx >= VHOST_MAX_VRING)
> > +		return -1;
> > +
> > +	vq = dev->virtqueue[vring_idx];
> > +	if (!vq)
> > +		return -1;
> > +
> > +	if (unlikely(!vq->inflight))
> > +		return -1;
> > +
> > +	vq->inflight->desc[idx].counter = vq->counter++;
> > +	vq->inflight->desc[idx].inflight = 1;
> > +	return 0;
> > +}
> > +
> > +int
> > +rte_vhost_clr_inflight_desc(int vid, uint16_t vring_idx,
> > +	uint16_t last_used_idx, uint16_t idx) {
> > +	struct virtio_net *dev;
> > +	struct vhost_virtqueue *vq;
> > +
> > +	dev = get_device(vid);
> > +	if (!dev)
> > +		return -1;
> > +
> > +	if (!(dev->features &
> > +		(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
> > +		return 0;
> > +
> > +	if (vring_idx >= VHOST_MAX_VRING)
> > +		return -1;
> > +
> > +	vq = dev->virtqueue[vring_idx];
> > +	if (!vq)
> > +		return -1;
> > +
> > +	if (unlikely(!vq->inflight))
> > +		return -1;
> > +
> > +	rte_compiler_barrier();
> > +
> > +	vq->inflight->desc[idx].inflight = 0;
> > +
> > +	rte_compiler_barrier();
> > +
> > +	vq->inflight->used_idx = last_used_idx;
> > +	return 0;
> > +}
> > +
> > +int
> > +rte_vhost_set_last_inflight_io(int vid, uint16_t vring_idx, uint16_t
> > +idx) {
> > +	struct virtio_net *dev;
> > +	struct vhost_virtqueue *vq;
> > +
> > +	dev = get_device(vid);
> > +	if (!dev)
> > +		return -1;
> > +
> > +	if (!(dev->features &
> > +		(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD)))
> > +		return 0;
> > +
> > +	if (vring_idx >= VHOST_MAX_VRING)
> > +		return -1;
> > +
> > +	vq = dev->virtqueue[vring_idx];
> > +	if (!vq)
> > +		return -1;
> > +
> > +	if (unlikely(!vq->inflight))
> > +		return -1;
> > +
> > +	vq->inflight->last_inflight_io = idx;
> > +	return 0;
> > +}
> 
> Are above functions supposed to be called in the hot path?
> If so, you might want to use unlikely for the error checks everywhere.

You're right. I'll modify it in the next version.

> 
> > +
> >   uint16_t
> >   rte_vhost_avail_entries(int vid, uint16_t queue_id)
> >   {
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index
> > e9138dfab..3dace2ec3 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -128,6 +128,11 @@ struct vhost_virtqueue {
> >   	/* Physical address of used ring, for logging */
> >   	uint64_t		log_guest_addr;
> >
> > +	/* Inflight share memory info */
> > +	struct inflight_info	*inflight;
> > +	struct resubmit_info	*resubmit;
> > +	uint64_t counter;
> 
> counter and resubmit are too generic names.

I'll modify the names in the next version.

> 
> > +
> >   	uint16_t		nr_zmbuf;
> >   	uint16_t		zmbuf_size;
> >   	uint16_t		last_zmbuf_idx;
> > @@ -286,6 +291,12 @@ struct guest_page {
> >   	uint64_t size;
> >   };
> >
> > +struct inflight_mem_info {
> > +	int fd;
> > +	void *addr;
> > +	uint64_t size;
> > +};
> > +
> >   /**
> >    * Device structure contains all configuration information relating
> >    * to the device.
> > @@ -303,6 +314,7 @@ struct virtio_net {
> >   	uint32_t		nr_vring;
> >   	int			dequeue_zero_copy;
> >   	struct vhost_virtqueue	*virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
> > +	struct inflight_mem_info inflight_info;
> >   #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> >   	char			ifname[IF_NAME_SZ];
> >   	uint64_t		log_size;
> 
> Do you have some code example using these new APIs?
> It would help for reviewing the patch.

This patch is only valid for vhost blk. 
The use of new two messages and the implementation of packed queue are
described in docs/vhost-user.txt.

I have submitted SPDK-related patches to use new API.

> 
> Thanks,
> Maxime
> 


  parent reply	other threads:[~2019-05-22 11:04 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-25  7:56 [dpdk-dev] [PATCH] [resend] " Li Lin
2019-04-25  7:56 ` Li Lin
2019-04-25 10:56 ` lin li
2019-04-25 10:56   ` lin li
2019-04-25 11:10 ` [dpdk-dev] [PATCH v2] " Li Lin
2019-04-25 11:10   ` Li Lin
2019-04-26  9:40   ` [dpdk-dev] [PATCH v3] " Li Lin
2019-04-26  9:40     ` Li Lin
2019-04-28 11:17     ` Tiwei Bie
2019-04-28 11:17       ` Tiwei Bie
2019-04-29  4:07       ` lin li
2019-04-29  4:07         ` lin li
2019-04-29 10:54         ` Tiwei Bie
2019-04-29 10:54           ` Tiwei Bie
2019-04-30  8:37           ` lin li
2019-04-30  8:37             ` lin li
2019-05-05  9:02     ` [dpdk-dev] [PATCH v4] " Li Lin
2019-05-05  9:02       ` Li Lin
2019-05-17 15:47       ` Maxime Coquelin
2019-05-20  2:13         ` Tiwei Bie
2019-05-21  8:29           ` lin li
2019-05-21  8:46             ` Tiwei Bie
2019-05-22 10:15               ` [dpdk-dev] 答复: " Li,Lin(ACG Cloud)
2019-05-22 11:04         ` Li,Lin(ACG Cloud) [this message]
2019-06-12  8:07           ` Maxime Coquelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb8f2b80285e4ce88a615f461bc69c82@baidu.com \
    --to=lilin24@baidu.com \
    --cc=changpeng.liu@intel.com \
    --cc=chuckylinchuckylin@gmail.com \
    --cc=dariusz.stojaczyk@intel.com \
    --cc=dev@dpdk.org \
    --cc=james.r.harris@intel.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=nixun@baidu.com \
    --cc=tiwei.bie@intel.com \
    --cc=xieyongji@baidu.com \
    --cc=zhangyu31@baidu.com \
    --cc=zhihong.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).