From: "Stojaczyk, DariuszX" <dariuszx.stojaczyk@intel.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
	"dev@dpdk.org" <dev@dpdk.org>, "Bie, Tiwei" <tiwei.bie@intel.com>,
	Tetsuya Mukawa <mtetsuyah@gmail.com>,
	Thomas Monjalon <thomas@monjalon.net>
Cc: "yliu@fridaylinux.org" <yliu@fridaylinux.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"Harris, James R" <james.r.harris@intel.com>
Subject: Re: [dpdk-dev] [RFC v2] vhost: new rte_vhost API proposal
Date: Tue, 22 May 2018 10:19:53 +0000
Message-ID: <FBE7E039FA50BF47A673AD0BD3CD56A8449A61D1@HASMSX105.ger.corp.intel.com>
In-Reply-To: <27ce772e-9f01-dff9-1f82-b99924efa950@redhat.com>

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, May 18, 2018 9:51 PM
> On 05/18/2018 03:01 PM, Dariusz Stojaczyk wrote:
> > rte_vhost is not Vhost-user spec compliant. Some Vhost drivers have
> > already been confirmed not to work with rte_vhost. virtio-user-scsi-pci
> > in QEMU 2.12 doesn't fully initialize its management queues at the
> > SeaBIOS stage. This is perfectly fine from the Vhost-user spec
> > perspective, but doesn't meet rte_vhost expectations. rte_vhost waits
> > for all queues to be fully initialized before it allows the entire
> > device to be processed. Fixing rte_vhost directly would require quite
> > a large amount of changes, which would completely break backwards
> > compatibility.
> >
> > This rte_vhost2 library is intended to smooth out the transition.
> > It exposes a low-level API for implementing new Vhost-user slaves.
> > The existing rte_vhost is about to be refactored to use the rte_vhost2
> > library underneath, while more demanding backends can use rte_vhost2
> > directly.
> 
> I like the idea, and the proposed way to smooth the transition.
> 
> I will certainly have other comments later, but please find below
> the ones I have for the moment.
> 
> > <snip>
> > +
> > +/**
> > + * Registers a new vhost target accepting remote connections. Multiple
> > + * transports are available. It is possible to create a Vhost-user
> > + * Unix domain socket polling local connections or to connect to a
> > + * physical Virtio device and install an interrupt handler.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * \param trtype type of the transport used, e.g. "vhost-user",
> > + * "PCI-vhost-user", "PCI-vDPA".
> > + * \param trid identifier of the device. For PCI this would be the BDF
> > + * address, for vhost-user the socket name.
> > + * \param trflags additional options for the specified transport
> > + * \param trctx additional data for the specified transport. Can be NULL.
> > + * \param tgt_ops callbacks to be called upon reaching specific
> > + * initialization states.
> > + * \param features supported Virtio features. To be negotiated with the
> > + * driver ones. rte_vhost2 will append a couple of generic feature bits
> > + * which are required by the Virtio spec. TODO list these features here
> > + * \return 0 on success, negative errno otherwise
> > + */
> > +int rte_vhost2_tgt_register(const char *trtype, const char *trid,
> > +			    uint64_t trflags, void *trctx,
> > +			    struct rte_vhost2_tgt_ops *tgt_ops,
> > +			    uint64_t features);
> 
> Couldn't the register API also pass the vdev?
> Doing this, the backend could have rte_vhost2_dev in its device
> struct.

Please note that the register API registers targets, not devices. A single Vhost-user server target can spawn multiple devices - one for each connection. I know the nomenclature differs from rte_vhost, but since each connection uses its own (virt)queues, it makes sense to name things this way.

Initially I thought about adding an rte_vhost2_tgt struct declaration that the register function would return, but later came to the conclusion that it would only complicate things for the library user. A parent struct keeping an rte_vhost2_tgt* would need to contain `const char *trtype` and `const char *trid` anyway, so it's just easier to use these two strings for target identification.
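
To illustrate, the backend-side bookkeeping could look roughly like this. This is just a sketch; struct my_vhost_target and my_target_start() are made-up names, and the trflags/trctx values are arbitrary:

#include <stdint.h>
#include <rte_vhost2.h> /* proposed header from this RFC */

/* Hypothetical backend-side parent struct. Since rte_vhost2 never
 * hands out a target handle, the (trtype, trid) string pair is the
 * only key needed to identify the target later on. */
struct my_vhost_target {
	const char *trtype;	/* e.g. "vhost-user" */
	const char *trid;	/* e.g. the Unix socket path */
	struct rte_vhost2_tgt_ops ops;
};

static int
my_target_start(struct my_vhost_target *tgt, uint64_t features)
{
	/* One target; each incoming connection spawns its own
	 * rte_vhost2_dev with its own set of (virt)queues. */
	return rte_vhost2_tgt_register(tgt->trtype, tgt->trid,
				       0 /* trflags */, tgt /* trctx */,
				       &tgt->ops, features);
}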

> > <snip>
> > +/**
> > + * Bypass VIRTIO_F_IOMMU_PLATFORM and translate gpa directly.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * \param mem vhost device memory
> > + * \param gpa guest physical address
> > + * \param len length of the memory to translate (in bytes). If the
> > + * requested memory chunk crosses a memory region boundary, *len will
> > + * be set to the remaining, maximum length of virtually contiguous
> > + * memory. In such a case the user will be required to call another
> > + * gpa_to_vva(gpa + *len).
> > + * \return vhost virtual address or NULL if the requested `gpa` is not
> > + * mapped.
> > + */
> > +static inline void *
> > +rte_vhost2_gpa_to_vva(struct rte_vhost2_memory *mem, uint64_t gpa,
> > +		      uint64_t *len)
> > +{
> > +	struct rte_vhost2_mem_region *r;
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < mem->nregions; i++) {
> > +		r = &mem->regions[i];
> > +		if (gpa >= r->guest_phys_addr &&
> > +		    gpa <  r->guest_phys_addr + r->size) {
> > +
> > +			if (unlikely(*len > r->guest_phys_addr + r->size - gpa)) {
> > +				*len = r->guest_phys_addr + r->size - gpa;
> > +			}
> > +
> > +			return (void *)(uintptr_t)(gpa - r->guest_phys_addr +
> > +						   r->host_user_addr);
> > +		}
> > +	}
> > +	*len = 0;
> > +
> > +	return NULL;
> > +}
> 
> Maybe we could take the opportunity to only have
> rte_vhost2_iova_to_vva.

Good idea; will remove it in v3.
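
For reference (and this applies equally to an iova_to_vva variant), the caller-side pattern for the boundary-crossing case described in the doc comment would look roughly like this. Just a sketch; process_buf() stands in for whatever consumes the translated chunk:

void process_buf(void *buf, uint64_t len); /* hypothetical consumer */

static int
translate_and_process(struct rte_vhost2_memory *mem, uint64_t gpa,
		      uint64_t remaining)
{
	while (remaining > 0) {
		uint64_t len = remaining;
		void *vva = rte_vhost2_gpa_to_vva(mem, gpa, &len);

		if (vva == NULL)
			return -1; /* gpa not mapped */

		/* len may have been shortened to the end of the current
		 * memory region; consume what we got and translate the
		 * rest in the next iteration. */
		process_buf(vva, len);
		gpa += len;
		remaining -= len;
	}

	return 0;
}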

Thanks,
D.

