From: "Michael S. Tsirkin" <mst@redhat.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: yuanhan.liu@linux.intel.com, huawei.xie@intel.com, dev@dpdk.org,
vkaplans@redhat.com, stephen@networkplumber.org
Subject: Re: [dpdk-dev] [PATCH v3] vhost: Add indirect descriptors support to the TX path
Date: Fri, 23 Sep 2016 18:49:18 +0300
Message-ID: <20160923184310-mutt-send-email-mst@kernel.org>
In-Reply-To: <1474619303-16709-1-git-send-email-maxime.coquelin@redhat.com>
On Fri, Sep 23, 2016 at 10:28:23AM +0200, Maxime Coquelin wrote:
> Indirect descriptors are usually supported by virtio-net devices,
> allowing to dispatch a larger number of requests.
>
> When the virtio device sends a packet using indirect descriptors,
> only one slot is used in the ring, even for large packets.
>
> The main effect is to improve the 0% packet loss benchmark.
> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> zero loss.
>
> On the downside, micro-benchmark using testpmd txonly in VM and
> rxonly on host shows a loss between 1 and 4%. But depending on
> the needs, feature can be disabled at VM boot time by passing
> indirect_desc=off argument to vhost-user device in Qemu.
Even better, change the guest PMD to use indirect
descriptors only when this makes sense (e.g. for
sufficiently large packets).

I would be very interested to know when it does make
sense.

The feature is there; it's up to the guest whether to
use it.
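To make the guest-side suggestion concrete, here is a minimal sketch of
such a policy. Everything in it is an assumption for illustration:
use_indirect() and its min_descs threshold are hypothetical and do not
come from the virtio PMD or from this patch, and the right threshold is
exactly the open question above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical guest TX policy: decide whether a packet's descriptors
 * go into an indirect table (one ring slot) or directly into the ring
 * (one slot per descriptor).
 *
 * indirect_negotiated: VIRTIO_RING_F_INDIRECT_DESC was negotiated
 * nb_descs:            descriptors the packet needs, header included
 * free_slots:          free entries currently left in the ring
 * min_descs:           assumed tunable; chains at least this long
 *                      are sent through an indirect table
 */
static bool
use_indirect(bool indirect_negotiated, uint16_t nb_descs,
	     uint16_t free_slots, uint16_t min_descs)
{
	if (!indirect_negotiated || free_slots == 0)
		return false;

	/* Ring too full for a direct chain: a single slot still fits. */
	if (nb_descs > free_slots)
		return true;

	/* Otherwise pay the extra table lookup only for long chains. */
	return nb_descs >= min_descs;
}
```

With min_descs set to 3, for example, a two-descriptor packet stays
direct while the ring has room, and falls back to an indirect table
only when a single slot is left.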
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> Changes since v2:
> =================
> - Revert back to not checking feature flag to be aligned with
> kernel implementation
> - Ensure we don't have nested indirect descriptors
> - Ensure the indirect desc address is valid, to protect against
> malicious guests
>
> Changes since RFC:
> =================
> - Enrich commit message with figures
> - Rebased on top of dpdk-next-virtio's master
> - Add feature check to ensure we don't receive an indirect desc
> if not supported by the virtio driver
>
> lib/librte_vhost/vhost.c | 3 ++-
> lib/librte_vhost/virtio_net.c | 41 +++++++++++++++++++++++++++++++----------
> 2 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 46095c3..30bb0ce 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -65,7 +65,8 @@
> (1ULL << VIRTIO_NET_F_CSUM) | \
> (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
> (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> - (1ULL << VIRTIO_NET_F_GUEST_TSO6))
> + (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> + (1ULL << VIRTIO_RING_F_INDIRECT_DESC))
>
> uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8a151af..2e0a587 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
> }
>
> static inline int __attribute__((always_inline))
> -copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> - struct rte_mbuf *m, uint16_t desc_idx,
> +copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> + uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> struct rte_mempool *mbuf_pool)
> {
> struct vring_desc *desc;
> @@ -693,8 +693,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> /* A counter to avoid desc dead loop chain */
> uint32_t nr_desc = 1;
>
> - desc = &vq->desc[desc_idx];
> - if (unlikely(desc->len < dev->vhost_hlen))
> + desc = &descs[desc_idx];
> + if (unlikely((desc->len < dev->vhost_hlen)) ||
> + (desc->flags & VRING_DESC_F_INDIRECT))
> return -1;
>
> desc_addr = gpa_to_vva(dev, desc->addr);
> @@ -711,7 +712,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> */
> if (likely((desc->len == dev->vhost_hlen) &&
> (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> - desc = &vq->desc[desc->next];
> + desc = &descs[desc->next];
> + if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> + return -1;
>
> desc_addr = gpa_to_vva(dev, desc->addr);
> if (unlikely(!desc_addr))
Just to make sure, does this still allow a chain of
direct descriptors ending with an indirect one?
This is legal as per spec.
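For reference, a rough sketch of a walker that accepts that layout:
zero or more direct descriptors followed by a single indirect one. It
is not the code from this patch. The flat guest_mem model and the
gpa_to_ptr() and chain_len() helpers are hypothetical stand-ins so the
example is self-contained, and it only sums buffer lengths instead of
copying into an mbuf.

```c
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_INDIRECT 4

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Hypothetical guest memory model: one flat buffer, GPA == offset. */
struct guest_mem {
	uint8_t *base;
	uint64_t size;
};

/* Translate [gpa, gpa + len) or return NULL if it is out of range. */
static void *
gpa_to_ptr(const struct guest_mem *mem, uint64_t gpa, uint64_t len)
{
	if (gpa > mem->size || len > mem->size - gpa)
		return NULL;
	return mem->base + gpa;
}

/*
 * Sum the buffer lengths of the chain starting at ring[head].
 * Returns the total, or -1 if the chain is malformed.
 */
static int64_t
chain_len(const struct guest_mem *mem, const struct vring_desc *ring,
	  uint16_t ring_size, uint16_t head)
{
	const struct vring_desc *descs = ring;
	uint32_t max = ring_size, seen = 0;
	uint16_t idx = head;
	int in_indirect = 0;
	int64_t total = 0;

	for (;;) {
		const struct vring_desc *d;

		/* Bounds check plus a counter against looping chains. */
		if (idx >= max || ++seen > max)
			return -1;
		d = &descs[idx];

		if (d->flags & VRING_DESC_F_INDIRECT) {
			/* No nesting; INDIRECT must end the direct chain. */
			if (in_indirect || (d->flags & VRING_DESC_F_NEXT))
				return -1;
			if (d->len == 0 || d->len % sizeof(*d) != 0)
				return -1;
			descs = gpa_to_ptr(mem, d->addr, d->len);
			if (descs == NULL)
				return -1;
			/* Continue the walk inside the indirect table. */
			max = d->len / sizeof(*d);
			idx = 0;
			seen = 0;
			in_indirect = 1;
			continue;
		}

		total += d->len;
		if (!(d->flags & VRING_DESC_F_NEXT))
			return total;
		idx = d->next;
	}
}
```

The point of the sketch is that the indirect flag is checked on every
descriptor of the direct chain and, when found on the last one,
switches the walk to the indirect table rather than failing.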
> @@ -747,10 +750,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> if ((desc->flags & VRING_DESC_F_NEXT) == 0)
> break;
>
> - if (unlikely(desc->next >= vq->size ||
> - ++nr_desc > vq->size))
> + if (unlikely(desc->next >= max_desc ||
> + ++nr_desc > max_desc))
> + return -1;
> + desc = &descs[desc->next];
> + if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> return -1;
> - desc = &vq->desc[desc->next];
>
> desc_addr = gpa_to_vva(dev, desc->addr);
> if (unlikely(!desc_addr))
> @@ -878,19 +883,35 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> /* Prefetch descriptor index. */
> rte_prefetch0(&vq->desc[desc_indexes[0]]);
> for (i = 0; i < count; i++) {
> + struct vring_desc *desc;
> + uint16_t sz, idx;
> int err;
>
> if (likely(i + 1 < count))
> rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>
> + if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
> + desc = (struct vring_desc *)gpa_to_vva(dev,
> + vq->desc[desc_indexes[i]].addr);
> + if (unlikely(!desc))
> + break;
> +
> + rte_prefetch0(desc);
> + sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
> + idx = 0;
> + } else {
> + desc = vq->desc;
> + sz = vq->size;
> + idx = desc_indexes[i];
> + }
> +
> pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> if (unlikely(pkts[i] == NULL)) {
> RTE_LOG(ERR, VHOST_DATA,
> "Failed to allocate memory for mbuf.\n");
> break;
> }
> - err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i],
> - mbuf_pool);
> + err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
> if (unlikely(err)) {
> rte_pktmbuf_free(pkts[i]);
> break;
> --
> 2.7.4
Something that I'm missing here: it's legal for the guest
to add indirect descriptors for RX as well.
I don't see any handling of RX here, though.
I think it's required for spec compliance.
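One possible shape for this, as a sketch only: factor the
direct-or-indirect lookup out of the dequeue loop into a helper that
the enqueue (RX) path could call as well. The names below
(desc_table, translate_fn, flat_translate, resolve_desc_table) are
hypothetical and not part of librte_vhost; the translation callback
merely stands in for gpa_to_vva().

```c
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_INDIRECT 4

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Where a request's descriptors live once indirection is resolved. */
struct desc_table {
	struct vring_desc *descs;
	uint32_t size;	/* number of entries in descs */
	uint16_t head;	/* index of the first descriptor to process */
};

/* Hypothetical address translation hook (stand-in for gpa_to_vva). */
typedef void *(*translate_fn)(void *ctx, uint64_t gpa, uint32_t len);

/* Example translator over a flat buffer where GPA == offset. */
struct flat_mem {
	uint8_t *base;
	uint64_t size;
};

static void *
flat_translate(void *ctx, uint64_t gpa, uint32_t len)
{
	const struct flat_mem *m = ctx;

	if (gpa > m->size || len > m->size - gpa)
		return NULL;
	return m->base + gpa;
}

/*
 * Resolve ring[head] into the table to walk: the ring itself for a
 * direct descriptor, or the referenced table for an indirect one.
 * Returns 0 on success, -1 on a bad index, length or address.
 */
static int
resolve_desc_table(struct vring_desc *ring, uint16_t ring_size,
		   uint16_t head, translate_fn translate, void *ctx,
		   struct desc_table *out)
{
	struct vring_desc *d;

	if (head >= ring_size)
		return -1;
	d = &ring[head];

	if (!(d->flags & VRING_DESC_F_INDIRECT)) {
		out->descs = ring;
		out->size = ring_size;
		out->head = head;
		return 0;
	}

	/* The table must hold a whole number of descriptors. */
	if (d->len == 0 || d->len % sizeof(*d) != 0)
		return -1;
	out->descs = translate(ctx, d->addr, d->len);
	if (out->descs == NULL)
		return -1;
	out->size = d->len / sizeof(*d);
	out->head = 0;
	return 0;
}
```

Both the TX and RX copy routines could then take a
(descs, size, head) triple, much as copy_desc_to_mbuf() does after
this patch.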
--
MST