From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id 0D8D847D2 for ; Fri, 23 Sep 2016 10:28:27 +0200 (CEST) Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4A09BC0567A0; Fri, 23 Sep 2016 08:28:26 +0000 (UTC) Received: from max-t460s.redhat.com (vpn1-7-3.ams2.redhat.com [10.36.7.3]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u8N8SNkS024207; Fri, 23 Sep 2016 04:28:24 -0400 From: Maxime Coquelin To: yuanhan.liu@linux.intel.com, huawei.xie@intel.com, dev@dpdk.org Cc: vkaplans@redhat.com, mst@redhat.com, stephen@networkplumber.org Date: Fri, 23 Sep 2016 10:28:23 +0200 Message-Id: <1474619303-16709-1-git-send-email-maxime.coquelin@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Fri, 23 Sep 2016 08:28:26 +0000 (UTC) Subject: [dpdk-dev] [PATCH v3] vhost: Add indirect descriptors support to the TX path X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 23 Sep 2016 08:28:27 -0000 Indirect descriptors are usually supported by virtio-net devices, allowing to dispatch a larger number of requests. When the virtio device sends a packet using indirect descriptors, only one slot is used in the ring, even for large packets. The main effect is to improve the 0% packet loss benchmark. A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd (fwd io for host, macswap for VM) on DUT shows a +50% gain for zero loss. On the downside, micro-benchmark using testpmd txonly in VM and rxonly on host shows a loss between 1 and 4%.i But depending on the needs, feature can be disabled at VM boot time by passing indirect_desc=off argument to vhost-user device in Qemu. Signed-off-by: Maxime Coquelin --- Changes since v2: ================= - Revert back to not checking feature flag to be aligned with kernel implementation - Ensure we don't have nested indirect descriptors - Ensure the indirect desc address is valid, to protect against malicious guests Changes since RFC: ================= - Enrich commit message with figures - Rebased on top of dpdk-next-virtio's master - Add feature check to ensure we don't receive an indirect desc if not supported by the virtio driver lib/librte_vhost/vhost.c | 3 ++- lib/librte_vhost/virtio_net.c | 41 +++++++++++++++++++++++++++++++---------- 2 files changed, 33 insertions(+), 11 deletions(-) diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 46095c3..30bb0ce 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -65,7 +65,8 @@ (1ULL << VIRTIO_NET_F_CSUM) | \ (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \ (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \ - (1ULL << VIRTIO_NET_F_GUEST_TSO6)) + (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \ + (1ULL << VIRTIO_RING_F_INDIRECT_DESC)) uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES; diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 8a151af..2e0a587 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac) } static inline int __attribute__((always_inline)) -copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, - struct rte_mbuf *m, uint16_t desc_idx, +copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs, + uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx, struct rte_mempool *mbuf_pool) { struct vring_desc *desc; @@ -693,8 +693,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, /* A counter to avoid desc dead loop chain */ uint32_t nr_desc = 1; - desc = &vq->desc[desc_idx]; - if (unlikely(desc->len < dev->vhost_hlen)) + desc = &descs[desc_idx]; + if (unlikely((desc->len < dev->vhost_hlen)) || + (desc->flags & VRING_DESC_F_INDIRECT)) return -1; desc_addr = gpa_to_vva(dev, desc->addr); @@ -711,7 +712,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, */ if (likely((desc->len == dev->vhost_hlen) && (desc->flags & VRING_DESC_F_NEXT) != 0)) { - desc = &vq->desc[desc->next]; + desc = &descs[desc->next]; + if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) + return -1; desc_addr = gpa_to_vva(dev, desc->addr); if (unlikely(!desc_addr)) @@ -747,10 +750,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, if ((desc->flags & VRING_DESC_F_NEXT) == 0) break; - if (unlikely(desc->next >= vq->size || - ++nr_desc > vq->size)) + if (unlikely(desc->next >= max_desc || + ++nr_desc > max_desc)) + return -1; + desc = &descs[desc->next]; + if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) return -1; - desc = &vq->desc[desc->next]; desc_addr = gpa_to_vva(dev, desc->addr); if (unlikely(!desc_addr)) @@ -878,19 +883,35 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, /* Prefetch descriptor index. */ rte_prefetch0(&vq->desc[desc_indexes[0]]); for (i = 0; i < count; i++) { + struct vring_desc *desc; + uint16_t sz, idx; int err; if (likely(i + 1 < count)) rte_prefetch0(&vq->desc[desc_indexes[i + 1]]); + if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) { + desc = (struct vring_desc *)gpa_to_vva(dev, + vq->desc[desc_indexes[i]].addr); + if (unlikely(!desc)) + break; + + rte_prefetch0(desc); + sz = vq->desc[desc_indexes[i]].len / sizeof(*desc); + idx = 0; + } else { + desc = vq->desc; + sz = vq->size; + idx = desc_indexes[i]; + } + pkts[i] = rte_pktmbuf_alloc(mbuf_pool); if (unlikely(pkts[i] == NULL)) { RTE_LOG(ERR, VHOST_DATA, "Failed to allocate memory for mbuf.\n"); break; } - err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i], - mbuf_pool); + err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool); if (unlikely(err)) { rte_pktmbuf_free(pkts[i]); break; -- 2.7.4