From: Maxime Coquelin
To: "Michael S. Tsirkin"
Cc: yuanhan.liu@linux.intel.com, huawei.xie@intel.com, dev@dpdk.org, vkaplans@redhat.com, stephen@networkplumber.org
Date: Fri, 23 Sep 2016 20:02:27 +0200
Subject: Re: [dpdk-dev] [PATCH v3] vhost: Add indirect descriptors support to the TX path

On 09/23/2016 05:49 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 23, 2016 at 10:28:23AM +0200, Maxime Coquelin wrote:
>> Indirect descriptors are usually supported by virtio-net devices,
>> allowing the dispatch of a larger number of requests.
>>
>> When the virtio device sends a packet using indirect descriptors,
>> only one slot is used in the ring, even for large packets.
>>
>> The main effect is to improve the 0% packet loss benchmark.
>> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
>> (fwd io for host, macswap for VM) on the DUT shows a +50% gain for
>> zero loss.
>>
>> On the downside, a micro-benchmark using testpmd txonly in the VM and
>> rxonly on the host shows a loss between 1 and 4%. But depending on
>> the needs, the feature can be disabled at VM boot time by passing the
>> indirect_desc=off argument to the vhost-user device in QEMU.
>
> Even better, change the guest pmd to only use indirect
> descriptors when this makes sense (e.g. sufficiently
> large packets).

With the micro-benchmark, the degradation is quite constant regardless
of the packet size.

For PVP, I could not test with packets larger than 64 bytes, as I don't
have a 40G interface, and line rate with 10G is reached quickly.

> I would be very interested to know when it does make
> sense.
>
> The feature is there, it's up to the guest whether to
> use it.

Do you mean the PMD should detect dynamically whether to use indirect
descriptors, or that there should be an option at device init time to
enable or disable the feature?
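For readers less familiar with the feature, here is a rough guest-side
sketch of how an indirect descriptor is laid out (illustrative only, not
taken from the patch; the struct layout and flag values follow the virtio
spec, and the helper name is made up). It shows why a multi-segment packet
still consumes a single ring slot: the slot points at a separate table of
descriptors instead of at packet data.

#include <stdint.h>

#define VRING_DESC_F_NEXT      1   /* buffer continues in the next entry */
#define VRING_DESC_F_INDIRECT  4   /* buffer contains a descriptor table */

struct vring_desc {
	uint64_t addr;   /* guest-physical address of the buffer */
	uint32_t len;    /* length of the buffer, in bytes */
	uint16_t flags;
	uint16_t next;   /* next entry in the chain, if F_NEXT is set */
};

/* Hypothetical helper: describe a packet split into nr_segs buffers
 * through a single ring slot by filling an indirect table. */
static void
fill_indirect_slot(struct vring_desc *ring_slot, struct vring_desc *table,
		   const uint64_t *seg_addr, const uint32_t *seg_len,
		   uint16_t nr_segs)
{
	uint16_t i;

	for (i = 0; i < nr_segs; i++) {
		table[i].addr  = seg_addr[i];
		table[i].len   = seg_len[i];
		table[i].flags = (i + 1 < nr_segs) ? VRING_DESC_F_NEXT : 0;
		table[i].next  = i + 1;
	}

	/* The ring slot points at the table, not at packet data, so the
	 * whole chain costs one ring entry. A real driver would store the
	 * table's guest-physical address here. */
	ring_slot->addr  = (uint64_t)(uintptr_t)table;
	ring_slot->len   = nr_segs * sizeof(struct vring_desc);
	ring_slot->flags = VRING_DESC_F_INDIRECT;
	ring_slot->next  = 0;
}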
>
>
>> Signed-off-by: Maxime Coquelin
>> ---
>> Changes since v2:
>> =================
>>  - Revert back to not checking feature flag to be aligned with
>>    kernel implementation
>>  - Ensure we don't have nested indirect descriptors
>>  - Ensure the indirect desc address is valid, to protect against
>>    malicious guests
>>
>> Changes since RFC:
>> =================
>>  - Enrich commit message with figures
>>  - Rebased on top of dpdk-next-virtio's master
>>  - Add feature check to ensure we don't receive an indirect desc
>>    if not supported by the virtio driver
>>
>>  lib/librte_vhost/vhost.c      |  3 ++-
>>  lib/librte_vhost/virtio_net.c | 41 +++++++++++++++++++++++++++++++----------
>>  2 files changed, 33 insertions(+), 11 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
>> index 46095c3..30bb0ce 100644
>> --- a/lib/librte_vhost/vhost.c
>> +++ b/lib/librte_vhost/vhost.c
>> @@ -65,7 +65,8 @@
>>  				(1ULL << VIRTIO_NET_F_CSUM) | \
>>  				(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
>>  				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>> -				(1ULL << VIRTIO_NET_F_GUEST_TSO6))
>> +				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>> +				(1ULL << VIRTIO_RING_F_INDIRECT_DESC))
>>
>>  uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8a151af..2e0a587 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
>>  }
>>
>>  static inline int __attribute__((always_inline))
>> -copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>> -		  struct rte_mbuf *m, uint16_t desc_idx,
>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>> +		  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>>  		  struct rte_mempool *mbuf_pool)
>>  {
>>  	struct vring_desc *desc;
>> @@ -693,8 +693,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  	/* A counter to avoid desc dead loop chain */
>>  	uint32_t nr_desc = 1;
>>
>> -	desc = &vq->desc[desc_idx];
>> -	if (unlikely(desc->len < dev->vhost_hlen))
>> +	desc = &descs[desc_idx];
>> +	if (unlikely((desc->len < dev->vhost_hlen)) ||
>> +			(desc->flags & VRING_DESC_F_INDIRECT))
>>  		return -1;
>>
>>  	desc_addr = gpa_to_vva(dev, desc->addr);
>> @@ -711,7 +712,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  	 */
>>  	if (likely((desc->len == dev->vhost_hlen) &&
>>  		   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
>> -		desc = &vq->desc[desc->next];
>> +		desc = &descs[desc->next];
>> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>> +			return -1;
>>
>>  		desc_addr = gpa_to_vva(dev, desc->addr);
>>  		if (unlikely(!desc_addr))
>
>
> Just to make sure, does this still allow a chain of
> direct descriptors ending with an indirect one?
> This is legal as per spec.
>
>> @@ -747,10 +750,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  		if ((desc->flags & VRING_DESC_F_NEXT) == 0)
>>  			break;
>>
>> -		if (unlikely(desc->next >= vq->size ||
>> -				++nr_desc > vq->size))
>> +		if (unlikely(desc->next >= max_desc ||
>> +				++nr_desc > max_desc))
>> +			return -1;
>> +		desc = &descs[desc->next];
>> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>>  			return -1;
>> -		desc = &vq->desc[desc->next];
>>
>>  		desc_addr = gpa_to_vva(dev, desc->addr);
>>  		if (unlikely(!desc_addr))
>> @@ -878,19 +883,35 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>  	/* Prefetch descriptor index. */
>>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>  	for (i = 0; i < count; i++) {
>> +		struct vring_desc *desc;
>> +		uint16_t sz, idx;
>>  		int err;
>>
>>  		if (likely(i + 1 < count))
>>  			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>
>> +		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
>> +			desc = (struct vring_desc *)gpa_to_vva(dev,
>> +					vq->desc[desc_indexes[i]].addr);
>> +			if (unlikely(!desc))
>> +				break;
>> +
>> +			rte_prefetch0(desc);
>> +			sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
>> +			idx = 0;
>> +		} else {
>> +			desc = vq->desc;
>> +			sz = vq->size;
>> +			idx = desc_indexes[i];
>> +		}
>> +
>>  		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>  		if (unlikely(pkts[i] == NULL)) {
>>  			RTE_LOG(ERR, VHOST_DATA,
>>  				"Failed to allocate memory for mbuf.\n");
>>  			break;
>>  		}
>> -		err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i],
>> -					mbuf_pool);
>> +		err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
>>  		if (unlikely(err)) {
>>  			rte_pktmbuf_free(pkts[i]);
>>  			break;
>> --
>> 2.7.4
>
> Something that I'm missing here: it's legal for the guest
> to add indirect descriptors for RX.
> I don't see the handling of RX here though.
> I think it's required for spec compliance.
>
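As an illustration of the resolution step being discussed, here is a
minimal sketch of how either data path could pick the descriptor table to
walk, following one level of indirection (a sketch only, not the eventual
fix for the RX point raised above; it assumes the librte_vhost internal
types and helpers quoted in the patch, such as struct vring_desc,
gpa_to_vva() and VRING_DESC_F_INDIRECT, and the helper name is made up):

/* Sketch: resolve the descriptor table for a given head index. This
 * mirrors the logic added to the dequeue path above; an enqueue (RX)
 * counterpart could reuse the same step. */
static inline struct vring_desc *
resolve_desc_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
		   uint16_t head_idx, uint16_t *size, uint16_t *start)
{
	struct vring_desc *head = &vq->desc[head_idx];

	if (head->flags & VRING_DESC_F_INDIRECT) {
		/* The slot holds the guest-physical address of a descriptor
		 * table; translate it and bound the walk by the table size. */
		struct vring_desc *table = (struct vring_desc *)(uintptr_t)
			gpa_to_vva(dev, head->addr);

		if (unlikely(!table))
			return NULL;	/* invalid address, drop the request */

		*size = head->len / sizeof(struct vring_desc);
		*start = 0;
		return table;
	}

	/* Direct case: walk the ring itself, starting at the head index. */
	*size = vq->size;
	*start = head_idx;
	return vq->desc;
}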