From: Maxime Coquelin
To: "Michael S. Tsirkin"
Cc: yuanhan.liu@linux.intel.com, huawei.xie@intel.com, dev@dpdk.org,
 vkaplans@redhat.com, stephen@networkplumber.org
Subject: Re: [dpdk-dev] [PATCH v3] vhost: Add indirect descriptors support to the TX path
Date: Fri, 23 Sep 2016 20:16:09 +0200
Message-ID: <425573ad-216f-54e7-f4ee-998a4f87e189@redhat.com>
In-Reply-To: <20160923210259-mutt-send-email-mst@kernel.org>
References: <1474619303-16709-1-git-send-email-maxime.coquelin@redhat.com>
 <20160923184310-mutt-send-email-mst@kernel.org>
 <20160923210259-mutt-send-email-mst@kernel.org>

On 09/23/2016 08:06 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 23, 2016 at 08:02:27PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 09/23/2016 05:49 PM, Michael S. Tsirkin wrote:
>>> On Fri, Sep 23, 2016 at 10:28:23AM +0200, Maxime Coquelin wrote:
>>>> Indirect descriptors are usually supported by virtio-net devices,
>>>> allowing to dispatch a larger number of requests.
>>>>
>>>> When the virtio device sends a packet using indirect descriptors,
>>>> only one slot is used in the ring, even for large packets.
>>>>
>>>> The main effect is to improve the 0% packet loss benchmark.
>>>> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
>>>> (fwd io for host, macswap for VM) on DUT shows a +50% gain for
>>>> zero loss.
>>>>
>>>> On the downside, micro-benchmark using testpmd txonly in VM and
>>>> rxonly on host shows a loss between 1 and 4%. But depending on
>>>> the needs, feature can be disabled at VM boot time by passing
>>>> indirect_desc=off argument to vhost-user device in Qemu.
>>>
>>> Even better, change guest pmd to only use indirect
>>> descriptors when this makes sense (e.g. sufficiently
>>> large packets).
>> With the micro-benchmark, the degradation is quite constant whatever
>> the packet size.
>>
>> For PVP, I could not test with larger packets than 64 bytes, as I don't
>> have a 40G interface,
>
> Don't 64 byte packets fit in a single slot anyway?
No, indirect is used. I didn't check in detail, but I think this is
because there is no headroom reserved in the mbuf.

This is the condition to meet to fit in a single slot:
	/* optimize ring usage */
	if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
	    rte_mbuf_refcnt_read(txm) == 1 &&
	    RTE_MBUF_DIRECT(txm) &&
	    txm->nb_segs == 1 &&
	    rte_pktmbuf_headroom(txm) >= hdr_size &&
	    rte_is_aligned(rte_pktmbuf_mtod(txm, char *),
			   __alignof__(struct virtio_net_hdr_mrg_rxbuf)))
		can_push = 1;
	else if (vtpci_with_feature(hw, VIRTIO_RING_F_INDIRECT_DESC) &&
		 txm->nb_segs < VIRTIO_MAX_TX_INDIRECT)
		use_indirect = 1;

I will check in more detail next week.
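To make the discussion easier to follow, the check above can be restated
as a standalone predicate over an mbuf. The sketch below is illustrative
only and is not part of the virtio PMD or of this patch: since
vtpci_with_feature() and the hw handle are internal to the driver, the
negotiated feature bits, the virtio-net header size/alignment and the
PMD's VIRTIO_MAX_TX_INDIRECT limit are taken as plain parameters, and
the helper name is made up.

	#include <stdbool.h>
	#include <stdint.h>
	#include <rte_common.h>
	#include <rte_mbuf.h>

	enum tx_path { TX_PUSH, TX_INDIRECT, TX_CHAINED };

	/* Illustrative helper mirroring the quoted decision: which TX layout
	 * would be picked for a given mbuf, assuming 'any_layout' and
	 * 'indirect' reflect the negotiated VIRTIO_F_ANY_LAYOUT and
	 * VIRTIO_RING_F_INDIRECT_DESC bits.
	 */
	static enum tx_path
	classify_tx_mbuf(struct rte_mbuf *txm, bool any_layout, bool indirect,
			 size_t hdr_size, size_t hdr_align,
			 uint16_t max_indirect_segs)
	{
		if (any_layout &&
		    rte_mbuf_refcnt_read(txm) == 1 &&
		    RTE_MBUF_DIRECT(txm) &&
		    txm->nb_segs == 1 &&
		    rte_pktmbuf_headroom(txm) >= hdr_size &&
		    rte_is_aligned(rte_pktmbuf_mtod(txm, char *), hdr_align))
			return TX_PUSH;     /* net header prepended in headroom: one slot */

		if (indirect && txm->nb_segs < max_indirect_segs)
			return TX_INDIRECT; /* one slot pointing to an indirect table */

		return TX_CHAINED;          /* header + segments as a direct chain */
	}

If the txonly mbufs really have no headroom in front of the packet data,
the first branch can never be taken, which would explain why even
64-byte packets end up on the indirect path.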
> Why would there be an effect with that?
>
>> and line rate with 10G is reached rapidly.
> Right but focus on packet loss. you can have that at any rate.
>>
>>> I would be very interested to know when does it make
>>> sense.
>>>
>>> The feature is there, it's up to guest whether to
>>> use it.
>> Do you mean the PMD should detect dynamically whether using indirect,
>> or having an option at device init time to enable or not the feature?
> guest PMD should not use indirect where it slows things down.
>>>
>>>
>>>> Signed-off-by: Maxime Coquelin
>>>> ---
>>>> Changes since v2:
>>>> =================
>>>>  - Revert back to not checking feature flag to be aligned with
>>>>    kernel implementation
>>>>  - Ensure we don't have nested indirect descriptors
>>>>  - Ensure the indirect desc address is valid, to protect against
>>>>    malicious guests
>>>>
>>>> Changes since RFC:
>>>> =================
>>>>  - Enrich commit message with figures
>>>>  - Rebased on top of dpdk-next-virtio's master
>>>>  - Add feature check to ensure we don't receive an indirect desc
>>>>    if not supported by the virtio driver
>>>>
>>>>  lib/librte_vhost/vhost.c      |  3 ++-
>>>>  lib/librte_vhost/virtio_net.c | 41 +++++++++++++++++++++++++++++++----------
>>>>  2 files changed, 33 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
>>>> index 46095c3..30bb0ce 100644
>>>> --- a/lib/librte_vhost/vhost.c
>>>> +++ b/lib/librte_vhost/vhost.c
>>>> @@ -65,7 +65,8 @@
>>>>  				(1ULL << VIRTIO_NET_F_CSUM) | \
>>>>  				(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
>>>>  				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>>>> -				(1ULL << VIRTIO_NET_F_GUEST_TSO6))
>>>> +				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>>>> +				(1ULL << VIRTIO_RING_F_INDIRECT_DESC))
>>>>
>>>>  uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>>>>
>>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>>>> index 8a151af..2e0a587 100644
>>>> --- a/lib/librte_vhost/virtio_net.c
>>>> +++ b/lib/librte_vhost/virtio_net.c
>>>> @@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
>>>>  }
>>>>
>>>>  static inline int __attribute__((always_inline))
>>>> -copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>> -		  struct rte_mbuf *m, uint16_t desc_idx,
>>>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>>>> +		  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>>>>  		  struct rte_mempool *mbuf_pool)
>>>>  {
>>>>  	struct vring_desc *desc;
>>>> @@ -693,8 +693,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>>  	/* A counter to avoid desc dead loop chain */
>>>>  	uint32_t nr_desc = 1;
>>>>
>>>> -	desc = &vq->desc[desc_idx];
>>>> -	if (unlikely(desc->len < dev->vhost_hlen))
>>>> +	desc = &descs[desc_idx];
>>>> +	if (unlikely((desc->len < dev->vhost_hlen)) ||
>>>> +			(desc->flags & VRING_DESC_F_INDIRECT))
>>>>  		return -1;
>>>>
>>>>  	desc_addr = gpa_to_vva(dev, desc->addr);
>>>> @@ -711,7 +712,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>>  	 */
>>>>  	if (likely((desc->len == dev->vhost_hlen) &&
>>>>  		   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
>>>> -		desc = &vq->desc[desc->next];
>>>> +		desc = &descs[desc->next];
>>>> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>>>> +			return -1;
>>>>
>>>>  		desc_addr = gpa_to_vva(dev, desc->addr);
>>>>  		if (unlikely(!desc_addr))
>>>
>>>
>>> Just to make sure, does this still allow a chain of
>>> direct descriptors ending with an indirect one?
>>> This is legal as per spec.
>>>
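To picture the case being asked about: below is a hypothetical two-entry
chain whose first descriptor is direct and whose last descriptor carries
VRING_DESC_F_INDIRECT, pointing at a separate table. Addresses and
lengths are invented for illustration; only the vring structure and flag
names come from the standard headers used by the code above.

	#include <linux/virtio_ring.h>

	/* Indirect table living in guest memory (invented guest-physical
	 * addresses), holding the actual data buffers. */
	static struct vring_desc indir_table[2] = {
		{ .addr = 0x20000, .len = 1024, .flags = VRING_DESC_F_NEXT, .next = 1 },
		{ .addr = 0x21000, .len = 512,  .flags = 0,                 .next = 0 },
	};

	/* Ring entries: a direct descriptor (e.g. the virtio-net header)
	 * chained to a final descriptor that is indirect, i.e. whose buffer
	 * is the table above and whose len is the table size in bytes. */
	static struct vring_desc ring[2] = {
		{ .addr = 0x10000, .len = 12,
		  .flags = VRING_DESC_F_NEXT, .next = 1 },
		{ .addr = 0x30000 /* gpa of indir_table */,
		  .len = sizeof(indir_table),
		  .flags = VRING_DESC_F_INDIRECT, .next = 0 },
	};

With the checks this patch adds, such a chain looks like it would be
rejected: the walk returns -1 as soon as it reaches a descriptor with
VRING_DESC_F_INDIRECT set, which seems to be what the question is
probing.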
>>>> @@ -747,10 +750,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>>>  		if ((desc->flags & VRING_DESC_F_NEXT) == 0)
>>>>  			break;
>>>>
>>>> -		if (unlikely(desc->next >= vq->size ||
>>>> -				++nr_desc > vq->size))
>>>> +		if (unlikely(desc->next >= max_desc ||
>>>> +				++nr_desc > max_desc))
>>>> +			return -1;
>>>> +		desc = &descs[desc->next];
>>>> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>>>>  			return -1;
>>>> -		desc = &vq->desc[desc->next];
>>>>
>>>>  		desc_addr = gpa_to_vva(dev, desc->addr);
>>>>  		if (unlikely(!desc_addr))
>>>> @@ -878,19 +883,35 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>>>  	/* Prefetch descriptor index. */
>>>>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>>>  	for (i = 0; i < count; i++) {
>>>> +		struct vring_desc *desc;
>>>> +		uint16_t sz, idx;
>>>>  		int err;
>>>>
>>>>  		if (likely(i + 1 < count))
>>>>  			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>>>
>>>> +		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
>>>> +			desc = (struct vring_desc *)gpa_to_vva(dev,
>>>> +					vq->desc[desc_indexes[i]].addr);
>>>> +			if (unlikely(!desc))
>>>> +				break;
>>>> +
>>>> +			rte_prefetch0(desc);
>>>> +			sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
>>>> +			idx = 0;
>>>> +		} else {
>>>> +			desc = vq->desc;
>>>> +			sz = vq->size;
>>>> +			idx = desc_indexes[i];
>>>> +		}
>>>> +
>>>>  		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>>>  		if (unlikely(pkts[i] == NULL)) {
>>>>  			RTE_LOG(ERR, VHOST_DATA,
>>>>  				"Failed to allocate memory for mbuf.\n");
>>>>  			break;
>>>>  		}
>>>> -		err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i],
>>>> -					mbuf_pool);
>>>> +		err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
>>>>  		if (unlikely(err)) {
>>>>  			rte_pktmbuf_free(pkts[i]);
>>>>  			break;
>>>> --
>>>> 2.7.4
>>>
>>> Something that I'm missing here: it's legal for guest
>>> to add indirect descriptors for RX.
>>> I don't see the handling of RX here though.
>>> I think it's required for spec compliance.
>>>
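As for the RX side, the resolution step added to the dequeue path above
could in principle be mirrored on the enqueue path. The following is
only a sketch of that idea, not code from the patch: the helper name is
invented, and it reuses the vhost library's internal types and
gpa_to_vva() exactly as the quoted dequeue hunk does, so it is a
fragment meant to sit next to that code rather than a standalone
program.

	/* Sketch: resolve a possibly-indirect head descriptor before the
	 * enqueue (RX) copy loop, mirroring the dequeue-side hunk above.
	 * 'dev', 'vq', struct vring_desc and gpa_to_vva() come from the
	 * vhost library; the helper itself is hypothetical. */
	static int
	resolve_desc_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
			   uint16_t head_idx, struct vring_desc **descs,
			   uint16_t *size, uint16_t *idx)
	{
		if (vq->desc[head_idx].flags & VRING_DESC_F_INDIRECT) {
			*descs = (struct vring_desc *)gpa_to_vva(dev,
					vq->desc[head_idx].addr);
			if (unlikely(!*descs))
				return -1;

			*size = vq->desc[head_idx].len / sizeof(struct vring_desc);
			*idx = 0;	/* chain starts at the top of the indirect table */
		} else {
			*descs = vq->desc;
			*size = vq->size;
			*idx = head_idx;
		}

		return 0;
	}

Whether the enqueue path actually needs this for spec compliance is
exactly the open question above; the sketch only shows that the same
resolution logic could be factored out and shared between the two
directions.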