From: Maxime Coquelin
To: Zhihong Wang, dev@dpdk.org
Cc: yuanhan.liu@linux.intel.com, thomas.monjalon@6wind.com
Date: Mon, 12 Sep 2016 17:42:40 +0200
Subject: Re: [dpdk-dev] [PATCH v5 2/6] vhost: rewrite enqueue
Message-ID: <2cc970ca-ea2c-8e3b-069a-9f1ec1182b03@redhat.com>
In-Reply-To: <1473392368-84903-3-git-send-email-zhihong.wang@intel.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <1473392368-84903-1-git-send-email-zhihong.wang@intel.com>
 <1473392368-84903-3-git-send-email-zhihong.wang@intel.com>

Hi,

On 09/09/2016 05:39 AM, Zhihong Wang wrote:
> This patch implements the vhost logic from scratch into a single function
> designed for high performance and better maintainability.
>
> This is the baseline version of the new code, more optimization will be
> added in the following patches in this patch set.
>
> Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> ---
> Changes in v5:
>
> 1. Rebase to the latest branch.
>
> 2. Rename variables to keep consistent in naming style.
>
> 3. Small changes like return value adjustment and vertical alignment.
>
> ---
> Changes in v4:
>
> 1. Refactor the code for clearer logic.
>
> 2. Add PRINT_PACKET for debugging.
>
> ---
> Changes in v3:
>
> 1. Rewrite enqueue and delete the obsolete in the same patch.
>
>  lib/librte_vhost/virtio_net.c | 514 ++++++++++++------------------------------
>  1 file changed, 138 insertions(+), 376 deletions(-)
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 0d6e7d9..6f63968 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -91,7 +91,7 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
>  	return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
>  }
>
> -static void
> +static inline void __attribute__((always_inline))
>  virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
>  {
>  	if (m_buf->ol_flags & PKT_TX_L4_MASK) {
> @@ -112,6 +112,10 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
>  				cksum));
>  			break;
>  		}
> +	} else {
> +		net_hdr->flags = 0;
> +		net_hdr->csum_start = 0;
> +		net_hdr->csum_offset = 0;
>  	}
>
>  	if (m_buf->ol_flags & PKT_TX_TCP_SEG) {
> @@ -122,439 +126,197 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
>  		net_hdr->gso_size = m_buf->tso_segsz;
>  		net_hdr->hdr_len = m_buf->l2_len + m_buf->l3_len
>  					+ m_buf->l4_len;
> +	} else {
> +		net_hdr->gso_type = 0;
> +		net_hdr->hdr_len = 0;
> +		net_hdr->gso_size = 0;
>  	}
>  }
>
> -static inline void
> -copy_virtio_net_hdr(struct virtio_net *dev, uint64_t desc_addr,
> -		struct virtio_net_hdr_mrg_rxbuf hdr)
> +static inline void __attribute__((always_inline))
> +update_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +		uint32_t desc_chain_head, uint32_t desc_chain_len)
>  {
> -	if (dev->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf))
> -		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> -	else
> -		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> +	uint32_t used_idx = vq->last_used_idx & (vq->size - 1);
> +
> +	vq->used->ring[used_idx].id = desc_chain_head;
> +	vq->used->ring[used_idx].len = desc_chain_len;
> +	vq->last_used_idx++;
> +	vhost_log_used_vring(dev, vq, offsetof(struct vring_used,
> +				ring[used_idx]),
> +			sizeof(vq->used->ring[used_idx]));
>  }
>
>  static inline int __attribute__((always_inline))
> -copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -		struct rte_mbuf *m, uint16_t desc_idx)
> +enqueue_packet(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +		uint16_t avail_idx, struct rte_mbuf *mbuf,
> +		uint32_t is_mrg_rxbuf)
>  {
> -	uint32_t desc_avail, desc_offset;
> -	uint32_t mbuf_avail, mbuf_offset;
> -	uint32_t cpy_len;
> +	struct virtio_net_hdr_mrg_rxbuf *virtio_hdr;
>  	struct vring_desc *desc;
>  	uint64_t desc_addr;
> -	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
> +	uint32_t desc_chain_head;
> +	uint32_t desc_chain_len;
> +	uint32_t desc_current;
> +	uint32_t desc_offset;
> +	uint32_t mbuf_len;
> +	uint32_t mbuf_avail;
> +	uint32_t cpy_len;
> +	uint32_t num_buffers = 0;
>
> -	desc = &vq->desc[desc_idx];
> +	/* start with the first mbuf of the packet */
> +	mbuf_len = rte_pktmbuf_data_len(mbuf);
> +	mbuf_avail = mbuf_len;
> +
> +	/* get the current desc */
> +	desc_current = vq->avail->ring[(vq->last_used_idx) & (vq->size - 1)];
> +	desc_chain_head = desc_current;
> +	desc = &vq->desc[desc_current];
>  	desc_addr = gpa_to_vva(dev, desc->addr);
> -	/*
> -	 * Checking of 'desc_addr' placed outside of 'unlikely' macro to avoid
> -	 * performance issue with some versions of gcc (4.8.4 and 5.3.0) which
> -	 * otherwise stores offset on the stack instead of in a register.
> -	 */
> -	if (unlikely(desc->len < dev->vhost_hlen) || !desc_addr)
> -		return -1;
> +	if (unlikely(!desc_addr))
> +		goto error;
>
> -	rte_prefetch0((void *)(uintptr_t)desc_addr);
> +	/* handle virtio header */
> +	virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr;
> +	virtio_enqueue_offload(mbuf, &(virtio_hdr->hdr));
> +	if (is_mrg_rxbuf)
> +		virtio_hdr->num_buffers = 1;
>
> -	virtio_enqueue_offload(m, &virtio_hdr.hdr);
> -	copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
>  	vhost_log_write(dev, desc->addr, dev->vhost_hlen);
>  	PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
> -
>  	desc_offset = dev->vhost_hlen;
> -	desc_avail = desc->len - dev->vhost_hlen;
> -
> -	mbuf_avail = rte_pktmbuf_data_len(m);
> -	mbuf_offset = 0;
> -	while (mbuf_avail != 0 || m->next != NULL) {
> -		/* done with current mbuf, fetch next */
> -		if (mbuf_avail == 0) {
> -			m = m->next;
> -
> -			mbuf_offset = 0;
> -			mbuf_avail = rte_pktmbuf_data_len(m);
> +	desc_chain_len = desc_offset;
> +	desc_addr += desc_offset;
> +
> +	/* start copy from mbuf to desc */
> +	while (mbuf_avail || mbuf->next) {
> +		/* get the next mbuf if the current done */
> +		if (!mbuf_avail) {
> +			mbuf = mbuf->next;
> +			mbuf_len = rte_pktmbuf_data_len(mbuf);
> +			mbuf_avail = mbuf_len;
>  		}
>
> -		/* done with current desc buf, fetch next */
> -		if (desc_avail == 0) {
> -			if ((desc->flags & VRING_DESC_F_NEXT) == 0) {
> -				/* Room in vring buffer is not enough */
> -				return -1;
> -			}
> -			if (unlikely(desc->next >= vq->size))
> -				return -1;
> +		/* get the next desc if the current done */
> +		if (desc->len <= desc_offset) {
> +			if (desc->flags & VRING_DESC_F_NEXT) {
> +				/* go on with the current desc chain */
> +				desc_offset = 0;
> +				desc_current = desc->next;
> +				desc = &vq->desc[desc_current];
> +				desc_addr = gpa_to_vva(dev, desc->addr);
> +				if (unlikely(!desc_addr))
> +					goto error;
> +			} else if (is_mrg_rxbuf) {
> +				/* start with the next desc chain */
> +				update_used_ring(dev, vq, desc_chain_head,
> +						desc_chain_len);
> +				num_buffers++;
> +				virtio_hdr->num_buffers++;
> +				if (avail_idx == vq->last_used_idx)
> +					goto error;
> +
> +				desc_current =
> +					vq->avail->ring[(vq->last_used_idx) &
> +					(vq->size - 1)];
> +				desc_chain_head = desc_current;
> +				desc = &vq->desc[desc_current];
> +				desc_addr = gpa_to_vva(dev, desc->addr);
> +				if (unlikely(!desc_addr))
> +					goto error;
>
> -			desc = &vq->desc[desc->next];
> -			desc_addr = gpa_to_vva(dev, desc->addr);
> -			if (unlikely(!desc_addr))
> -				return -1;
> -
> -			desc_offset = 0;
> -			desc_avail = desc->len;
> +				desc_chain_len = 0;
> +				desc_offset = 0;

As I commented on v3, there is code duplication between the next-flag and
mrg-rxbuf cases:

	desc_offset = 0;

and:

	desc = &vq->desc[desc_current];
	desc_addr = gpa_to_vva(dev, desc->addr);
	if (unlikely(!desc_addr))
		goto error;

Regards,
Maxime
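
P.S.: to illustrate the deduplication I have in mind, the descriptor fetch that
appears in both branches could be folded into one helper along these lines.
This is only a sketch: the struct definitions and gpa_to_vva_mock() below are
simplified stand-ins for the real vhost types, not DPDK's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the vhost structures (illustration only). */
struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct vhost_virtqueue {
	struct vring_desc *desc;
	uint32_t size;
};

/* Mocked guest-physical to host-virtual translation: identity mapping,
 * with 0 meaning "unmappable", so the sketch is self-contained. */
static uint64_t
gpa_to_vva_mock(uint64_t gpa)
{
	return gpa;
}

/* The fetch sequence duplicated in enqueue_packet(), factored out.
 * Returns 0 on success, -1 if the descriptor cannot be mapped. */
static int
fetch_desc(struct vhost_virtqueue *vq, uint32_t desc_current,
	   struct vring_desc **desc, uint64_t *desc_addr)
{
	*desc = &vq->desc[desc_current];
	*desc_addr = gpa_to_vva_mock((*desc)->addr);
	if (!*desc_addr)
		return -1;
	return 0;
}
```

Both the VRING_DESC_F_NEXT branch and the mergeable-rxbuf branch would then
reduce to a single `if (fetch_desc(vq, desc_current, &desc, &desc_addr))
goto error;` followed by `desc_offset = 0;`.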
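
P.P.S.: a side note on update_used_ring() quoted above, for readers less
familiar with the vring layout: `vq->last_used_idx & (vq->size - 1)` picks the
used-ring slot, and is equivalent to a modulo because virtio ring sizes are
powers of two. A minimal standalone illustration (ring_slot() is a
hypothetical name, not a DPDK function):

```c
#include <assert.h>
#include <stdint.h>

/* For a power-of-two ring size, masking a free-running index with
 * (size - 1) yields the same slot as index % size, without a division. */
static uint32_t
ring_slot(uint32_t last_used_idx, uint32_t size)
{
	return last_used_idx & (size - 1);
}
```

In the real code last_used_idx is free-running and simply wraps; only the
masked value is used to address the ring.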