From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Xie, Huawei"
To: Yuanhan Liu, "dev@dpdk.org"
Cc: Victor Kaplansky, "Michael S. Tsirkin"
Subject: Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst
Date: Thu, 3 Mar 2016 16:21:19 +0000
References: <1449122773-25510-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1455803352-5518-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1455803352-5518-2-git-send-email-yuanhan.liu@linux.intel.com>

On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> The current rte_vhost_dequeue_burst() implementation is a bit messy
> and logic twisted. And you could see repeat code here and there: it
> invokes rte_pktmbuf_alloc() three times at three different places!
>
> However, rte_vhost_dequeue_burst() actually does a simple job: copy
> the packet data from vring desc to mbuf. What's tricky here is:
>
> - desc buff could be chained (by desc->next field), so that you need
>   fetch next one if current is wholly drained.
>
> - One mbuf could not be big enough to hold all desc buff, hence you
>   need to chain the mbuf as well, by the mbuf->next field.
>
> Even though, the logic could be simple. Here is the pseudo code.
>
>     while (this_desc_is_not_drained_totally || has_next_desc) {
>         if (this_desc_has_drained_totally) {
>             this_desc = next_desc();
>         }
>
>         if (mbuf_has_no_room) {
>             mbuf = allocate_a_new_mbuf();
>         }
>
>         COPY(mbuf, desc);
>     }
>
> And this is how I refactored rte_vhost_dequeue_burst.
>
> Note that the old patch does a special handling for skipping virtio
> header.
> However, that could be simply done by adjusting desc_avail
> and desc_offset var:
>
>     desc_avail  = desc->len - vq->vhost_hlen;
>     desc_offset = vq->vhost_hlen;
>
> This refactor makes the code much more readable (IMO), yet it reduces
> binary code size (nearly 2K).
>
> Signed-off-by: Yuanhan Liu
> ---
>
> v2: - fix potential NULL dereference bug of var "prev" and "head"
> ---
>  lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++-------------------------
>  1 file changed, 116 insertions(+), 181 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 5e7e5b1..d5cd0fa 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
>  	}
>  }
>  
> +static inline struct rte_mbuf *
> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> +		  uint16_t desc_idx, struct rte_mempool *mbuf_pool)
> +{
> +	struct vring_desc *desc;
> +	uint64_t desc_addr;
> +	uint32_t desc_avail, desc_offset;
> +	uint32_t mbuf_avail, mbuf_offset;
> +	uint32_t cpy_len;
> +	struct rte_mbuf *head = NULL;
> +	struct rte_mbuf *cur = NULL, *prev = NULL;
> +	struct virtio_net_hdr *hdr;
> +
> +	desc = &vq->desc[desc_idx];
> +	desc_addr = gpa_to_vva(dev, desc->addr);
> +	rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> +	/* Retrieve virtio net header */
> +	hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> +	desc_avail  = desc->len - vq->vhost_hlen;

There is a serious bug here: desc->len - vq->vhost_hlen could overflow.
A VM could easily create this case. Let us fix it here.

> +	desc_offset = vq->vhost_hlen;
> +
> +	mbuf_avail  = 0;
> +	mbuf_offset = 0;
> +	while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) {
> +		/* This desc reachs to its end, get the next one */
> +		if (desc_avail == 0) {
> +			desc = &vq->desc[desc->next];
> +
> +			desc_addr = gpa_to_vva(dev, desc->addr);
> +			rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> +			desc_offset = 0;
> +			desc_avail  = desc->len;
> +
> +			PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> +		}
> +
> +		/*
> +		 * This mbuf reachs to its end, get a new one
> +		 * to hold more data.
> +		 */
> +		if (mbuf_avail == 0) {
> +			cur = rte_pktmbuf_alloc(mbuf_pool);
> +			if (unlikely(!cur)) {
> +				RTE_LOG(ERR, VHOST_DATA, "Failed to "
> +					"allocate memory for mbuf.\n");
> +				if (head)
> +					rte_pktmbuf_free(head);
> +				return NULL;
> +			}
> +			if (!head) {
> +				head = cur;
> +			} else {
> +				prev->next = cur;
> +				prev->data_len = mbuf_offset;
> +				head->nb_segs += 1;
> +			}
> +			head->pkt_len += mbuf_offset;
> +			prev = cur;
> +
> +			mbuf_offset = 0;
> +			mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
> +		}
> +
> +		cpy_len = RTE_MIN(desc_avail, mbuf_avail);
> +		rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
> +			(void *)((uintptr_t)(desc_addr + desc_offset)),
> +			cpy_len);
> +
> +		mbuf_avail  -= cpy_len;
> +		mbuf_offset += cpy_len;
> +		desc_avail  -= cpy_len;
> +		desc_offset += cpy_len;
> +	}
> +
> +	if (prev) {
> +		prev->data_len = mbuf_offset;
> +		head->pkt_len += mbuf_offset;
> +
> +		if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)
> +			vhost_dequeue_offload(hdr, head);
> +	}
> +
> +	return head;
> +}
> +
>  uint16_t
> rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
>  {
> -	struct rte_mbuf *m, *prev;
>  	struct vhost_virtqueue *vq;
> -	struct vring_desc *desc;
> -	uint64_t vb_addr = 0;
> -	uint64_t vb_net_hdr_addr = 0;
> -	uint32_t head[MAX_PKT_BURST];
> +	uint32_t desc_indexes[MAX_PKT_BURST];
>  	uint32_t used_idx;
>  	uint32_t i;
> -	uint16_t free_entries, entry_success = 0;
> +	uint16_t free_entries;
>  	uint16_t avail_idx;
> -	struct virtio_net_hdr *hdr = NULL;
> +	struct rte_mbuf *m;
>  
>  	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
>  		RTE_LOG(ERR, VHOST_DATA,
> @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>  		return 0;
>  
>  	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
> -
> -	/* If there are no available buffers then return. */
> -	if (vq->last_used_idx == avail_idx)
> +	free_entries = avail_idx - vq->last_used_idx;
> +	if (free_entries == 0)
>  		return 0;
>  
> -	LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__,
> -		dev->device_fh);
> +	LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh);
>  
> -	/* Prefetch available ring to retrieve head indexes. */
> -	rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]);
> +	used_idx = vq->last_used_idx & (vq->size -1);
>  
> -	/*get the number of free entries in the ring*/
> -	free_entries = (avail_idx - vq->last_used_idx);
> +	/* Prefetch available ring to retrieve head indexes. */
> +	rte_prefetch0(&vq->avail->ring[used_idx]);
>  
> -	free_entries = RTE_MIN(free_entries, count);
> -	/* Limit to MAX_PKT_BURST. */
> -	free_entries = RTE_MIN(free_entries, MAX_PKT_BURST);
> +	count = RTE_MIN(count, MAX_PKT_BURST);
> +	count = RTE_MIN(count, free_entries);
> +	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequeue %u buffers\n",
> +			dev->device_fh, count);
>  
> -	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
> -		dev->device_fh, free_entries);
>  	/* Retrieve all of the head indexes first to avoid caching issues. */
> -	for (i = 0; i < free_entries; i++)
> -		head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)];
> +	for (i = 0; i < count; i++) {
> +		desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> +				  (vq->size - 1)];
> +	}
>  
>  	/* Prefetch descriptor index. */
> -	rte_prefetch0(&vq->desc[head[entry_success]]);
> +	rte_prefetch0(&vq->desc[desc_indexes[0]]);
>  	rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
>  
> -	while (entry_success < free_entries) {
> -		uint32_t vb_avail, vb_offset;
> -		uint32_t seg_avail, seg_offset;
> -		uint32_t cpy_len;
> -		uint32_t seg_num = 0;
> -		struct rte_mbuf *cur;
> -		uint8_t alloc_err = 0;
> -
> -		desc = &vq->desc[head[entry_success]];
> -
> -		vb_net_hdr_addr = gpa_to_vva(dev, desc->addr);
> -		hdr = (struct virtio_net_hdr *)((uintptr_t)vb_net_hdr_addr);
> -
> -		/* Discard first buffer as it is the virtio header */
> -		if (desc->flags & VRING_DESC_F_NEXT) {
> -			desc = &vq->desc[desc->next];
> -			vb_offset = 0;
> -			vb_avail = desc->len;
> -		} else {
> -			vb_offset = vq->vhost_hlen;
> -			vb_avail = desc->len - vb_offset;
> -		}
> -
> -		/* Buffer address translation. */
> -		vb_addr = gpa_to_vva(dev, desc->addr);
> -		/* Prefetch buffer address. */
> -		rte_prefetch0((void *)(uintptr_t)vb_addr);
> -
> -		used_idx = vq->last_used_idx & (vq->size - 1);
> -
> -		if (entry_success < (free_entries - 1)) {
> -			/* Prefetch descriptor index. */
> -			rte_prefetch0(&vq->desc[head[entry_success+1]]);
> -			rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]);
> -		}
> -
> -		/* Update used index buffer information. */
> -		vq->used->ring[used_idx].id = head[entry_success];
> -		vq->used->ring[used_idx].len = 0;
> -
> -		/* Allocate an mbuf and populate the structure. */
> -		m = rte_pktmbuf_alloc(mbuf_pool);
> -		if (unlikely(m == NULL)) {
> -			RTE_LOG(ERR, VHOST_DATA,
> -				"Failed to allocate memory for mbuf.\n");
> -			break;
> -		}
> -		seg_offset = 0;
> -		seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
> -		cpy_len = RTE_MIN(vb_avail, seg_avail);
> -
> -		PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0);
> -
> -		seg_num++;
> -		cur = m;
> -		prev = m;
> -		while (cpy_len != 0) {
> -			rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset),
> -				(void *)((uintptr_t)(vb_addr + vb_offset)),
> -				cpy_len);
> -
> -			seg_offset += cpy_len;
> -			vb_offset += cpy_len;
> -			vb_avail -= cpy_len;
> -			seg_avail -= cpy_len;
> -
> -			if (vb_avail != 0) {
> -				/*
> -				 * The segment reachs to its end,
> -				 * while the virtio buffer in TX vring has
> -				 * more data to be copied.
> -				 */
> -				cur->data_len = seg_offset;
> -				m->pkt_len += seg_offset;
> -				/* Allocate mbuf and populate the structure. */
> -				cur = rte_pktmbuf_alloc(mbuf_pool);
> -				if (unlikely(cur == NULL)) {
> -					RTE_LOG(ERR, VHOST_DATA, "Failed to "
> -						"allocate memory for mbuf.\n");
> -					rte_pktmbuf_free(m);
> -					alloc_err = 1;
> -					break;
> -				}
> -
> -				seg_num++;
> -				prev->next = cur;
> -				prev = cur;
> -				seg_offset = 0;
> -				seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
> -			} else {
> -				if (desc->flags & VRING_DESC_F_NEXT) {
> -					/*
> -					 * There are more virtio buffers in
> -					 * same vring entry need to be copied.
> -					 */
> -					if (seg_avail == 0) {
> -						/*
> -						 * The current segment hasn't
> -						 * room to accomodate more
> -						 * data.
> -						 */
> -						cur->data_len = seg_offset;
> -						m->pkt_len += seg_offset;
> -						/*
> -						 * Allocate an mbuf and
> -						 * populate the structure.
> -						 */
> -						cur = rte_pktmbuf_alloc(mbuf_pool);
> -						if (unlikely(cur == NULL)) {
> -							RTE_LOG(ERR,
> -								VHOST_DATA,
> -								"Failed to "
> -								"allocate memory "
> -								"for mbuf\n");
> -							rte_pktmbuf_free(m);
> -							alloc_err = 1;
> -							break;
> -						}
> -						seg_num++;
> -						prev->next = cur;
> -						prev = cur;
> -						seg_offset = 0;
> -						seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
> -					}
> -
> -					desc = &vq->desc[desc->next];
> -
> -					/* Buffer address translation. */
> -					vb_addr = gpa_to_vva(dev, desc->addr);
> -					/* Prefetch buffer address. */
> -					rte_prefetch0((void *)(uintptr_t)vb_addr);
> -					vb_offset = 0;
> -					vb_avail = desc->len;
> -
> -					PRINT_PACKET(dev, (uintptr_t)vb_addr,
> -						desc->len, 0);
> -				} else {
> -					/* The whole packet completes. */
> -					cur->data_len = seg_offset;
> -					m->pkt_len += seg_offset;
> -					vb_avail = 0;
> -				}
> -			}
> -
> -			cpy_len = RTE_MIN(vb_avail, seg_avail);
> -		}
> -
> -		if (unlikely(alloc_err == 1))
> +	for (i = 0; i < count; i++) {
> +		m = copy_desc_to_mbuf(dev, vq, desc_indexes[i], mbuf_pool);
> +		if (m == NULL)
>  			break;
> +		pkts[i] = m;
>  
> -		m->nb_segs = seg_num;
> -		if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE))
> -			vhost_dequeue_offload(hdr, m);
> -
> -		pkts[entry_success] = m;
> -		vq->last_used_idx++;
> -		entry_success++;
> +		used_idx = vq->last_used_idx++ & (vq->size - 1);
> +		vq->used->ring[used_idx].id  = desc_indexes[i];
> +		vq->used->ring[used_idx].len = 0;
>  	}
>  
>  	rte_compiler_barrier();
> -	vq->used->idx += entry_success;
> +	vq->used->idx += i;
> +
>  	/* Kick guest if required. */
>  	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>  		eventfd_write(vq->callfd, (eventfd_t)1);
> -	return entry_success;
> +
> +	return i;
>  }