From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Wang, Yinan"
To: "'maxime.coquelin@redhat.com'", "jfreimann@redhat.com", "dev@dpdk.org"
CC: "Yao, Lei A", "Bie, Tiwei", "Wang, Zhihong"
Thread-Topic: [dpdk-dev] [PATCH v4 2/5] vhost: use buffer vectors in dequeue path
Date: Tue, 17 Jul 2018 07:17:27 +0000
References: <20180706070449.1946-1-maxime.coquelin@redhat.com> <20180706070449.1946-3-maxime.coquelin@redhat.com> <2DBBFF226F7CF64BAFCA79B681719D953A4EB9E3@SHSMSX101.ccr.corp.intel.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v4 2/5] vhost: use buffer vectors in dequeue path
List-Id: DPDK patches and discussions

Hi Maxime,

The vhost-user + virtio-net VM2VM TSO performance test works well on DPDK v18.05, but with v18.08-rc1 we found a regression in the VM2VM test case: when running iperf or netperf, the server VM hangs or crashes. Bisection shows it is caused by your patch below. Could you help to take a look?

Steps to reproduce:

1. Bind an 82599 NIC port to igb_uio.

2. Launch vhost-switch:

./examples/vhost/build/vhost-switch -c 0x70000000 -n 4 --socket-mem 2048,2048 --legacy-mem -- -p 0x1 --mergeable 1 --vm2vm 1 --tso 1 --tx-csum 1 --socket-file ./vhost-net --socket-file ./vhost-net1

3. Launch VM1 and VM2:
taskset -c 31 \
qemu-system-x86_64 -name vm0 -enable-kvm \
 -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 \
 -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 -daemonize \
 -monitor unix:/tmp/vm0_monitor.sock,server,nowait \
 -net nic,vlan=0,macaddr=00:00:00:50:fb:f3,addr=1f -net user,vlan=0,hostfwd=tcp:127.0.0.1:6145-:22 \
 -chardev socket,id=char0,path=./vhost-net \
 -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
 -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01 \
 -cpu host -smp 1 -m 4096 \
 -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc \
 -drive file=/home/osimg/ubuntu16.img -vnc :4

taskset -c 32 \
qemu-system-x86_64 -name vm1 -enable-kvm \
 -chardev socket,path=/tmp/vm1_qga0.sock,server,nowait,id=vm1_qga0 \
 -device virtio-serial -device virtserialport,chardev=vm1_qga0,name=org.qemu.guest_agent.0 -daemonize \
 -monitor unix:/tmp/vm1_monitor.sock,server,nowait \
 -net nic,vlan=0,macaddr=00:00:00:40:75:e7,addr=1f -net user,vlan=0,hostfwd=tcp:127.0.0.1:6134-:22 \
 -chardev socket,id=char0,path=./vhost-net1 \
 -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \
 -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:02 -cpu host -smp 1 -m 4096 \
 -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc \
 -drive file=/home/osimg/ubuntu16-2.img -vnc :5

4. On VM1, set the virtio IP and run iperf:

ifconfig ens4 1.1.1.2
arp -s 1.1.1.8 52:54:00:00:00:02
arp  # check that the ARP table is complete and correct

5. On VM2, set the virtio IP and run iperf:

ifconfig ens4 1.1.1.8
arp -s 1.1.1.2 52:54:00:00:00:01
arp  # check that the ARP table is complete and correct

6. Ensure virtio1 can ping virtio2, then in VM1 run `iperf -s -i 1`; in VM2 run `iperf -c 1.1.1.2 -i 1 -t 60`.

7. Check the iperf performance for the VM2VM case.

Best Wishes,
Yinan

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Maxime Coquelin
Sent: Friday, July 6, 2018 8:05 AM
To: Bie, Tiwei; Wang, Zhihong; dev@dpdk.org
Cc: Maxime Coquelin
Subject: [dpdk-dev] [PATCH v4 2/5] vhost: use buffer vectors in dequeue path

To ease packed ring layout integration, this patch makes the dequeue path re-use the buffer vectors implemented for the enqueue path. With this change, copy_desc_to_mbuf() becomes ring layout agnostic.

Signed-off-by: Maxime Coquelin
---
 lib/librte_vhost/vhost.h      |   1 +
 lib/librte_vhost/virtio_net.c | 451 ++++++++++++++++++------------------------
 2 files changed, 167 insertions(+), 285 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 3437b996b..79e3117d2 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -43,6 +43,7 @@
  * from vring to do scatter RX.
  */
 struct buf_vector {
+	uint64_t buf_iova;
 	uint64_t buf_addr;
 	uint32_t buf_len;
 	uint32_t desc_idx;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 741267345..6339296c7 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -225,12 +225,12 @@ static __rte_always_inline int
 fill_vec_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			 uint32_t avail_idx, uint32_t *vec_idx,
 			 struct buf_vector *buf_vec, uint16_t *desc_chain_head,
-			 uint16_t *desc_chain_len)
+			 uint16_t *desc_chain_len, uint8_t perm)
 {
 	uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)];
 	uint32_t vec_id = *vec_idx;
 	uint32_t len = 0;
-	uint64_t dlen;
+	uint64_t dlen, desc_avail, desc_iova;
 	struct vring_desc *descs = vq->desc;
 	struct vring_desc *idesc = NULL;
 
@@ -261,16 +261,43 @@ fill_vec_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	}
 
 	while (1) {
-		if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size)) {
+		if (unlikely(idx >= vq->size)) {
 			free_ind_table(idesc);
 			return -1;
 		}
 
+		len += descs[idx].len;
-		buf_vec[vec_id].buf_addr = descs[idx].addr;
-		buf_vec[vec_id].buf_len = descs[idx].len;
-		buf_vec[vec_id].desc_idx = idx;
-		vec_id++;
+		desc_avail = descs[idx].len;
+		desc_iova = descs[idx].addr;
+
+		while (desc_avail) {
+			uint64_t desc_addr;
+			uint64_t desc_chunck_len = desc_avail;
+
+			if (unlikely(vec_id >= BUF_VECTOR_MAX)) {
+				free_ind_table(idesc);
+				return -1;
+			}
+
+			desc_addr = vhost_iova_to_vva(dev, vq,
+					desc_iova,
+					&desc_chunck_len,
+					perm);
+			if (unlikely(!desc_addr)) {
+				free_ind_table(idesc);
+				return -1;
+			}
+
+			buf_vec[vec_id].buf_iova = desc_iova;
+			buf_vec[vec_id].buf_addr = desc_addr;
+			buf_vec[vec_id].buf_len = desc_chunck_len;
+			buf_vec[vec_id].desc_idx = idx;
+
+			desc_avail -= desc_chunck_len;
+			desc_iova += desc_chunck_len;
+			vec_id++;
+		}
 
 		if ((descs[idx].flags & VRING_DESC_F_NEXT) == 0)
 			break;
@@ -293,7 +320,8 @@ fill_vec_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 static inline int
 reserve_avail_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				uint32_t size, struct buf_vector *buf_vec,
-				uint16_t *num_buffers, uint16_t avail_head)
+				uint16_t *num_buffers, uint16_t avail_head,
+				uint16_t *nr_vec)
 {
 	uint16_t cur_idx;
 	uint32_t vec_idx = 0;
@@ -315,7 +343,8 @@ reserve_avail_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			return -1;
 
 		if (unlikely(fill_vec_buf(dev, vq, cur_idx, &vec_idx, buf_vec,
-						&head_idx, &len) < 0))
+						&head_idx, &len,
+						VHOST_ACCESS_RW) < 0))
 			return -1;
 		len = RTE_MIN(len, size);
 		update_shadow_used_ring(vq, head_idx, len);
@@ -334,21 +363,22 @@ reserve_avail_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		return -1;
 	}
 
+	*nr_vec = vec_idx;
+
 	return 0;
 }
 
 static __rte_always_inline int
 copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			  struct rte_mbuf *m, struct buf_vector *buf_vec,
-			  uint16_t num_buffers)
+			  uint16_t nr_vec, uint16_t num_buffers)
 {
 	uint32_t vec_idx = 0;
-	uint64_t desc_addr, desc_gaddr;
 	uint32_t mbuf_offset, mbuf_avail;
-	uint32_t desc_offset, desc_avail;
+	uint32_t buf_offset, buf_avail;
+	uint64_t buf_addr, buf_iova, buf_len;
 	uint32_t cpy_len;
-	uint64_t desc_chunck_len;
-	uint64_t hdr_addr, hdr_phys_addr;
+	uint64_t hdr_addr;
 	struct rte_mbuf *hdr_mbuf;
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
@@ -359,82 +389,57 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		goto out;
 	}
 
-	desc_chunck_len = buf_vec[vec_idx].buf_len;
-	desc_gaddr = buf_vec[vec_idx].buf_addr;
-	desc_addr = vhost_iova_to_vva(dev, vq,
-					desc_gaddr,
-					&desc_chunck_len,
-					VHOST_ACCESS_RW);
-	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr) {
+	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
+	buf_len = buf_vec[vec_idx].buf_len;
+
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
 		error = -1;
 		goto out;
 	}
 
 	hdr_mbuf = m;
-	hdr_addr = desc_addr;
-	if (unlikely(desc_chunck_len < dev->vhost_hlen))
+	hdr_addr = buf_addr;
+	if (unlikely(buf_len < dev->vhost_hlen))
 		hdr = &tmp_hdr;
 	else
 		hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)hdr_addr;
-	hdr_phys_addr = desc_gaddr;
 	rte_prefetch0((void *)(uintptr_t)hdr_addr);
 
 	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n",
 		dev->vid, num_buffers);
 
-	desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
-	if (unlikely(desc_chunck_len < dev->vhost_hlen)) {
-		desc_chunck_len = desc_avail;
-		desc_gaddr += dev->vhost_hlen;
-		desc_addr = vhost_iova_to_vva(dev, vq,
-				desc_gaddr,
-				&desc_chunck_len,
-				VHOST_ACCESS_RW);
-		if (unlikely(!desc_addr)) {
-			error = -1;
-			goto out;
-		}
-
-		desc_offset = 0;
+	if (unlikely(buf_len < dev->vhost_hlen)) {
+		buf_offset = dev->vhost_hlen - buf_len;
+		vec_idx++;
+		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
+		buf_len = buf_vec[vec_idx].buf_len;
+		buf_avail = buf_len - buf_offset;
 	} else {
-		desc_offset = dev->vhost_hlen;
-		desc_chunck_len -= dev->vhost_hlen;
+		buf_offset = dev->vhost_hlen;
+		buf_avail = buf_len - dev->vhost_hlen;
 	}
 
-	mbuf_avail = rte_pktmbuf_data_len(m);
 	mbuf_offset = 0;
 	while (mbuf_avail != 0 || m->next != NULL) {
-		/* done with current desc buf, get the next one */
-		if (desc_avail == 0) {
+		/* done with current buf, get the next one */
+		if (buf_avail == 0) {
 			vec_idx++;
-			desc_chunck_len = buf_vec[vec_idx].buf_len;
-			desc_gaddr = buf_vec[vec_idx].buf_addr;
-			desc_addr =
-				vhost_iova_to_vva(dev, vq,
-					desc_gaddr,
-					&desc_chunck_len,
-					VHOST_ACCESS_RW);
-			if (unlikely(!desc_addr)) {
+			if (unlikely(vec_idx >= nr_vec)) {
 				error = -1;
 				goto out;
 			}
 
+			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
+			buf_len = buf_vec[vec_idx].buf_len;
+
 			/* Prefetch buffer address. */
-			rte_prefetch0((void *)(uintptr_t)desc_addr);
-			desc_offset = 0;
-			desc_avail = buf_vec[vec_idx].buf_len;
-		} else if (unlikely(desc_chunck_len == 0)) {
-			desc_chunck_len = desc_avail;
-			desc_gaddr += desc_offset;
-			desc_addr = vhost_iova_to_vva(dev, vq,
-					desc_gaddr,
-					&desc_chunck_len, VHOST_ACCESS_RW);
-			if (unlikely(!desc_addr)) {
-				error = -1;
-				goto out;
-			}
-			desc_offset = 0;
+			rte_prefetch0((void *)(uintptr_t)buf_addr);
+			buf_offset = 0;
+			buf_avail = buf_len;
 		}
 
 		/* done with current mbuf, get the next one */
@@ -455,18 +460,12 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			uint64_t len;
 			uint64_t remain = dev->vhost_hlen;
 			uint64_t src = (uint64_t)(uintptr_t)hdr, dst;
-			uint64_t guest_addr = hdr_phys_addr;
+			uint64_t iova = buf_vec[0].buf_iova;
+			uint16_t hdr_vec_idx = 0;
 
 			while (remain) {
 				len = remain;
-				dst = vhost_iova_to_vva(dev, vq,
-						guest_addr, &len,
-						VHOST_ACCESS_RW);
-				if (unlikely(!dst || !len)) {
-					error = -1;
-					goto out;
-				}
-
+				dst = buf_vec[hdr_vec_idx].buf_addr;
 				rte_memcpy((void *)(uintptr_t)dst,
 						(void *)(uintptr_t)src,
 						len);
@@ -474,50 +473,50 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				PRINT_PACKET(dev, (uintptr_t)dst,
 						(uint32_t)len, 0);
 				vhost_log_cache_write(dev, vq,
-						guest_addr, len);
+						iova, len);
 
 				remain -= len;
-				guest_addr += len;
+				iova += len;
 				src += len;
+				hdr_vec_idx++;
 			}
 		} else {
 			PRINT_PACKET(dev, (uintptr_t)hdr_addr,
 					dev->vhost_hlen, 0);
-			vhost_log_cache_write(dev, vq, hdr_phys_addr,
+			vhost_log_cache_write(dev, vq,
+					buf_vec[0].buf_iova,
 					dev->vhost_hlen);
 		}
 
 		hdr_addr = 0;
 	}
 
-		cpy_len = RTE_MIN(desc_chunck_len, mbuf_avail);
+		cpy_len = RTE_MIN(buf_len, mbuf_avail);
 
 		if (likely(cpy_len > MAX_BATCH_LEN ||
 					vq->batch_copy_nb_elems >= vq->size)) {
-			rte_memcpy((void *)((uintptr_t)(desc_addr +
-							desc_offset)),
+			rte_memcpy((void *)((uintptr_t)(buf_addr + buf_offset)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
-			vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+			vhost_log_cache_write(dev, vq, buf_iova + buf_offset,
 					cpy_len);
-			PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
+			PRINT_PACKET(dev, (uintptr_t)(buf_addr + buf_offset),
 				cpy_len, 0);
 		} else {
 			batch_copy[vq->batch_copy_nb_elems].dst =
-				(void *)((uintptr_t)(desc_addr + desc_offset));
+				(void *)((uintptr_t)(buf_addr + buf_offset));
 			batch_copy[vq->batch_copy_nb_elems].src =
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset);
 			batch_copy[vq->batch_copy_nb_elems].log_addr =
-				desc_gaddr + desc_offset;
+				buf_iova + buf_offset;
 			batch_copy[vq->batch_copy_nb_elems].len = cpy_len;
 			vq->batch_copy_nb_elems++;
 		}
 
 		mbuf_avail -= cpy_len;
 		mbuf_offset += cpy_len;
-		desc_avail -= cpy_len;
-		desc_offset += cpy_len;
-		desc_chunck_len -= cpy_len;
+		buf_avail -= cpy_len;
+		buf_offset += cpy_len;
 	}
 
 out:
@@ -568,10 +567,11 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	avail_head = *((volatile uint16_t *)&vq->avail->idx);
 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
 		uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
+		uint16_t nr_vec = 0;
 
 		if (unlikely(reserve_avail_buf(dev, vq,
 						pkt_len, buf_vec, &num_buffers,
-						avail_head) < 0)) {
+						avail_head, &nr_vec) < 0)) {
 			VHOST_LOG_DEBUG(VHOST_DATA,
 				"(%d) failed to get enough desc from vring\n",
 				dev->vid);
@@ -584,7 +584,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 			vq->last_avail_idx + num_buffers);
 
 		if (copy_mbuf_to_desc(dev, vq, pkts[pkt_idx],
-						buf_vec, num_buffers) < 0) {
+						buf_vec, nr_vec,
+						num_buffers) < 0) {
 			vq->shadow_used_idx -= num_buffers;
 			break;
 		}
@@ -750,49 +751,40 @@ put_zmbuf(struct zcopy_mbuf *zmbuf)
 
 static __rte_always_inline int
 copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
-		  struct vring_desc *descs, uint16_t max_desc,
-		  struct rte_mbuf *m, uint16_t desc_idx,
-		  struct rte_mempool *mbuf_pool)
+		  struct buf_vector *buf_vec, uint16_t nr_vec,
+		  struct rte_mbuf *m, struct rte_mempool *mbuf_pool)
 {
-	struct vring_desc *desc;
-	uint64_t desc_addr, desc_gaddr;
-	uint32_t desc_avail, desc_offset;
+	uint32_t buf_avail, buf_offset;
+	uint64_t buf_addr, buf_iova, buf_len;
 	uint32_t mbuf_avail, mbuf_offset;
 	uint32_t cpy_len;
-	uint64_t desc_chunck_len;
 	struct rte_mbuf *cur = m, *prev = m;
 	struct virtio_net_hdr tmp_hdr;
 	struct virtio_net_hdr *hdr = NULL;
 	/* A counter to avoid desc dead loop chain */
-	uint32_t nr_desc = 1;
+	uint16_t vec_idx = 0;
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 	int error = 0;
 
-	desc = &descs[desc_idx];
-	if (unlikely((desc->len < dev->vhost_hlen)) ||
-			(desc->flags & VRING_DESC_F_INDIRECT)) {
-		error = -1;
-		goto out;
-	}
+	buf_addr = buf_vec[vec_idx].buf_addr;
+	buf_iova = buf_vec[vec_idx].buf_iova;
+	buf_len = buf_vec[vec_idx].buf_len;
 
-	desc_chunck_len = desc->len;
-	desc_gaddr = desc->addr;
-	desc_addr = vhost_iova_to_vva(dev,
-					vq, desc_gaddr,
-					&desc_chunck_len,
-					VHOST_ACCESS_RO);
-	if (unlikely(!desc_addr)) {
+	if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) {
 		error = -1;
 		goto out;
 	}
 
+	if (likely(nr_vec > 1))
+		rte_prefetch0((void *)(uintptr_t)buf_vec[1].buf_addr);
+
 	if (virtio_net_with_host_offload(dev)) {
-		if (unlikely(desc_chunck_len < sizeof(struct virtio_net_hdr))) {
-			uint64_t len = desc_chunck_len;
+		if (unlikely(buf_len < sizeof(struct virtio_net_hdr))) {
+			uint64_t len;
 			uint64_t remain = sizeof(struct virtio_net_hdr);
-			uint64_t src = desc_addr;
+			uint64_t src;
 			uint64_t dst = (uint64_t)(uintptr_t)&tmp_hdr;
-			uint64_t guest_addr = desc_gaddr;
+			uint16_t hdr_vec_idx = 0;
 
 			/*
			 * No luck, the virtio-net header doesn't fit
@@ -800,25 +792,18 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
			 */
 			while (remain) {
 				len = remain;
-				src = vhost_iova_to_vva(dev, vq,
-						guest_addr, &len,
-						VHOST_ACCESS_RO);
-				if (unlikely(!src || !len)) {
-					error = -1;
-					goto out;
-				}
-
+				src = buf_vec[hdr_vec_idx].buf_addr;
 				rte_memcpy((void *)(uintptr_t)dst,
 						(void *)(uintptr_t)src, len);
 
-				guest_addr += len;
 				remain -= len;
 				dst += len;
+				hdr_vec_idx++;
 			}
 
 			hdr = &tmp_hdr;
 		} else {
-			hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
+			hdr = (struct virtio_net_hdr *)((uintptr_t)buf_addr);
 			rte_prefetch0(hdr);
 		}
 	}
@@ -828,61 +813,40 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	 * for Tx: the first for storing the header, and others
 	 * for storing the data.
 	 */
-	if (likely((desc->len == dev->vhost_hlen) &&
-			(desc->flags & VRING_DESC_F_NEXT) != 0)) {
-		desc = &descs[desc->next];
-		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) {
-			error = -1;
+	if (unlikely(buf_len < dev->vhost_hlen)) {
+		buf_offset = dev->vhost_hlen - buf_len;
+		vec_idx++;
+		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
+		buf_len = buf_vec[vec_idx].buf_len;
+		buf_avail = buf_len - buf_offset;
+	} else if (buf_len == dev->vhost_hlen) {
+		if (unlikely(++vec_idx >= nr_vec))
 			goto out;
-		}
+		buf_addr = buf_vec[vec_idx].buf_addr;
+		buf_iova = buf_vec[vec_idx].buf_iova;
+		buf_len = buf_vec[vec_idx].buf_len;
 
-		desc_chunck_len = desc->len;
-		desc_gaddr = desc->addr;
-		desc_addr = vhost_iova_to_vva(dev,
-						vq, desc_gaddr,
-						&desc_chunck_len,
-						VHOST_ACCESS_RO);
-		if (unlikely(!desc_addr)) {
-			error = -1;
-			goto out;
-		}
-
-		desc_offset = 0;
-		desc_avail = desc->len;
-		nr_desc += 1;
+		buf_offset = 0;
+		buf_avail = buf_len;
 	} else {
-		desc_avail = desc->len - dev->vhost_hlen;
-
-		if (unlikely(desc_chunck_len < dev->vhost_hlen)) {
-			desc_chunck_len = desc_avail;
-			desc_gaddr += dev->vhost_hlen;
-			desc_addr = vhost_iova_to_vva(dev,
-					vq, desc_gaddr,
-					&desc_chunck_len,
-					VHOST_ACCESS_RO);
-			if (unlikely(!desc_addr)) {
-				error = -1;
-				goto out;
-			}
-
-			desc_offset = 0;
-		} else {
-			desc_offset = dev->vhost_hlen;
-			desc_chunck_len -= dev->vhost_hlen;
-		}
+		buf_offset = dev->vhost_hlen;
+		buf_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
 	}
 
-	rte_prefetch0((void *)(uintptr_t)(desc_addr + desc_offset));
+	rte_prefetch0((void *)(uintptr_t)
+			(buf_addr + buf_offset));
 
-	PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
-			(uint32_t)desc_chunck_len, 0);
+	PRINT_PACKET(dev,
+			(uintptr_t)(buf_addr + buf_offset),
+			(uint32_t)buf_avail, 0);
 
 	mbuf_offset = 0;
 	mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
 	while (1) {
 		uint64_t hpa;
 
-		cpy_len = RTE_MIN(desc_chunck_len, mbuf_avail);
+		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
 		/*
		 * A desc buf might across two host physical pages that are
@@ -890,11 +854,11 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
		 * will be copied even though zero copy is enabled.
		 */
 		if (unlikely(dev->dequeue_zero_copy && (hpa = gpa_to_hpa(dev,
-					desc_gaddr + desc_offset, cpy_len)))) {
+					buf_iova + buf_offset, cpy_len)))) {
 			cur->data_len = cpy_len;
 			cur->data_off = 0;
-			cur->buf_addr = (void *)(uintptr_t)(desc_addr
-				+ desc_offset);
+			cur->buf_addr =
				(void *)(uintptr_t)(buf_addr + buf_offset);
 			cur->buf_iova = hpa;
 
 			/*
@@ -905,20 +869,19 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		} else {
 			if (likely(cpy_len > MAX_BATCH_LEN ||
 					vq->batch_copy_nb_elems >= vq->size ||
-					(hdr && cur == m) ||
-					desc->len != desc_chunck_len)) {
+					(hdr && cur == m))) {
 				rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
 						mbuf_offset),
-					(void *)((uintptr_t)(desc_addr +
-							desc_offset)),
+					(void *)((uintptr_t)(buf_addr +
							buf_offset)),
 					cpy_len);
 			} else {
 				batch_copy[vq->batch_copy_nb_elems].dst =
 					rte_pktmbuf_mtod_offset(cur, void *,
 							mbuf_offset);
 				batch_copy[vq->batch_copy_nb_elems].src =
-					(void *)((uintptr_t)(desc_addr +
-							desc_offset));
+					(void *)((uintptr_t)(buf_addr +
							buf_offset));
 				batch_copy[vq->batch_copy_nb_elems].len =
 					cpy_len;
 				vq->batch_copy_nb_elems++;
@@ -927,59 +890,25 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 		mbuf_avail -= cpy_len;
 		mbuf_offset += cpy_len;
-		desc_avail -= cpy_len;
-		desc_chunck_len -= cpy_len;
-		desc_offset += cpy_len;
+		buf_avail -= cpy_len;
+		buf_offset += cpy_len;
 
-		/* This desc reaches to its end, get the next one */
-		if (desc_avail == 0) {
-			if ((desc->flags & VRING_DESC_F_NEXT) == 0)
+		/* This buf reaches to its end, get the next one */
+		if (buf_avail == 0) {
+			if (++vec_idx >= nr_vec)
 				break;
 
-			if (unlikely(desc->next >= max_desc ||
-					++nr_desc > max_desc)) {
-				error = -1;
-				goto out;
-			}
-			desc = &descs[desc->next];
-			if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) {
-				error = -1;
-				goto out;
-			}
-
-			desc_chunck_len = desc->len;
-			desc_gaddr = desc->addr;
-			desc_addr = vhost_iova_to_vva(dev,
-							vq, desc_gaddr,
-							&desc_chunck_len,
-							VHOST_ACCESS_RO);
-			if (unlikely(!desc_addr)) {
-				error = -1;
-				goto out;
-			}
+			buf_addr = buf_vec[vec_idx].buf_addr;
+			buf_iova = buf_vec[vec_idx].buf_iova;
+			buf_len = buf_vec[vec_idx].buf_len;
 
-			rte_prefetch0((void *)(uintptr_t)desc_addr);
+			rte_prefetch0((void *)(uintptr_t)buf_addr);
 
-			desc_offset = 0;
-			desc_avail = desc->len;
-
-			PRINT_PACKET(dev, (uintptr_t)desc_addr,
-					(uint32_t)desc_chunck_len, 0);
-		} else if (unlikely(desc_chunck_len == 0)) {
-			desc_chunck_len = desc_avail;
-			desc_gaddr += desc_offset;
-			desc_addr = vhost_iova_to_vva(dev, vq,
-					desc_gaddr,
-					&desc_chunck_len,
-					VHOST_ACCESS_RO);
-			if (unlikely(!desc_addr)) {
-				error = -1;
-				goto out;
-			}
-			desc_offset = 0;
+			buf_offset = 0;
+			buf_avail = buf_len;
 
-			PRINT_PACKET(dev, (uintptr_t)desc_addr,
-					(uint32_t)desc_chunck_len, 0);
+			PRINT_PACKET(dev, (uintptr_t)buf_addr,
					(uint32_t)buf_avail, 0);
 		}
 
 		/*
@@ -1085,10 +1014,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	struct virtio_net *dev;
 	struct rte_mbuf *rarp_mbuf = NULL;
 	struct vhost_virtqueue *vq;
-	uint32_t desc_indexes[MAX_PKT_BURST];
 	uint32_t i = 0;
 	uint16_t free_entries;
-	uint16_t avail_idx;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -1186,80 +1113,38 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
 
-	/* Prefetch available and used ring */
-	avail_idx = vq->last_avail_idx & (vq->size - 1);
-	rte_prefetch0(&vq->avail->ring[avail_idx]);
-
 	count = RTE_MIN(count, MAX_PKT_BURST);
 	count = RTE_MIN(count, free_entries);
 	VHOST_LOG_DEBUG(VHOST_DATA, "(%d) about to dequeue %u buffers\n",
 			dev->vid, count);
 
-	/* Retrieve all of the head indexes first to avoid caching issues. */
-	for (i = 0; i < count; i++) {
-		avail_idx = (vq->last_avail_idx + i) & (vq->size - 1);
-		desc_indexes[i] = vq->avail->ring[avail_idx];
-
-		if (likely(dev->dequeue_zero_copy == 0))
-			update_shadow_used_ring(vq, desc_indexes[i], 0);
-	}
-
-	/* Prefetch descriptor index. */
-	rte_prefetch0(&vq->desc[desc_indexes[0]]);
 	for (i = 0; i < count; i++) {
-		struct vring_desc *desc, *idesc = NULL;
-		uint16_t sz, idx;
-		uint64_t dlen;
+		struct buf_vector buf_vec[BUF_VECTOR_MAX];
+		uint16_t head_idx, dummy_len;
+		uint32_t nr_vec = 0;
 		int err;
 
-		if (likely(i + 1 < count))
-			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
-
-		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
-			dlen = vq->desc[desc_indexes[i]].len;
-			desc = (struct vring_desc *)(uintptr_t)
-				vhost_iova_to_vva(dev, vq,
-						vq->desc[desc_indexes[i]].addr,
-						&dlen,
-						VHOST_ACCESS_RO);
-			if (unlikely(!desc))
-				break;
-
-			if (unlikely(dlen < vq->desc[desc_indexes[i]].len)) {
-				/*
				 * The indirect desc table is not contiguous
				 * in process VA space, we have to copy it.
				 */
-				idesc = alloc_copy_ind_table(dev, vq,
-						&vq->desc[desc_indexes[i]]);
-				if (unlikely(!idesc))
-					break;
-
-				desc = idesc;
-			}
+		if (unlikely(fill_vec_buf(dev, vq,
+						vq->last_avail_idx + i,
+						&nr_vec, buf_vec,
+						&head_idx, &dummy_len,
+						VHOST_ACCESS_RO) < 0))
+			break;
 
-			rte_prefetch0(desc);
-			sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
-			idx = 0;
-		} else {
-			desc = vq->desc;
-			sz = vq->size;
-			idx = desc_indexes[i];
-		}
+		if (likely(dev->dequeue_zero_copy == 0))
+			update_shadow_used_ring(vq, head_idx, 0);
 
 		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
 		if (unlikely(pkts[i] == NULL)) {
 			RTE_LOG(ERR, VHOST_DATA,
 				"Failed to allocate memory for mbuf.\n");
-			free_ind_table(idesc);
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, vq, desc, sz, pkts[i], idx,
-					mbuf_pool);
+		err = copy_desc_to_mbuf(dev, vq, buf_vec, nr_vec, pkts[i],
					mbuf_pool);
 		if (unlikely(err)) {
 			rte_pktmbuf_free(pkts[i]);
-			free_ind_table(idesc);
 			break;
 		}
 
@@ -1269,11 +1154,10 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 			zmbuf = get_zmbuf(vq);
 			if (!zmbuf) {
 				rte_pktmbuf_free(pkts[i]);
-				free_ind_table(idesc);
 				break;
 			}
 			zmbuf->mbuf = pkts[i];
-			zmbuf->desc_idx = desc_indexes[i];
+			zmbuf->desc_idx = head_idx;
 
 			/*
			 * Pin lock the mbuf; we will check later to see
@@ -1286,9 +1170,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 			vq->nr_zmbuf += 1;
 			TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
 		}
-
-		if (unlikely(!!idesc))
-			free_ind_table(idesc);
 	}
 	vq->last_avail_idx += i;
 
-- 
2.14.4