From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Fu, JingguoX"
To: "dev@dpdk.org"
Date: Wed, 24 Sep 2014 09:25:58 +0000
Message-ID: <6BD6202160B55B409D423293115822625483C6@SHSMSX101.ccr.corp.intel.com>
References: <1408078681-3511-1-git-send-email-changchun.ouyang@intel.com>
In-Reply-To: <1408078681-3511-1-git-send-email-changchun.ouyang@intel.com>
Subject: Re: [dpdk-dev] [PATCH] examples/vhost: Support jumbo frame in user space vhost

Tested-by: Jingguo Fu

This patch has been tested by Intel. Test details follow:

Host:  Fedora 19 x86_64, Linux kernel 3.9.0, GCC 4.8.2, Intel Xeon CPU E5-2680 v2 @ 2.80GHz
NIC:   Intel Niantic 82599, Intel i350, Intel 82580 and Intel 82576
Guest: Fedora 16 x86_64, Linux kernel 3.4.2, GCC 4.6.3, QEMU emulator 1.4.2

We ran zero-copy and one-copy functional and performance tests, i.e.
regression tests with jumbo frame support on the front end.
We verified jumbo frame support on the front end with the Linux legacy back end.
We verified jumbo frame support on the front end with the vhost back end.

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ouyang Changchun
Sent: Friday, August 15, 2014 12:58
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCH] examples/vhost: Support jumbo frame in user space vhost

This patch supports the mergeable RX feature and thus jumbo frame RX and TX
in user space vhost (as the virtio back end).

On RX, it reserves enough room in the vring to accommodate one complete
scattered packet received by the PMD from the physical port, and then copies
the data from the mbuf to the vring buffers, possibly across several vring
entries and descriptors.

On TX, it takes a jumbo frame, possibly described by several vring descriptors
chained together with the 'NEXT' flag, copies them into one scattered packet,
and transmits it to the physical port through the PMD.
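For readers new to mergeable buffers: with VIRTIO_NET_F_MRG_RXBUF negotiated,
one received packet may span several vring entries, and the
virtio_net_hdr_mrg_rxbuf header in the first buffer tells the guest how many
entries were consumed. The sketch below illustrates the core of the
reservation walk; it uses simplified, hypothetical types and names
(ill_desc, ILL_DESC_F_NEXT, entries_needed), not code from this patch:

	#include <stdint.h>

	/*
	 * Illustrative sketch only: count how many avail-ring entries must
	 * be reserved so that a packet of pkt_len bytes, virtio header
	 * included, fits in the RX vring.
	 */
	struct ill_desc {
		uint64_t addr;
		uint32_t len;
		uint16_t flags;
		uint16_t next;
	};
	#define ILL_DESC_F_NEXT 1	/* stands in for VRING_DESC_F_NEXT */

	static uint16_t
	entries_needed(const struct ill_desc *desc_table,
		const uint32_t *avail_ring, uint16_t avail_idx,
		uint16_t cur_idx, uint16_t ring_mask, uint32_t pkt_len)
	{
		uint32_t secure_len = 0;
		uint16_t used = 0;

		while (secure_len < pkt_len) {
			/* Ring exhausted: the caller retries or drops. */
			if ((uint16_t)(cur_idx + used) == avail_idx)
				return 0;

			/* Walk one descriptor chain via the NEXT links. */
			uint32_t idx =
				avail_ring[(uint16_t)(cur_idx + used) & ring_mask];
			for (;;) {
				secure_len += desc_table[idx].len;
				if (!(desc_table[idx].flags & ILL_DESC_F_NEXT))
					break;
				idx = desc_table[idx].next;
			}
			used++;
		}
		/* This count becomes virtio_net_hdr_mrg_rxbuf.num_buffers. */
		return used;
	}

The patch performs the same walk in virtio_dev_merge_rx below, with a
compare-and-set retry loop so multiple data cores can reserve entries safely.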
Signed-off-by: Changchun Ouyang
Acked-by: Huawei Xie
---
 examples/vhost/main.c       | 726 ++++++++++++++++++++++++++++++++++++++++----
 examples/vhost/virtio-net.h |  14 +
 2 files changed, 687 insertions(+), 53 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 193aa25..7d9e6a2 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@
 #define BURST_RX_WAIT_US 15	/* Defines how long we wait between retries on RX */
 #define BURST_RX_RETRIES 4	/* Number of retries on RX. */
 
+#define JUMBO_FRAME_MAX_SIZE 0x2600
+
 /* State of virtio device. */
 #define DEVICE_MAC_LEARNING 0
 #define DEVICE_RX 1
@@ -676,8 +678,12 @@ us_vhost_parse_args(int argc, char **argv)
 				us_vhost_usage(prgname);
 				return -1;
 			} else {
-				if (ret)
+				if (ret) {
+					vmdq_conf_default.rxmode.jumbo_frame = 1;
+					vmdq_conf_default.rxmode.max_rx_pkt_len
+						= JUMBO_FRAME_MAX_SIZE;
 					VHOST_FEATURES = (1ULL << VIRTIO_NET_F_MRG_RXBUF);
+				}
 			}
 		}
 
@@ -797,6 +803,14 @@ us_vhost_parse_args(int argc, char **argv)
 		return -1;
 	}
 
+	if ((zero_copy == 1) && (vmdq_conf_default.rxmode.jumbo_frame == 1)) {
+		RTE_LOG(INFO, VHOST_PORT,
+			"Vhost zero copy doesn't support jumbo frame,"
+			"please specify '--mergeable 0' to disable the "
+			"mergeable feature.\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -916,7 +930,7 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t guest_pa,
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
  * count is returned to indicate the number of packets that were succesfully
- * added to the RX queue.
+ * added to the RX queue. This function works when mergeable is disabled.
  */
 static inline uint32_t __attribute__((always_inline))
 virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
@@ -930,7 +944,6 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 	uint64_t buff_hdr_addr = 0;
 	uint32_t head[MAX_PKT_BURST], packet_len = 0;
 	uint32_t head_idx, packet_success = 0;
-	uint32_t mergeable, mrg_count = 0;
 	uint32_t retry = 0;
 	uint16_t avail_idx, res_cur_idx;
 	uint16_t res_base_idx, res_end_idx;
@@ -940,6 +953,7 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
 	vq = dev->virtqueue[VIRTIO_RXQ];
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
+
 	/* As many data cores may want access to available buffers, they need to be reserved. */
 	do {
 		res_base_idx = vq->last_used_idx_res;
@@ -976,9 +990,6 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 	/* Prefetch available ring to retrieve indexes. */
 	rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]);
 
-	/* Check if the VIRTIO_NET_F_MRG_RXBUF feature is enabled. */
-	mergeable = dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF);
-
 	/* Retrieve all of the head indexes first to avoid caching issues. */
 	for (head_idx = 0; head_idx < count; head_idx++)
 		head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & (vq->size - 1)];
@@ -997,56 +1008,44 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 		/* Prefetch buffer address. */
 		rte_prefetch0((void*)(uintptr_t)buff_addr);
 
-		if (mergeable && (mrg_count != 0)) {
-			desc->len = packet_len = rte_pktmbuf_data_len(buff);
-		} else {
-			/* Copy virtio_hdr to packet and increment buffer address */
-			buff_hdr_addr = buff_addr;
-			packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen;
+		/* Copy virtio_hdr to packet and increment buffer address */
+		buff_hdr_addr = buff_addr;
+		packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen;
 
-			/*
-			 * If the descriptors are chained the header and data are placed in
-			 * separate buffers.
-			 */
-			if (desc->flags & VRING_DESC_F_NEXT) {
-				desc->len = vq->vhost_hlen;
-				desc = &vq->desc[desc->next];
-				/* Buffer address translation. */
-				buff_addr = gpa_to_vva(dev, desc->addr);
-				desc->len = rte_pktmbuf_data_len(buff);
-			} else {
-				buff_addr += vq->vhost_hlen;
-				desc->len = packet_len;
-			}
+		/*
+		 * If the descriptors are chained the header and data are
+		 * placed in separate buffers.
+		 */
+		if (desc->flags & VRING_DESC_F_NEXT) {
+			desc->len = vq->vhost_hlen;
+			desc = &vq->desc[desc->next];
+			/* Buffer address translation. */
+			buff_addr = gpa_to_vva(dev, desc->addr);
+			desc->len = rte_pktmbuf_data_len(buff);
+		} else {
+			buff_addr += vq->vhost_hlen;
+			desc->len = packet_len;
 		}
 
-		PRINT_PACKET(dev, (uintptr_t)buff_addr, rte_pktmbuf_data_len(buff), 0);
-
 		/* Update used ring with desc information */
 		vq->used->ring[res_cur_idx & (vq->size - 1)].id = head[packet_success];
 		vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len;
 
 		/* Copy mbuf data to buffer */
-		rte_memcpy((void *)(uintptr_t)buff_addr, (const void*)buff->pkt.data, rte_pktmbuf_data_len(buff));
+		rte_memcpy((void *)(uintptr_t)buff_addr,
+			(const void *)buff->pkt.data,
+			rte_pktmbuf_data_len(buff));
+
+		PRINT_PACKET(dev, (uintptr_t)buff_addr,
+			rte_pktmbuf_data_len(buff), 0);
 
 		res_cur_idx++;
 		packet_success++;
 
-		/* If mergeable is disabled then a header is required per buffer. */
-		if (!mergeable) {
-			rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void*)&virtio_hdr, vq->vhost_hlen);
-			PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1);
-		} else {
-			mrg_count++;
-			/* Merge buffer can only handle so many buffers at a time. Tell the guest if this limit is reached. */
-			if ((mrg_count == MAX_MRG_PKT_BURST) || (res_cur_idx == res_end_idx)) {
-				virtio_hdr.num_buffers = mrg_count;
-				LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", dev->device_fh, virtio_hdr.num_buffers);
-				rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void*)&virtio_hdr, vq->vhost_hlen);
-				PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1);
-				mrg_count = 0;
-			}
-		}
+		rte_memcpy((void *)(uintptr_t)buff_hdr_addr,
+			(const void *)&virtio_hdr, vq->vhost_hlen);
+
+		PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1);
+
 		if (res_cur_idx < res_end_idx) {
 			/* Prefetch descriptor index. */
 			rte_prefetch0(&vq->desc[head[packet_success]]);
@@ -1068,6 +1067,356 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 	return count;
 }
 
+static inline uint32_t __attribute__((always_inline))
+copy_from_mbuf_to_vring(struct virtio_net *dev,
+	uint16_t res_base_idx, uint16_t res_end_idx,
+	struct rte_mbuf *pkt)
+{
+	uint32_t vec_idx = 0;
+	uint32_t entry_success = 0;
+	struct vhost_virtqueue *vq;
+	/* The virtio_hdr is initialised to 0. */
+	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {
+		{0, 0, 0, 0, 0, 0}, 0};
+	uint16_t cur_idx = res_base_idx;
+	uint64_t vb_addr = 0;
+	uint64_t vb_hdr_addr = 0;
+	uint32_t seg_offset = 0;
+	uint32_t vb_offset = 0;
+	uint32_t seg_avail;
+	uint32_t vb_avail;
+	uint32_t cpy_len, entry_len;
+
+	if (pkt == NULL)
+		return 0;
+
+	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| "
+		"End Index %d\n",
+		dev->device_fh, cur_idx, res_end_idx);
+
+	/*
+	 * Convert from gpa to vva
+	 * (guest physical addr -> vhost virtual addr)
+	 */
+	vq = dev->virtqueue[VIRTIO_RXQ];
+	vb_addr =
+		gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
+	vb_hdr_addr = vb_addr;
+
+	/* Prefetch buffer address. */
+	rte_prefetch0((void *)(uintptr_t)vb_addr);
+
+	virtio_hdr.num_buffers = res_end_idx - res_base_idx;
+
+	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n",
+		dev->device_fh, virtio_hdr.num_buffers);
+
+	rte_memcpy((void *)(uintptr_t)vb_hdr_addr,
+		(const void *)&virtio_hdr, vq->vhost_hlen);
+
+	PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1);
+
+	seg_avail = rte_pktmbuf_data_len(pkt);
+	vb_offset = vq->vhost_hlen;
+	vb_avail =
+		vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen;
+
+	entry_len = vq->vhost_hlen;
+
+	if (vb_avail == 0) {
+		uint32_t desc_idx =
+			vq->buf_vec[vec_idx].desc_idx;
+		vq->desc[desc_idx].len = vq->vhost_hlen;
+
+		if ((vq->desc[desc_idx].flags
+			& VRING_DESC_F_NEXT) == 0) {
+			/* Update used ring with desc information */
+			vq->used->ring[cur_idx & (vq->size - 1)].id
+				= vq->buf_vec[vec_idx].desc_idx;
+			vq->used->ring[cur_idx & (vq->size - 1)].len
+				= entry_len;
+
+			entry_len = 0;
+			cur_idx++;
+			entry_success++;
+		}
+
+		vec_idx++;
+		vb_addr =
+			gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
+
+		/* Prefetch buffer address. */
+		rte_prefetch0((void *)(uintptr_t)vb_addr);
+		vb_offset = 0;
+		vb_avail = vq->buf_vec[vec_idx].buf_len;
+	}
+
+	cpy_len = RTE_MIN(vb_avail, seg_avail);
+
+	while (cpy_len > 0) {
+		/* Copy mbuf data to vring buffer */
+		rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset),
+			(const void *)(rte_pktmbuf_mtod(pkt, char*) + seg_offset),
+			cpy_len);
+
+		PRINT_PACKET(dev,
+			(uintptr_t)(vb_addr + vb_offset),
+			cpy_len, 0);
+
+		seg_offset += cpy_len;
+		vb_offset += cpy_len;
+		seg_avail -= cpy_len;
+		vb_avail -= cpy_len;
+		entry_len += cpy_len;
+
+		if (seg_avail != 0) {
+			/*
+			 * The virtio buffer in this vring
+			 * entry reach to its end.
+			 * But the segment doesn't complete.
+			 */
+			if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags &
+				VRING_DESC_F_NEXT) == 0) {
+				/* Update used ring with desc information */
+				vq->used->ring[cur_idx & (vq->size - 1)].id
+					= vq->buf_vec[vec_idx].desc_idx;
+				vq->used->ring[cur_idx & (vq->size - 1)].len
+					= entry_len;
+				entry_len = 0;
+				cur_idx++;
+				entry_success++;
+			}
+
+			vec_idx++;
+			vb_addr = gpa_to_vva(dev,
+				vq->buf_vec[vec_idx].buf_addr);
+			vb_offset = 0;
+			vb_avail = vq->buf_vec[vec_idx].buf_len;
+			cpy_len = RTE_MIN(vb_avail, seg_avail);
+		} else {
+			/*
+			 * This current segment complete, need continue to
+			 * check if the whole packet complete or not.
+			 */
+			pkt = pkt->pkt.next;
+			if (pkt != NULL) {
+				/*
+				 * There are more segments.
+				 */
+				if (vb_avail == 0) {
+					/*
+					 * This current buffer from vring is
+					 * used up, need fetch next buffer
+					 * from buf_vec.
+					 */
+					uint32_t desc_idx =
+						vq->buf_vec[vec_idx].desc_idx;
+					vq->desc[desc_idx].len = vb_offset;
+
+					if ((vq->desc[desc_idx].flags &
+						VRING_DESC_F_NEXT) == 0) {
+						uint16_t wrapped_idx =
+							cur_idx & (vq->size - 1);
+						/*
+						 * Update used ring with the
+						 * descriptor information
+						 */
+						vq->used->ring[wrapped_idx].id
+							= desc_idx;
+						vq->used->ring[wrapped_idx].len
+							= entry_len;
+						entry_success++;
+						entry_len = 0;
+						cur_idx++;
+					}
+
+					/* Get next buffer from buf_vec. */
+					vec_idx++;
+					vb_addr = gpa_to_vva(dev,
+						vq->buf_vec[vec_idx].buf_addr);
+					vb_avail =
+						vq->buf_vec[vec_idx].buf_len;
+					vb_offset = 0;
+				}
+
+				seg_offset = 0;
+				seg_avail = rte_pktmbuf_data_len(pkt);
+				cpy_len = RTE_MIN(vb_avail, seg_avail);
+			} else {
+				/*
+				 * This whole packet completes.
+				 */
+				uint32_t desc_idx =
+					vq->buf_vec[vec_idx].desc_idx;
+				vq->desc[desc_idx].len = vb_offset;
+
+				while (vq->desc[desc_idx].flags &
+					VRING_DESC_F_NEXT) {
+					desc_idx = vq->desc[desc_idx].next;
+					vq->desc[desc_idx].len = 0;
+				}
+
+				/* Update used ring with desc information */
+				vq->used->ring[cur_idx & (vq->size - 1)].id
+					= vq->buf_vec[vec_idx].desc_idx;
+				vq->used->ring[cur_idx & (vq->size - 1)].len
+					= entry_len;
+				entry_len = 0;
+				cur_idx++;
+				entry_success++;
+				seg_avail = 0;
+				cpy_len = RTE_MIN(vb_avail, seg_avail);
+			}
+		}
+	}
+
+	return entry_success;
+}
+
+/*
+ * This function adds buffers to the virtio devices RX virtqueue. Buffers can
+ * be received from the physical port or from another virtio device. A packet
+ * count is returned to indicate the number of packets that were succesfully
+ * added to the RX queue. This function works for mergeable RX.
+ */
+static inline uint32_t __attribute__((always_inline))
+virtio_dev_merge_rx(struct virtio_net *dev, struct rte_mbuf **pkts,
+	uint32_t count)
+{
+	struct vhost_virtqueue *vq;
+	uint32_t pkt_idx = 0, entry_success = 0;
+	uint32_t retry = 0;
+	uint16_t avail_idx, res_cur_idx;
+	uint16_t res_base_idx, res_end_idx;
+	uint8_t success = 0;
+
+	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
+		dev->device_fh);
+	vq = dev->virtqueue[VIRTIO_RXQ];
+	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
+
+	if (count == 0)
+		return 0;
+
+	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
+		uint32_t secure_len = 0;
+		uint16_t need_cnt;
+		uint32_t vec_idx = 0;
+		uint32_t pkt_len = pkts[pkt_idx]->pkt.pkt_len + vq->vhost_hlen;
+		uint16_t i, id;
+
+		do {
+			/*
+			 * As many data cores may want access to available
+			 * buffers, they need to be reserved.
+			 */
+			res_base_idx = vq->last_used_idx_res;
+			res_cur_idx = res_base_idx;
+
+			do {
+				avail_idx = *((volatile uint16_t *)&vq->avail->idx);
+				if (unlikely(res_cur_idx == avail_idx)) {
+					/*
+					 * If retry is enabled and the queue is
+					 * full then we wait and retry to avoid
+					 * packet loss.
+					 */
+					if (enable_retry) {
+						uint8_t cont = 0;
+						for (retry = 0; retry < burst_rx_retry_num; retry++) {
+							rte_delay_us(burst_rx_delay_time);
+							avail_idx =
+								*((volatile uint16_t *)&vq->avail->idx);
+							if (likely(res_cur_idx != avail_idx)) {
+								cont = 1;
+								break;
+							}
+						}
+						if (cont == 1)
+							continue;
+					}
+
+					LOG_DEBUG(VHOST_DATA,
+						"(%"PRIu64") Failed "
+						"to get enough desc from "
+						"vring\n",
+						dev->device_fh);
+					return pkt_idx;
+				} else {
+					uint16_t wrapped_idx =
+						(res_cur_idx) & (vq->size - 1);
+					uint32_t idx =
+						vq->avail->ring[wrapped_idx];
+					uint8_t next_desc;
+
+					do {
+						next_desc = 0;
+						secure_len += vq->desc[idx].len;
+						if (vq->desc[idx].flags &
+							VRING_DESC_F_NEXT) {
+							idx = vq->desc[idx].next;
+							next_desc = 1;
+						}
+					} while (next_desc);
+
+					res_cur_idx++;
+				}
+			} while (pkt_len > secure_len);
+
+			/* vq->last_used_idx_res is atomically updated. */
+			success = rte_atomic16_cmpset(&vq->last_used_idx_res,
+							res_base_idx,
+							res_cur_idx);
+		} while (success == 0);
+
+		id = res_base_idx;
+		need_cnt = res_cur_idx - res_base_idx;
+
+		for (i = 0; i < need_cnt; i++, id++) {
+			uint16_t wrapped_idx = id & (vq->size - 1);
+			uint32_t idx = vq->avail->ring[wrapped_idx];
+			uint8_t next_desc;
+			do {
+				next_desc = 0;
+				vq->buf_vec[vec_idx].buf_addr =
+					vq->desc[idx].addr;
+				vq->buf_vec[vec_idx].buf_len =
+					vq->desc[idx].len;
+				vq->buf_vec[vec_idx].desc_idx = idx;
+				vec_idx++;
+
+				if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
+					idx = vq->desc[idx].next;
+					next_desc = 1;
+				}
+			} while (next_desc);
+		}
+
+		res_end_idx = res_cur_idx;
+
+		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
+			res_end_idx, pkts[pkt_idx]);
+
+		rte_compiler_barrier();
+
+		/*
+		 * Wait until it's our turn to add our buffer
+		 * to the used ring.
+		 */
+		while (unlikely(vq->last_used_idx != res_base_idx))
+			rte_pause();
+
+		*(volatile uint16_t *)&vq->used->idx += entry_success;
+		vq->last_used_idx = res_end_idx;
+
+		/* Kick the guest if necessary. */
+		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+			eventfd_write((int)vq->kickfd, 1);
+	}
+
+	return count;
+}
+
 /*
  * Compares a packet destination MAC address to a device MAC address.
  */
@@ -1199,8 +1548,17 @@ virtio_tx_local(struct virtio_net *dev, struct rte_mbuf *m)
 			/*drop the packet if the device is marked for removal*/
 			LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Device is marked for removal\n", dev_ll->dev->device_fh);
 		} else {
+			uint32_t mergeable =
+				dev_ll->dev->features &
+				(1 << VIRTIO_NET_F_MRG_RXBUF);
+
 			/*send the packet to the local virtio device*/
-			ret = virtio_dev_rx(dev_ll->dev, &m, 1);
+			if (likely(mergeable == 0))
+				ret = virtio_dev_rx(dev_ll->dev, &m, 1);
+			else
+				ret = virtio_dev_merge_rx(dev_ll->dev,
+					&m, 1);
+
 			if (enable_stats) {
 				rte_atomic64_add(
 				&dev_statistics[dev_ll->dev->device_fh].rx_total_atomic,
@@ -1231,7 +1589,7 @@ virtio_tx_route(struct virtio_net* dev, struct rte_mbuf *m, struct rte_mempool *
 	struct mbuf_table *tx_q;
 	struct vlan_ethhdr *vlan_hdr;
 	struct rte_mbuf **m_table;
-	struct rte_mbuf *mbuf;
+	struct rte_mbuf *mbuf, *prev;
 	unsigned len, ret, offset = 0;
 	const uint16_t lcore_id = rte_lcore_id();
 	struct virtio_net_data_ll *dev_ll = ll_root_used;
@@ -1284,12 +1642,14 @@ virtio_tx_route(struct virtio_net* dev, struct rte_mbuf *m, struct rte_mempool *
 	/* Allocate an mbuf and populate the structure. */
 	mbuf = rte_pktmbuf_alloc(mbuf_pool);
 	if (unlikely(mbuf == NULL)) {
-		RTE_LOG(ERR, VHOST_DATA, "Failed to allocate memory for mbuf.\n");
+		RTE_LOG(ERR, VHOST_DATA,
+			"Failed to allocate memory for mbuf.\n");
 		return;
 	}
 
 	mbuf->pkt.data_len = m->pkt.data_len + VLAN_HLEN + offset;
-	mbuf->pkt.pkt_len = mbuf->pkt.data_len;
+	mbuf->pkt.pkt_len = m->pkt.pkt_len + VLAN_HLEN + offset;
+	mbuf->pkt.nb_segs = m->pkt.nb_segs;
 
 	/* Copy ethernet header to mbuf. */
 	rte_memcpy((void*)mbuf->pkt.data, (const void*)m->pkt.data, ETH_HLEN);
@@ -1304,6 +1664,29 @@ virtio_tx_route(struct virtio_net* dev, struct rte_mbuf *m, struct rte_mempool *
 	/* Copy the remaining packet contents to the mbuf. */
 	rte_memcpy((void*) ((uint8_t*)mbuf->pkt.data + VLAN_ETH_HLEN),
 		(const void*) ((uint8_t*)m->pkt.data + ETH_HLEN), (m->pkt.data_len - ETH_HLEN));
+
+	/* Copy the remaining segments for the whole packet. */
+	prev = mbuf;
+	while (m->pkt.next) {
+		/* Allocate an mbuf and populate the structure. */
+		struct rte_mbuf *next_mbuf = rte_pktmbuf_alloc(mbuf_pool);
+		if (unlikely(next_mbuf == NULL)) {
+			rte_pktmbuf_free(mbuf);
+			RTE_LOG(ERR, VHOST_DATA,
+				"Failed to allocate memory for mbuf.\n");
+			return;
+		}
+
+		m = m->pkt.next;
+		prev->pkt.next = next_mbuf;
+		prev = next_mbuf;
+		next_mbuf->pkt.data_len = m->pkt.data_len;
+
+		/* Copy data to next mbuf. */
+		rte_memcpy(rte_pktmbuf_mtod(next_mbuf, void *),
+			rte_pktmbuf_mtod(m, const void *), m->pkt.data_len);
+	}
+
 	tx_q->m_table[len] = mbuf;
 	len++;
 	if (enable_stats) {
@@ -1394,6 +1777,7 @@ virtio_dev_tx(struct virtio_net* dev, struct rte_mempool *mbuf_pool)
 
 		/* Setup dummy mbuf. This is copied to a real mbuf if transmitted out the physical port. */
 		m.pkt.data_len = desc->len;
+		m.pkt.pkt_len = desc->len;
 		m.pkt.data = (void*)(uintptr_t)buff_addr;
 
 		PRINT_PACKET(dev, (uintptr_t)buff_addr, desc->len, 0);
@@ -1420,6 +1804,227 @@ virtio_dev_tx(struct virtio_net* dev, struct rte_mempool *mbuf_pool)
 		eventfd_write((int)vq->kickfd, 1);
 }
 
+/* This function works for TX packets with mergeable feature enabled. */
+static inline void __attribute__((always_inline))
+virtio_dev_merge_tx(struct virtio_net *dev, struct rte_mempool *mbuf_pool)
+{
+	struct rte_mbuf *m, *prev;
+	struct vhost_virtqueue *vq;
+	struct vring_desc *desc;
+	uint64_t vb_addr = 0;
+	uint32_t head[MAX_PKT_BURST];
+	uint32_t used_idx;
+	uint32_t i;
+	uint16_t free_entries, entry_success = 0;
+	uint16_t avail_idx;
+	uint32_t buf_size = MBUF_SIZE - (sizeof(struct rte_mbuf) +
+			RTE_PKTMBUF_HEADROOM);
+
+	vq = dev->virtqueue[VIRTIO_TXQ];
+	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
+
+	/* If there are no available buffers then return. */
+	if (vq->last_used_idx == avail_idx)
+		return;
+
+	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_tx()\n",
+		dev->device_fh);
+
+	/* Prefetch available ring to retrieve head indexes. */
+	rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]);
+
+	/*get the number of free entries in the ring*/
+	free_entries = (avail_idx - vq->last_used_idx);
+
+	/* Limit to MAX_PKT_BURST. */
+	free_entries = RTE_MIN(free_entries, MAX_PKT_BURST);
+
+	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
+		dev->device_fh, free_entries);
+	/* Retrieve all of the head indexes first to avoid caching issues. */
+	for (i = 0; i < free_entries; i++)
+		head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)];
+
+	/* Prefetch descriptor index. */
+	rte_prefetch0(&vq->desc[head[entry_success]]);
+	rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
+
+	while (entry_success < free_entries) {
+		uint32_t vb_avail, vb_offset;
+		uint32_t seg_avail, seg_offset;
+		uint32_t cpy_len;
+		uint32_t seg_num = 0;
+		struct rte_mbuf *cur;
+		uint8_t alloc_err = 0;
+
+		desc = &vq->desc[head[entry_success]];
+
+		/* Discard first buffer as it is the virtio header */
+		desc = &vq->desc[desc->next];
+
+		/* Buffer address translation. */
+		vb_addr = gpa_to_vva(dev, desc->addr);
+		/* Prefetch buffer address. */
+		rte_prefetch0((void *)(uintptr_t)vb_addr);
+
+		used_idx = vq->last_used_idx & (vq->size - 1);
+
+		if (entry_success < (free_entries - 1)) {
+			/* Prefetch descriptor index. */
+			rte_prefetch0(&vq->desc[head[entry_success+1]]);
+			rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]);
+		}
+
+		/* Update used index buffer information. */
+		vq->used->ring[used_idx].id = head[entry_success];
+		vq->used->ring[used_idx].len = 0;
+
+		vb_offset = 0;
+		vb_avail = desc->len;
+		seg_offset = 0;
+		seg_avail = buf_size;
+		cpy_len = RTE_MIN(vb_avail, seg_avail);
+
+		PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0);
+
+		/* Allocate an mbuf and populate the structure. */
+		m = rte_pktmbuf_alloc(mbuf_pool);
+		if (unlikely(m == NULL)) {
+			RTE_LOG(ERR, VHOST_DATA,
+				"Failed to allocate memory for mbuf.\n");
+			return;
+		}
+
+		seg_num++;
+		cur = m;
+		prev = m;
+		while (cpy_len != 0) {
+			rte_memcpy((void *)(rte_pktmbuf_mtod(cur, char *) + seg_offset),
+				(void *)((uintptr_t)(vb_addr + vb_offset)),
+				cpy_len);
+
+			seg_offset += cpy_len;
+			vb_offset += cpy_len;
+			vb_avail -= cpy_len;
+			seg_avail -= cpy_len;
+
+			if (vb_avail != 0) {
+				/*
+				 * The segment reachs to its end,
+				 * while the virtio buffer in TX vring has
+				 * more data to be copied.
+				 */
+				cur->pkt.data_len = seg_offset;
+				m->pkt.pkt_len += seg_offset;
+				/* Allocate mbuf and populate the structure. */
+				cur = rte_pktmbuf_alloc(mbuf_pool);
+				if (unlikely(cur == NULL)) {
+					RTE_LOG(ERR, VHOST_DATA, "Failed to "
+						"allocate memory for mbuf.\n");
+					rte_pktmbuf_free(m);
+					alloc_err = 1;
+					break;
+				}
+
+				seg_num++;
+				prev->pkt.next = cur;
+				prev = cur;
+				seg_offset = 0;
+				seg_avail = buf_size;
+			} else {
+				if (desc->flags & VRING_DESC_F_NEXT) {
+					/*
+					 * There are more virtio buffers in
+					 * same vring entry need to be copied.
+					 */
+					if (seg_avail == 0) {
+						/*
+						 * The current segment hasn't
+						 * room to accomodate more
+						 * data.
+						 */
+						cur->pkt.data_len = seg_offset;
+						m->pkt.pkt_len += seg_offset;
+						/*
+						 * Allocate an mbuf and
+						 * populate the structure.
+						 */
+						cur = rte_pktmbuf_alloc(mbuf_pool);
+						if (unlikely(cur == NULL)) {
+							RTE_LOG(ERR,
+								VHOST_DATA,
+								"Failed to "
+								"allocate memory "
+								"for mbuf\n");
+							rte_pktmbuf_free(m);
+							alloc_err = 1;
+							break;
+						}
+						seg_num++;
+						prev->pkt.next = cur;
+						prev = cur;
+						seg_offset = 0;
+						seg_avail = buf_size;
+					}
+
+					desc = &vq->desc[desc->next];
+
+					/* Buffer address translation. */
+					vb_addr = gpa_to_vva(dev, desc->addr);
+					/* Prefetch buffer address. */
+					rte_prefetch0((void *)(uintptr_t)vb_addr);
+					vb_offset = 0;
+					vb_avail = desc->len;
+
+					PRINT_PACKET(dev, (uintptr_t)vb_addr,
+						desc->len, 0);
+				} else {
+					/* The whole packet completes. */
+					cur->pkt.data_len = seg_offset;
+					m->pkt.pkt_len += seg_offset;
+					vb_avail = 0;
+				}
+			}
+
+			cpy_len = RTE_MIN(vb_avail, seg_avail);
+		}
+
+		if (unlikely(alloc_err == 1))
+			break;
+
+		m->pkt.nb_segs = seg_num;
+
+		/*
+		 * If this is the first received packet we need to learn
+		 * the MAC and setup VMDQ
+		 */
+		if (dev->ready == DEVICE_MAC_LEARNING) {
+			if (dev->remove || (link_vmdq(dev, m) == -1)) {
+				/*
+				 * Discard frame if device is scheduled for
+				 * removal or a duplicate MAC address is found.
+				 */
+				entry_success = free_entries;
+				vq->last_used_idx += entry_success;
+				rte_pktmbuf_free(m);
+				break;
+			}
+		}
+
+		virtio_tx_route(dev, m, mbuf_pool, (uint16_t)dev->device_fh);
+		vq->last_used_idx++;
+		entry_success++;
+		rte_pktmbuf_free(m);
+	}
+
+	rte_compiler_barrier();
+	vq->used->idx += entry_success;
+	/* Kick guest if required. */
+	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+		eventfd_write((int)vq->kickfd, 1);
+
+}
+
 /*
  * This function is called by each data core. It handles all RX/TX registered with the
  * core. For TX the specific lcore linked list is used. For RX, MAC addresses are compared
@@ -1440,8 +2045,9 @@ switch_worker(__attribute__((unused)) void *arg)
 	const uint16_t lcore_id = rte_lcore_id();
 	const uint16_t num_cores = (uint16_t)rte_lcore_count();
 	uint16_t rx_count = 0;
+	uint32_t mergeable = 0;
 
-	RTE_LOG(INFO, VHOST_DATA, "Procesing on Core %u started \n", lcore_id);
+	RTE_LOG(INFO, VHOST_DATA, "Procesing on Core %u started\n", lcore_id);
 	lcore_ll = lcore_info[lcore_id].lcore_ll;
 	prev_tsc = 0;
 
@@ -1497,6 +2103,8 @@ switch_worker(__attribute__((unused)) void *arg)
 			while (dev_ll != NULL) {
 				/*get virtio device ID*/
 				dev = dev_ll->dev;
+				mergeable =
+					dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF);
 
 				if (dev->remove) {
 					dev_ll = dev_ll->next;
@@ -1510,7 +2118,15 @@ switch_worker(__attribute__((unused)) void *arg)
 						(uint16_t)dev->vmdq_rx_q, pkts_burst, MAX_PKT_BURST);
 
 					if (rx_count) {
-						ret_count = virtio_dev_rx(dev, pkts_burst, rx_count);
+						if (likely(mergeable == 0))
+							ret_count =
+								virtio_dev_rx(dev,
+								pkts_burst, rx_count);
+						else
+							ret_count =
+								virtio_dev_merge_rx(dev,
+								pkts_burst, rx_count);
+
 						if (enable_stats) {
 							rte_atomic64_add(
 							&dev_statistics[dev_ll->dev->device_fh].rx_total_atomic,
@@ -1520,15 +2136,19 @@ switch_worker(__attribute__((unused)) void *arg)
 						}
 						while (likely(rx_count)) {
 							rx_count--;
-							rte_pktmbuf_free_seg(pkts_burst[rx_count]);
+							rte_pktmbuf_free(pkts_burst[rx_count]);
 						}
 
 					}
 				}
 
-				if (!dev->remove)
+				if (!dev->remove) {
 					/*Handle guest TX*/
-					virtio_dev_tx(dev, mbuf_pool);
+					if (likely(mergeable == 0))
+						virtio_dev_tx(dev, mbuf_pool);
+					else
+						virtio_dev_merge_tx(dev, mbuf_pool);
+				}
 
 				/*move to the next device in the list*/
 				dev_ll = dev_ll->next;
diff --git a/examples/vhost/virtio-net.h b/examples/vhost/virtio-net.h
index 3d1f255..1a2f0dc 100644
--- a/examples/vhost/virtio-net.h
+++ b/examples/vhost/virtio-net.h
@@ -45,6 +45,18 @@
 /* Enum for virtqueue management. */
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
+#define BUF_VECTOR_MAX 256
+
+/*
+ * Structure contains buffer address, length and descriptor index
+ * from vring to do scatter RX.
+*/
+struct buf_vector {
+	uint64_t buf_addr;
+	uint32_t buf_len;
+	uint32_t desc_idx;
+};
+
 /*
  * Structure contains variables relevant to TX/RX virtqueues.
  */
@@ -60,6 +72,8 @@ struct vhost_virtqueue
 	volatile uint16_t last_used_idx_res;	/* Used for multiple devices reserving buffers. */
 	eventfd_t callfd;			/* Currently unused as polling mode is enabled. */
 	eventfd_t kickfd;			/* Used to notify the guest (trigger interrupt). */
+	/* Used for scatter RX. */
+	struct buf_vector buf_vec[BUF_VECTOR_MAX];
 } __rte_cache_aligned;
 
 /*
-- 
1.8.4.2
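A note on the buf_vec field added in virtio-net.h above: flattening each
reserved descriptor chain into a plain array of (address, length, descriptor
index) triples lets copy_from_mbuf_to_vring advance through guest buffers
with a single index instead of re-walking NEXT chains during the copy. A
minimal sketch of that flattening step follows; it uses simplified,
hypothetical types and names (ill_buf_vector, ill_desc, fill_buf_vec), not
code from this patch:

	#include <stdint.h>

	/*
	 * Illustrative sketch only: flatten the descriptor chains behind
	 * need_cnt reserved avail-ring entries into a linear array.
	 */
	struct ill_buf_vector {
		uint64_t buf_addr;
		uint32_t buf_len;
		uint32_t desc_idx;
	};
	struct ill_desc {
		uint64_t addr;
		uint32_t len;
		uint16_t flags;
		uint16_t next;
	};
	#define ILL_DESC_F_NEXT 1	/* stands in for VRING_DESC_F_NEXT */

	static uint32_t
	fill_buf_vec(const struct ill_desc *desc_table,
		const uint32_t *avail_ring, uint16_t ring_mask,
		uint16_t base_idx, uint16_t need_cnt,
		struct ill_buf_vector *buf_vec)
	{
		uint32_t vec_idx = 0;
		uint16_t i;

		for (i = 0; i < need_cnt; i++) {
			uint32_t idx =
				avail_ring[(uint16_t)(base_idx + i) & ring_mask];

			/* Record every link of this chain as one flat entry. */
			for (;;) {
				buf_vec[vec_idx].buf_addr = desc_table[idx].addr;
				buf_vec[vec_idx].buf_len = desc_table[idx].len;
				buf_vec[vec_idx].desc_idx = idx;
				vec_idx++;
				if (!(desc_table[idx].flags & ILL_DESC_F_NEXT))
					break;
				idx = desc_table[idx].next;
			}
		}
		/* Entries filled; the copy loop can index these sequentially. */
		return vec_idx;
	}

This mirrors the need_cnt loop in virtio_dev_merge_rx above, which populates
vq->buf_vec before handing the reserved range to copy_from_mbuf_to_vring.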