From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ouyang, Changchun"
To: "Xie, Huawei", "dev@dpdk.org"
Date: Tue, 2 Jun 2015 08:10:57 +0000
Subject: Re: [dpdk-dev] [PATCH v3 1/4] lib_vhost: Fix enqueue/dequeue can't
 handle chained vring descriptors
References: <1432826207-8428-1-git-send-email-changchun.ouyang@intel.com>
 <1433147149-31645-1-git-send-email-changchun.ouyang@intel.com>
 <1433147149-31645-2-git-send-email-changchun.ouyang@intel.com>
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: Xie, Huawei
> Sent: Tuesday, June 2, 2015 3:51 PM
> To: Ouyang, Changchun; dev@dpdk.org
> Cc: Cao, Waterman
> Subject: Re: [PATCH v3 1/4] lib_vhost: Fix enqueue/dequeue can't handle
> chained vring descriptors
>
> On 6/1/2015 4:26 PM, Ouyang, Changchun wrote:
> > Vring enqueue needs to consider two cases:
> > 1. Separate descriptors hold the virtio header and the actual data,
> >    e.g. the first descriptor carries the virtio header and is followed
> >    by descriptors for the actual data.
> > 2. The virtio header and some data are put together in one descriptor,
> >    e.g. the first descriptor contains both the virtio header and part
> >    of the actual data, followed by more descriptors for the rest of the
> >    packet data; the current DPDK virtio-net PMD implementation is this
> >    case.
> >
> > The same applies to vring dequeue: it should not assume the vring
> > descriptor is chained or not; it should use desc->flags to check
> > whether it is chained. This patch also fixes a TX corruption issue on
> > Fedora 21, whose virtio-net driver by default uses a single vring
> > descriptor (header and data in one descriptor) for the virtio TX path.
>
> Suggest removing Fedora 21 from the commit message; the bug is related to
> the virtio-net driver rather than to the distribution.

I was just trying to give more information, namely that this patch also
fixes the TX corruption issue seen on Fedora 21. I don't think that
contradicts what you mean.
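To make the two layouts above concrete: both the enqueue and the dequeue
path only need to look at the flags of the first descriptor. A minimal
sketch of that check, reusing the names from vhost_rxtx.c (illustration
only, not the patch text):

	/*
	 * Illustration only: distinguishing the two layouts from the first
	 * descriptor of the chain.  "desc", "vq", "buff_addr" and
	 * "vb_offset" are the same objects used in virtio_dev_rx().
	 */
	if (desc->flags & VRING_DESC_F_NEXT) {
		/* Case 1: the header sits alone, the payload starts in
		 * the next descriptor. */
		desc = &vq->desc[desc->next];
		buff_addr = gpa_to_vva(dev, desc->addr);
		vb_offset = 0;
	} else {
		/* Case 2: header and payload share the first descriptor
		 * (DPDK virtio PMD, and the guest TX case discussed above). */
		vb_offset = vq->vhost_hlen;
	}

Everything after this point (copying, used-ring updates) can then work the
same way regardless of which layout the guest driver chose.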
> > Changes in v3
> >   - support scattered mbuf: check whether the mbuf has a 'next' pointer
> >     and copy all segments to the vring buffer.
> >
> > Changes in v2
> >   - drop the uncompleted packet
> >   - refine code logic
> >
> > Signed-off-by: Changchun Ouyang
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 88 ++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 71 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> > index 4809d32..5fe1b6c 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -46,7 +46,8 @@
> >   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
> >   * be received from the physical port or from another virtio device. A packet
> >   * count is returned to indicate the number of packets that are succesfully
> > - * added to the RX queue. This function works when mergeable is disabled.
> > + * added to the RX queue. This function works when the mbuf is scattered, but
> > + * it doesn't support the mergeable feature.
> >   */
> >  static inline uint32_t __attribute__((always_inline))
> >  virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> > @@ -59,7 +60,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >  	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
> >  	uint64_t buff_addr = 0;
> >  	uint64_t buff_hdr_addr = 0;
> > -	uint32_t head[MAX_PKT_BURST], packet_len = 0;
> > +	uint32_t head[MAX_PKT_BURST];
> >  	uint32_t head_idx, packet_success = 0;
> >  	uint16_t avail_idx, res_cur_idx;
> >  	uint16_t res_base_idx, res_end_idx;
> > @@ -113,6 +114,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >  	rte_prefetch0(&vq->desc[head[packet_success]]);
> >
> >  	while (res_cur_idx != res_end_idx) {
> > +		uint32_t offset = 0, vb_offset = 0;
> > +		uint32_t pkt_len, len_to_cpy, data_len, total_copied = 0;
> > +		uint8_t hdr = 0, uncompleted_pkt = 0;
> > +
> >  		/* Get descriptor from available ring */
> >  		desc = &vq->desc[head[packet_success]];
> >
> > @@ -125,7 +130,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >
> >  		/* Copy virtio_hdr to packet and increment buffer address */
> >  		buff_hdr_addr = buff_addr;
> > -		packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen;
> >
> >  		/*
> >  		 * If the descriptors are chained the header and data are
> > @@ -136,28 +140,73 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >  			desc = &vq->desc[desc->next];
> >  			/* Buffer address translation. */
> >  			buff_addr = gpa_to_vva(dev, desc->addr);
> > -			desc->len = rte_pktmbuf_data_len(buff);
> >  		} else {
> > -			buff_addr += vq->vhost_hlen;
> > -			desc->len = packet_len;
> > +			vb_offset += vq->vhost_hlen;
> > +			hdr = 1;
> >  		}
> >
> > +		pkt_len = rte_pktmbuf_pkt_len(buff);
> > +		data_len = rte_pktmbuf_data_len(buff);
> > +		len_to_cpy = RTE_MIN(data_len,
> > +			hdr ? desc->len - vq->vhost_hlen : desc->len);
> > +		while (len_to_cpy > 0) {
>
> while (total_copied < pkt_len) is safer and more readable.
> Besides, what if we encounter some descriptor with zero length? With
> len_to_cpy > 0, we would pass a partially copied mbuf to the guest but
> still set used->len = packet_len.

Good catch, will update it in v4.
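Roughly, the v4 copy loop would then be driven by how much of the packet is
still pending, along these lines (a sketch only, reusing the local variable
names from this patch; the final v4 code may differ):

	while (total_copied < pkt_len) {
		len_to_cpy = RTE_MIN(data_len - offset,
				desc->len - vb_offset);

		/* May copy zero bytes if the current descriptor has zero
		 * length; the loop then just advances to the next one. */
		rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset),
			rte_pktmbuf_mtod(buff, const char *) + offset,
			len_to_cpy);

		offset += len_to_cpy;
		vb_offset += len_to_cpy;
		total_copied += len_to_cpy;

		/* Current mbuf segment done but packet not finished: for a
		 * well-formed mbuf chain, buff->next is non-NULL here. */
		if (offset == data_len && total_copied < pkt_len) {
			buff = buff->next;
			offset = 0;
			data_len = rte_pktmbuf_data_len(buff);
		}

		/* Current vring descriptor full but packet not finished. */
		if (total_copied < pkt_len && vb_offset == desc->len) {
			if (!(desc->flags & VRING_DESC_F_NEXT)) {
				/* Not enough room in the vring buffer. */
				uncompleted_pkt = 1;
				break;
			}
			desc = &vq->desc[desc->next];
			buff_addr = gpa_to_vva(dev, desc->addr);
			vb_offset = 0;
		}
	}

With this shape the len_to_cpy computation before the loop is no longer
needed, and a zero-length descriptor costs one empty iteration instead of
terminating the copy with a partially filled buffer.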
> > +			/* Copy mbuf data to buffer */
> > +			rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset),
> > +				(const void *)(rte_pktmbuf_mtod(buff, const char *) + offset),
> > +				len_to_cpy);
> > +			PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset),
> > +				len_to_cpy, 0);
> > +
> > +			offset += len_to_cpy;
> > +			vb_offset += len_to_cpy;
> > +			total_copied += len_to_cpy;
> > +
> > +			/* The whole packet completes */
> > +			if (total_copied == pkt_len)
> > +				break;
> > +
> > +			/* The current segment completes */
> > +			if (offset == data_len) {
> > +				buff = buff->next;
> > +				if (buff != NULL) {
> > +					offset = 0;
> > +					data_len = rte_pktmbuf_data_len(buff);
> > +				}
>
> What if (buff == NULL)? Either we treat the mbuf as reliable and don't do
> any sanity check, or we check thoroughly:
>
> 	if (buff != NULL) {
> 		...
> 	} else {
> 		...
> 		break;
> 	}

The check could be removed: the earlier "total_copied == pkt_len" check
guarantees that when the current segment completes but the whole packet
does not, the next segment cannot be NULL.
Will update it in v4.

> > +			}
> > +
> > +			/* The current vring descriptor done */
> > +			if (vb_offset == desc->len) {
> > +				if (desc->flags & VRING_DESC_F_NEXT) {
> > +					desc = &vq->desc[desc->next];
> > +					buff_addr = gpa_to_vva(dev, desc->addr);
> > +					vb_offset = 0;
> > +				} else {
> > +					/* Room in vring buffer is not enough */
> > +					uncompleted_pkt = 1;
> > +					break;
> > +				}
> > +			}
> > +			len_to_cpy = RTE_MIN(data_len - offset, desc->len - vb_offset);
> > +		};
> > +
> >  		/* Update used ring with desc information */
> >  		vq->used->ring[res_cur_idx & (vq->size - 1)].id =
> >  							head[packet_success];
> > -		vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len;
> >
> > -		/* Copy mbuf data to buffer */
> > -		/* FIXME for sg mbuf and the case that desc couldn't hold the mbuf data */
> > -		rte_memcpy((void *)(uintptr_t)buff_addr,
> > -			rte_pktmbuf_mtod(buff, const void *),
> > -			rte_pktmbuf_data_len(buff));
> > -		PRINT_PACKET(dev, (uintptr_t)buff_addr,
> > -			rte_pktmbuf_data_len(buff), 0);
> > +		/* Drop the packet if it is uncompleted */
> > +		if (unlikely(uncompleted_pkt == 1))
> > +			vq->used->ring[res_cur_idx & (vq->size - 1)].len =
> > +							vq->vhost_hlen;
> > +		else
> > +			vq->used->ring[res_cur_idx & (vq->size - 1)].len =
> > +							pkt_len + vq->vhost_hlen;
> >
> >  		res_cur_idx++;
> >  		packet_success++;
> >
> > +		if (unlikely(uncompleted_pkt == 1))
> > +			continue;
> > +
> >  		rte_memcpy((void *)(uintptr_t)buff_hdr_addr,
> >  			(const void *)&virtio_hdr, vq->vhost_hlen);
> >
> > @@ -589,7 +638,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> >  		desc = &vq->desc[head[entry_success]];
> >
> >  		/* Discard first buffer as it is the virtio header */
> > -		desc = &vq->desc[desc->next];
> > +		if (desc->flags & VRING_DESC_F_NEXT) {
> > +			desc = &vq->desc[desc->next];
> > +			vb_offset = 0;
> > +			vb_avail = desc->len;
> > +		} else {
> > +			vb_offset = vq->vhost_hlen;
> > +			vb_avail = desc->len - vb_offset;
> > +		}
> >
> >  		/* Buffer address translation. */
> >  		vb_addr = gpa_to_vva(dev, desc->addr);
> > @@ -608,8 +664,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> >  		vq->used->ring[used_idx].id = head[entry_success];
> >  		vq->used->ring[used_idx].len = 0;
> >
> > -		vb_offset = 0;
> > -		vb_avail = desc->len;
> >  		/* Allocate an mbuf and populate the structure. */
> >  		m = rte_pktmbuf_alloc(mbuf_pool);
> >  		if (unlikely(m == NULL)) {