From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 9A9B4C350 for ; Tue, 2 Jun 2015 09:51:28 +0200 (CEST) Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga102.jf.intel.com with ESMTP; 02 Jun 2015 00:51:27 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,538,1427785200"; d="scan'208";a="580492087" Received: from pgsmsx101.gar.corp.intel.com ([10.221.44.78]) by orsmga003.jf.intel.com with ESMTP; 02 Jun 2015 00:51:26 -0700 Received: from shsmsx151.ccr.corp.intel.com (10.239.6.50) by PGSMSX101.gar.corp.intel.com (10.221.44.78) with Microsoft SMTP Server (TLS) id 14.3.224.2; Tue, 2 Jun 2015 15:51:25 +0800 Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.120]) by SHSMSX151.ccr.corp.intel.com ([169.254.3.180]) with mapi id 14.03.0224.002; Tue, 2 Jun 2015 15:51:24 +0800 From: "Xie, Huawei" To: "Ouyang, Changchun" , "dev@dpdk.org" Thread-Topic: [PATCH v3 1/4] lib_vhost: Fix enqueue/dequeue can't handle chained vring descriptors Thread-Index: AdCdCOul7DySrUJiSuaUPHZ4tk+RFQ== Date: Tue, 2 Jun 2015 07:51:23 +0000 Message-ID: References: <1432826207-8428-1-git-send-email-changchun.ouyang@intel.com> <1433147149-31645-1-git-send-email-changchun.ouyang@intel.com> <1433147149-31645-2-git-send-email-changchun.ouyang@intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 1/4] lib_vhost: Fix enqueue/dequeue can't handle chained vring descriptors X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Jun 2015 07:51:29 -0000 On 6/1/2015 4:26 PM, Ouyang, Changchun wrote:=0A= > Vring enqueue need consider the 2 cases:=0A= > 1. use separate descriptors to contain virtio header and actual data, e.= g. the first descriptor=0A= > is for virtio header, and then followed by descriptors for actual dat= a.=0A= > 2. virtio header and some data are put together in one descriptor, e.g. = the first descriptor contain both=0A= > virtio header and part of actual data, and then followed by more desc= riptors for rest of packet data,=0A= > current DPDK based virtio-net pmd implementation is this case;=0A= >=0A= > So does vring dequeue, it should not assume vring descriptor is chained o= r not chained, it should use=0A= > desc->flags to check whether it is chained or not. this patch also fixes = TX corrupt issue in fedora 21=0A= > which uses one single vring descriptor(header and data are in one descrip= tor) for virtio tx process on default.=0A= Suggest remove fedora 21 in the commit message, at least the bug is=0A= related to virtio-net driver rather than distribution.=0A= > Changes in v3=0A= > - support scattered mbuf, check the mbuf has 'next' pointer or not and = copy all segments to vring buffer.=0A= >=0A= > Changes in v2=0A= > - drop the uncompleted packet=0A= > - refine code logic=0A= >=0A= > Signed-off-by: Changchun Ouyang =0A= > ---=0A= > lib/librte_vhost/vhost_rxtx.c | 88 ++++++++++++++++++++++++++++++++++---= ------=0A= > 1 file changed, 71 insertions(+), 17 deletions(-)=0A= >=0A= > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.= c=0A= > index 4809d32..5fe1b6c 100644=0A= > --- a/lib/librte_vhost/vhost_rxtx.c=0A= > +++ b/lib/librte_vhost/vhost_rxtx.c=0A= > @@ -46,7 +46,8 @@=0A= > * This function adds buffers to the virtio devices RX virtqueue. Buffer= s can=0A= > * be received from the physical port or from another virtio device. A p= acket=0A= > * count is returned to indicate the number of packets that are succesfu= lly=0A= > - * added to the RX queue. This function works when mergeable is disabled= .=0A= > + * added to the RX queue. This function works when the mbuf is scattered= , but=0A= > + * it doesn't support the mergeable feature.=0A= > */=0A= > static inline uint32_t __attribute__((always_inline))=0A= > virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,=0A= > @@ -59,7 +60,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id= ,=0A= > struct virtio_net_hdr_mrg_rxbuf virtio_hdr =3D {{0, 0, 0, 0, 0, 0}, 0};= =0A= > uint64_t buff_addr =3D 0;=0A= > uint64_t buff_hdr_addr =3D 0;=0A= > - uint32_t head[MAX_PKT_BURST], packet_len =3D 0;=0A= > + uint32_t head[MAX_PKT_BURST];=0A= > uint32_t head_idx, packet_success =3D 0;=0A= > uint16_t avail_idx, res_cur_idx;=0A= > uint16_t res_base_idx, res_end_idx;=0A= > @@ -113,6 +114,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue= _id,=0A= > rte_prefetch0(&vq->desc[head[packet_success]]);=0A= > =0A= > while (res_cur_idx !=3D res_end_idx) {=0A= > + uint32_t offset =3D 0, vb_offset =3D 0;=0A= > + uint32_t pkt_len, len_to_cpy, data_len, total_copied =3D 0;=0A= > + uint8_t hdr =3D 0, uncompleted_pkt =3D 0;=0A= > +=0A= > /* Get descriptor from available ring */=0A= > desc =3D &vq->desc[head[packet_success]];=0A= > =0A= > @@ -125,7 +130,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_= id,=0A= > =0A= > /* Copy virtio_hdr to packet and increment buffer address */=0A= > buff_hdr_addr =3D buff_addr;=0A= > - packet_len =3D rte_pktmbuf_data_len(buff) + vq->vhost_hlen;=0A= > =0A= > /*=0A= > * If the descriptors are chained the header and data are=0A= > @@ -136,28 +140,73 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queu= e_id,=0A= > desc =3D &vq->desc[desc->next];=0A= > /* Buffer address translation. */=0A= > buff_addr =3D gpa_to_vva(dev, desc->addr);=0A= > - desc->len =3D rte_pktmbuf_data_len(buff);=0A= > } else {=0A= > - buff_addr +=3D vq->vhost_hlen;=0A= > - desc->len =3D packet_len;=0A= > + vb_offset +=3D vq->vhost_hlen;=0A= > + hdr =3D 1;=0A= > }=0A= > =0A= > + pkt_len =3D rte_pktmbuf_pkt_len(buff);=0A= > + data_len =3D rte_pktmbuf_data_len(buff);=0A= > + len_to_cpy =3D RTE_MIN(data_len,=0A= > + hdr ? desc->len - vq->vhost_hlen : desc->len);=0A= > + while (len_to_cpy > 0) {=0A= while (total_copied < pkt_len) is secure and readable.=0A= Besides, what if we encounter some descriptor with zero length? With=0A= len_to_cpy > 0, we would pass a partially copied mbuf to guest but still=0A= used->len =3D packet_len.=0A= > + /* Copy mbuf data to buffer */=0A= > + rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset),=0A= > + (const void *)(rte_pktmbuf_mtod(buff, const char *) + offset),=0A= > + len_to_cpy);=0A= > + PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset),=0A= > + len_to_cpy, 0);=0A= > +=0A= > + offset +=3D len_to_cpy;=0A= > + vb_offset +=3D len_to_cpy;=0A= > + total_copied +=3D len_to_cpy;=0A= > +=0A= > + /* The whole packet completes */=0A= > + if (total_copied =3D=3D pkt_len)=0A= > + break;=0A= > +=0A= > + /* The current segment completes */=0A= > + if (offset =3D=3D data_len) {=0A= > + buff =3D buff->next;=0A= > + if (buff !=3D NULL) {=0A= > + offset =3D 0;=0A= > + data_len =3D rte_pktmbuf_data_len(buff);=0A= > + }=0A= What if (buf =3D=3D NULL)? Either we treat mbuf reliable, and don't do any= =0A= sanity check, or we check thoroughly.=0A= if (buff !=3D NULL) {=0A= =0A= } else {=0A= ...=0A= break;=0A= }=0A= > + }=0A= > +=0A= > + /* The current vring descriptor done */=0A= > + if (vb_offset =3D=3D desc->len) {=0A= > + if (desc->flags & VRING_DESC_F_NEXT) {=0A= > + desc =3D &vq->desc[desc->next];=0A= > + buff_addr =3D gpa_to_vva(dev, desc->addr);=0A= > + vb_offset =3D 0;=0A= > + } else {=0A= > + /* Room in vring buffer is not enough */=0A= > + uncompleted_pkt =3D 1;=0A= > + break;=0A= > + }=0A= > + }=0A= > + len_to_cpy =3D RTE_MIN(data_len - offset, desc->len - vb_offset);=0A= > + };=0A= > +=0A= > /* Update used ring with desc information */=0A= > vq->used->ring[res_cur_idx & (vq->size - 1)].id =3D=0A= > head[packet_success];=0A= > - vq->used->ring[res_cur_idx & (vq->size - 1)].len =3D packet_len;=0A= > =0A= > - /* Copy mbuf data to buffer */=0A= > - /* FIXME for sg mbuf and the case that desc couldn't hold the mbuf dat= a */=0A= > - rte_memcpy((void *)(uintptr_t)buff_addr,=0A= > - rte_pktmbuf_mtod(buff, const void *),=0A= > - rte_pktmbuf_data_len(buff));=0A= > - PRINT_PACKET(dev, (uintptr_t)buff_addr,=0A= > - rte_pktmbuf_data_len(buff), 0);=0A= > + /* Drop the packet if it is uncompleted */=0A= > + if (unlikely(uncompleted_pkt =3D=3D 1))=0A= > + vq->used->ring[res_cur_idx & (vq->size - 1)].len =3D=0A= > + vq->vhost_hlen;=0A= > + else=0A= > + vq->used->ring[res_cur_idx & (vq->size - 1)].len =3D=0A= > + pkt_len + vq->vhost_hlen;=0A= > =0A= > res_cur_idx++;=0A= > packet_success++;=0A= > =0A= > + if (unlikely(uncompleted_pkt =3D=3D 1))=0A= > + continue;=0A= > +=0A= > rte_memcpy((void *)(uintptr_t)buff_hdr_addr,=0A= > (const void *)&virtio_hdr, vq->vhost_hlen);=0A= > =0A= > @@ -589,7 +638,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint= 16_t queue_id,=0A= > desc =3D &vq->desc[head[entry_success]];=0A= > =0A= > /* Discard first buffer as it is the virtio header */=0A= > - desc =3D &vq->desc[desc->next];=0A= > + if (desc->flags & VRING_DESC_F_NEXT) {=0A= > + desc =3D &vq->desc[desc->next];=0A= > + vb_offset =3D 0;=0A= > + vb_avail =3D desc->len;=0A= > + } else {=0A= > + vb_offset =3D vq->vhost_hlen;=0A= > + vb_avail =3D desc->len - vb_offset;=0A= > + }=0A= > =0A= > /* Buffer address translation. */=0A= > vb_addr =3D gpa_to_vva(dev, desc->addr);=0A= > @@ -608,8 +664,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint1= 6_t queue_id,=0A= > vq->used->ring[used_idx].id =3D head[entry_success];=0A= > vq->used->ring[used_idx].len =3D 0;=0A= > =0A= > - vb_offset =3D 0;=0A= > - vb_avail =3D desc->len;=0A= > /* Allocate an mbuf and populate the structure. */=0A= > m =3D rte_pktmbuf_alloc(mbuf_pool);=0A= > if (unlikely(m =3D=3D NULL)) {=0A= =0A=