From: "Xie, Huawei"
To: "Ouyang, Changchun" , "dev@dpdk.org"
Date: Sun, 31 May 2015 05:03:47 +0000
Subject: Re: [dpdk-dev] [PATCH v2 1/5] lib_vhost: Fix enqueue/dequeue can't
	handle chained vring descriptors
References: <1430720780-27525-1-git-send-email-changchun.ouyang@intel.com>
	<1432826207-8428-1-git-send-email-changchun.ouyang@intel.com>
	<1432826207-8428-2-git-send-email-changchun.ouyang@intel.com>

On 5/28/2015 11:17 PM, Ouyang, Changchun wrote:
> Vring enqueue need consider the 2 cases:
> 1. Vring descriptors chained together, the first one is for virtio header, the rest are for real
> data, virtio driver in Linux usually use this scheme;
> 2. Only one descriptor, virtio header and real data share one single descriptor, virtio-net pmd use
> such scheme;

For the commit message, :), we should actually treat the desc chain as a
logically continuous memory space, so there is also the case where desc 1
holds the virtio header plus some data and the following descs hold data
only.

> So does vring dequeue, it should not assume vring descriptor is chained or not chained, virtio in
> different Linux version has different behavior, e.g. fedora 20 use chained vring descriptor, while
> fedora 21 use one single vring descriptor for tx.

This behavior can be configured. Besides, it is not tied to the
distribution but to the virtio-net driver. The key point is that we should
handle the generic case rather than fit the requirements of the existing
virtio-net implementations, so I suggest removing the above message.
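To make "logically continuous" concrete, the header skip can be done
generically along the lines of the sketch below. This is only a rough,
untested illustration against simplified stand-in types (sk_desc,
skip_header and SK_DESC_F_NEXT are made up for the example; they are not
the real vring structures in lib/librte_vhost):

/* Untested sketch: treat the desc chain as one continuous byte stream
 * and find where the data area starts after the virtio header. */
#include <stdint.h>

#define SK_DESC_F_NEXT 0x1	/* stand-in for VRING_DESC_F_NEXT */

struct sk_desc {		/* stand-in for struct vring_desc */
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Skip 'hdr_len' bytes from the start of the chain and report where the
 * data area begins.  This covers all three layouts: a header-only first
 * desc, a first desc shared by header and data, and a first desc holding
 * header plus data followed by data-only descs.  Returns 0 on success,
 * -1 if the chain ends inside the header.
 */
static int
skip_header(struct sk_desc *table, uint16_t idx, uint32_t hdr_len,
	    uint16_t *data_idx, uint32_t *data_off)
{
	struct sk_desc *desc = &table[idx];

	/* Walk forward while the header still covers the whole descriptor. */
	while (hdr_len >= desc->len) {
		hdr_len -= desc->len;
		if (!(desc->flags & SK_DESC_F_NEXT))
			return -1;
		idx = desc->next;
		desc = &table[idx];
	}

	*data_idx = idx;	/* data starts hdr_len bytes into this desc */
	*data_off = hdr_len;
	return 0;
}

With something like this, the enqueue path does not need to special-case
"chained" versus "single descriptor" at all; the layout falls out of the
walk.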
>
> Changes in v2
>   - drop the uncompleted packet
>   - refine code logic
>
> Signed-off-by: Changchun Ouyang
> ---
>  lib/librte_vhost/vhost_rxtx.c | 65 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 50 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 4809d32..06ae2df 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -59,7 +59,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
>  	uint64_t buff_addr = 0;
>  	uint64_t buff_hdr_addr = 0;
> -	uint32_t head[MAX_PKT_BURST], packet_len = 0;
> +	uint32_t head[MAX_PKT_BURST];
>  	uint32_t head_idx, packet_success = 0;
>  	uint16_t avail_idx, res_cur_idx;
>  	uint16_t res_base_idx, res_end_idx;
> @@ -113,6 +113,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  	rte_prefetch0(&vq->desc[head[packet_success]]);
>
>  	while (res_cur_idx != res_end_idx) {
> +		uint32_t offset = 0;
> +		uint32_t data_len, len_to_cpy;
> +		uint8_t hdr = 0, uncompleted_pkt = 0;
> +
>  		/* Get descriptor from available ring */
>  		desc = &vq->desc[head[packet_success]];
>
> @@ -125,7 +129,6 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>
>  		/* Copy virtio_hdr to packet and increment buffer address */
>  		buff_hdr_addr = buff_addr;
> -		packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen;
>
>  		/*
>  		 * If the descriptors are chained the header and data are
> @@ -136,28 +139,55 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>  			desc = &vq->desc[desc->next];
>  			/* Buffer address translation. */
>  			buff_addr = gpa_to_vva(dev, desc->addr);

I am wondering whether [GPA, GPA + desc->len] could cross multiple memory
regions. I don't expect that to be fixed in this patch, :) — a rough
sketch of the kind of check I mean is at the end of this mail.

> -			desc->len = rte_pktmbuf_data_len(buff);
>  		} else {
>  			buff_addr += vq->vhost_hlen;
> -			desc->len = packet_len;
> +			hdr = 1;
>  		}
>
> +		data_len = rte_pktmbuf_data_len(buff);
> +		len_to_cpy = RTE_MIN(data_len,
> +			hdr ? desc->len - vq->vhost_hlen : desc->len);
> +		while (len_to_cpy > 0) {
> +			/* Copy mbuf data to buffer */
> +			rte_memcpy((void *)(uintptr_t)buff_addr,
> +				(const void *)(rte_pktmbuf_mtod(buff, const char *) + offset),
> +				len_to_cpy);
> +			PRINT_PACKET(dev, (uintptr_t)buff_addr,
> +				len_to_cpy, 0);
> +
> +			offset += len_to_cpy;
> +
> +			if (offset == data_len)
> +				break;

I don't understand this part. If offset reaches the end of the first mbuf
segment, why don't we continue copying from the next segment?
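What I would expect is roughly the shape below: a single loop that
advances through both the mbuf segments (via the next pointer) and the
desc chain, whichever runs out first. This is an untested sketch with
made-up stand-in types (sk_mbuf and sk_desc are not the real rte_mbuf and
vring_desc structs), just to show the intent:

/* Untested sketch: copy a possibly multi-segment mbuf chain into a desc
 * chain.  Stand-in types; desc->addr is assumed already translated and
 * 'dst_off' must be <= the first desc's len. */
#include <stdint.h>
#include <string.h>

#define SK_F_NEXT 0x1		/* stand-in for VRING_DESC_F_NEXT */

struct sk_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct sk_mbuf {		/* stand-in for struct rte_mbuf */
	void *data;
	uint32_t data_len;
	struct sk_mbuf *next;
};

/* Returns bytes copied, or -1 if the desc chain runs out of room. */
static int
copy_mbuf_to_desc(struct sk_desc *table, uint16_t idx, uint32_t dst_off,
		  const struct sk_mbuf *m)
{
	struct sk_desc *desc = &table[idx];
	uint32_t src_off = 0, total = 0;

	while (m != NULL) {
		uint32_t room = desc->len - dst_off;
		uint32_t left = m->data_len - src_off;
		uint32_t n = left < room ? left : room;

		memcpy((uint8_t *)(uintptr_t)desc->addr + dst_off,
		       (const uint8_t *)m->data + src_off, n);
		src_off += n;
		dst_off += n;
		total += n;

		if (src_off == m->data_len) {
			/* segment exhausted: move on to the next mbuf segment */
			m = m->next;
			src_off = 0;
		}
		if (m != NULL && dst_off == desc->len) {
			/* descriptor full: move on to the next desc */
			if (!(desc->flags & SK_F_NEXT))
				return -1;	/* vring buffer too small */
			desc = &table[desc->next];
			dst_off = 0;
		}
	}
	return (int)total;
}

The -1 return here would correspond to the uncompleted_pkt case in the
patch; everything else becomes one code path.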
> +
> +			if (desc->flags & VRING_DESC_F_NEXT) {
> +				desc = &vq->desc[desc->next];
> +				buff_addr = gpa_to_vva(dev, desc->addr);
> +				len_to_cpy = RTE_MIN(data_len - offset, desc->len);
> +			} else {
> +				/* Room in vring buffer is not enough */
> +				uncompleted_pkt = 1;
> +				break;
> +			}
> +		};
> +
>  		/* Update used ring with desc information */
>  		vq->used->ring[res_cur_idx & (vq->size - 1)].id =
>  						head[packet_success];
> -		vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len;
>
> -		/* Copy mbuf data to buffer */
> -		/* FIXME for sg mbuf and the case that desc couldn't hold the mbuf data */
> -		rte_memcpy((void *)(uintptr_t)buff_addr,
> -			rte_pktmbuf_mtod(buff, const void *),
> -			rte_pktmbuf_data_len(buff));
> -		PRINT_PACKET(dev, (uintptr_t)buff_addr,
> -			rte_pktmbuf_data_len(buff), 0);
> +		/* Drop the packet if it is uncompleted */
> +		if (unlikely(uncompleted_pkt == 1))
> +			vq->used->ring[res_cur_idx & (vq->size - 1)].len = 0;
> +		else
> +			vq->used->ring[res_cur_idx & (vq->size - 1)].len =
> +							offset + vq->vhost_hlen;
>
>  		res_cur_idx++;
>  		packet_success++;
>
> +		if (unlikely(uncompleted_pkt == 1))
> +			continue;
> +
>  		rte_memcpy((void *)(uintptr_t)buff_hdr_addr,
>  			(const void *)&virtio_hdr, vq->vhost_hlen);
>
> @@ -589,7 +619,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>  		desc = &vq->desc[head[entry_success]];
>
>  		/* Discard first buffer as it is the virtio header */
> -		desc = &vq->desc[desc->next];
> +		if (desc->flags & VRING_DESC_F_NEXT) {
> +			desc = &vq->desc[desc->next];
> +			vb_offset = 0;
> +			vb_avail = desc->len;
> +		} else {
> +			vb_offset = vq->vhost_hlen;
> +			vb_avail = desc->len - vb_offset;
> +		}
>
>  		/* Buffer address translation. */
>  		vb_addr = gpa_to_vva(dev, desc->addr);
> @@ -608,8 +645,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>  		vq->used->ring[used_idx].id = head[entry_success];
>  		vq->used->ring[used_idx].len = 0;
>
> -		vb_offset = 0;
> -		vb_avail = desc->len;
>  		/* Allocate an mbuf and populate the structure. */
>  		m = rte_pktmbuf_alloc(mbuf_pool);
>  		if (unlikely(m == NULL)) {
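On the [GPA, GPA + desc->len] comment above: what I have in mind is a
check along these lines when translating a descriptor, so a copy never
silently runs past the end of one guest memory region. Untested sketch
with stand-in region bookkeeping; the region layout here is an assumption
for illustration, not the actual virtio_memory_regions / gpa_to_vva()
code:

/* Untested sketch: translate [gpa, gpa + len) only if the whole range
 * sits inside a single guest memory region. */
#include <stdint.h>
#include <stddef.h>

struct sk_region {		/* stand-in guest memory region */
	uint64_t gpa_start;	/* guest physical start */
	uint64_t size;		/* region size in bytes */
	uint64_t host_va;	/* host virtual address of gpa_start */
};

/* Returns the host virtual address for gpa, or 0 if the range is not
 * fully backed by one region, so the caller can split the copy (or drop
 * the packet) instead of overrunning. */
static uint64_t
gpa_range_to_vva(const struct sk_region *regions, size_t nregions,
		 uint64_t gpa, uint64_t len)
{
	size_t i;

	for (i = 0; i < nregions; i++) {
		const struct sk_region *r = &regions[i];

		if (gpa >= r->gpa_start && gpa < r->gpa_start + r->size) {
			if (len > r->gpa_start + r->size - gpa)
				return 0;	/* range crosses the region boundary */
			return r->host_va + (gpa - r->gpa_start);
		}
	}
	return 0;	/* not backed by any region */
}

Not something for this patch, as said; just noting what the eventual
check might look like.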