From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <yuanhan.liu@linux.intel.com>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id F151F2BDA
 for <dev@dpdk.org>; Mon,  7 Mar 2016 03:42:00 +0100 (CET)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga101.fm.intel.com with ESMTP; 06 Mar 2016 18:42:00 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,549,1449561600"; d="scan'208";a="664937046"
Received: from yliu-dev.sh.intel.com (HELO yliu-dev) ([10.239.66.49])
 by FMSMGA003.fm.intel.com with ESMTP; 06 Mar 2016 18:41:57 -0800
Date: Mon, 7 Mar 2016 10:44:06 +0800
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: "Xie, Huawei" <huawei.xie@intel.com>
Message-ID: <20160307024406.GX14300@yliu-dev.sh.intel.com>
References: <1449122773-25510-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1455803352-5518-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1455803352-5518-2-git-send-email-yuanhan.liu@linux.intel.com>
 <C37D651A908B024F974696C65296B57B4C626D94@SHSMSX101.ccr.corp.intel.com>
 <20160304022118.GU14300@yliu-dev.sh.intel.com>
 <C37D651A908B024F974696C65296B57B4C636CBD@SHSMSX101.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <C37D651A908B024F974696C65296B57B4C636CBD@SHSMSX101.ccr.corp.intel.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: "Michael S. Tsirkin" <mst@redhat.com>, "dev@dpdk.org" <dev@dpdk.org>,
 Victor Kaplansky <vkaplans@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor
	rte_vhost_dequeue_burst
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Mar 2016 02:42:01 -0000

On Mon, Mar 07, 2016 at 02:19:54AM +0000, Xie, Huawei wrote:
> On 3/4/2016 10:19 AM, Yuanhan Liu wrote:
> > On Thu, Mar 03, 2016 at 04:21:19PM +0000, Xie, Huawei wrote:
> >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> >>> The current rte_vhost_dequeue_burst() implementation is a bit messy,
> >>> with twisted logic. You can see repeated code here and there: it
> >>> invokes rte_pktmbuf_alloc() at three different places!
> >>>
> >>> However, rte_vhost_dequeue_burst() actually does a simple job: copy
> >>> the packet data from the vring descs to mbufs. What's tricky here is:
> >>>
> >>> - A desc buff could be chained (via the desc->next field), so you need
> >>>   to fetch the next one if the current one is wholly drained.
> >>>
> >>> - One mbuf might not be big enough to hold all the desc buffs, hence
> >>>   you need to chain the mbufs as well, via the mbuf->next field.
> >>>
> >>> Even so, the logic can be kept simple. Here is the pseudo code:
> >>>
> >>> 	while (this_desc_is_not_drained_totally || has_next_desc) {
> >>> 		if (this_desc_has_drained_totally) {
> >>> 			this_desc = next_desc();
> >>> 		}
> >>>
> >>> 		if (mbuf_has_no_room) {
> >>> 			mbuf = allocate_a_new_mbuf();
> >>> 		}
> >>>
> >>> 		COPY(mbuf, desc);
> >>> 	}
> >>>
> >>> And this is how I refactored rte_vhost_dequeue_burst.
> >>>
> >>> Note that the old code did special handling to skip the virtio net
> >>> header. However, that can be done simply by adjusting the desc_avail
> >>> and desc_offset vars:
> >>>
> >>> 	desc_avail  = desc->len - vq->vhost_hlen;
> >>> 	desc_offset = vq->vhost_hlen;
> >>>
> >>> This refactor makes the code much more readable (IMO), and it also
> >>> reduces the binary code size (by nearly 2K).
> >>>
> >>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>> ---
> >>>
> >>> v2: - fix potential NULL dereference bug of var "prev" and "head"
> >>> ---
> >>>  lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++-------------------------
> >>>  1 file changed, 116 insertions(+), 181 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> >>> index 5e7e5b1..d5cd0fa 100644
> >>> --- a/lib/librte_vhost/vhost_rxtx.c
> >>> +++ b/lib/librte_vhost/vhost_rxtx.c
> >>> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> >>>  	}
> >>>  }
> >>>  
> >>> +static inline struct rte_mbuf *
> >>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >>> +		  uint16_t desc_idx, struct rte_mempool *mbuf_pool)
> >>> +{
> >>> +	struct vring_desc *desc;
> >>> +	uint64_t desc_addr;
> >>> +	uint32_t desc_avail, desc_offset;
> >>> +	uint32_t mbuf_avail, mbuf_offset;
> >>> +	uint32_t cpy_len;
> >>> +	struct rte_mbuf *head = NULL;
> >>> +	struct rte_mbuf *cur = NULL, *prev = NULL;
> >>> +	struct virtio_net_hdr *hdr;
> >>> +
> >>> +	desc = &vq->desc[desc_idx];
> >>> +	desc_addr = gpa_to_vva(dev, desc->addr);
> >>> +	rte_prefetch0((void *)(uintptr_t)desc_addr);
> >>> +
> >>> +	/* Retrieve virtio net header */
> >>> +	hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> >>> +	desc_avail  = desc->len - vq->vhost_hlen;
> >> There is a serious bug here: desc->len - vq->vhost_hlen could overflow.
> >> A VM could easily create this case. Let us fix it here.
> > Nope, this issue has been there since the beginning, and this patch
> > is a refactor: we should not introduce any functional changes. Therefore,
> > we should not fix it here.
> 
> No, I don't mean fixing it exactly in this patch, but in this series.
> 
> Besides, from a refactoring perspective, we could actually make things
> much simpler and more readable. Both the desc chain and the mbuf chain
> could be converted into iovecs; then both dequeue (copy_desc_to_mbuf) and
> enqueue (copy_mbuf_to_desc) could use a common iovec copying algorithm.
> But as the data path is performance oriented, let us stop here.

Agreed. I had the same performance concern about further simplification,
which is why I didn't go further.
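
Just to make sure we mean the same thing, here is roughly how I read the
iovec idea -- a rough, untested sketch with hypothetical names (copy_iovec,
plain struct iovec), not something I plan to merge. The desc chain and the
mbuf segments would each be flattened into an iovec array first, and one
generic copy loop would then serve both directions:

	#include <sys/uio.h>		/* struct iovec */
	#include <rte_common.h>		/* RTE_MIN */
	#include <rte_memcpy.h>		/* rte_memcpy */

	/*
	 * Copy as many bytes as fit from the src vector into the dst vector,
	 * advancing through either side as it runs out of room.
	 */
	static inline size_t
	copy_iovec(struct iovec *dst, int dst_cnt,
		   const struct iovec *src, int src_cnt)
	{
		int d = 0, s = 0;
		size_t d_off = 0, s_off = 0;
		size_t copied = 0;

		while (d < dst_cnt && s < src_cnt) {
			size_t len = RTE_MIN(dst[d].iov_len - d_off,
					     src[s].iov_len - s_off);

			rte_memcpy((char *)dst[d].iov_base + d_off,
				   (const char *)src[s].iov_base + s_off, len);
			copied += len;
			d_off  += len;
			s_off  += len;

			if (d_off == dst[d].iov_len) {
				d++;
				d_off = 0;
			}
			if (s_off == src[s].iov_len) {
				s++;
				s_off = 0;
			}
		}
		return copied;
	}

Dequeue would then build the src array from the desc chain (gpa_to_vva on
each desc) and the dst array from freshly allocated mbuf segments, and
enqueue would do the reverse. The extra pass to build those arrays is
exactly the data-path overhead I was worried about.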

> >
> > And actually, it's been fixed in the 6th patch in this series:
> 
> Will check that.
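
FWIW, the essence of that fix is just to bail out early when the first desc
cannot even hold the virtio net header, so the desc->len - vq->vhost_hlen
subtraction cannot wrap around. A minimal sketch (not the exact hunk from
patch 6), right after the desc is fetched in copy_desc_to_mbuf():

	desc = &vq->desc[desc_idx];
	if (unlikely(desc->len < vq->vhost_hlen)) {
		/* malformed desc from the guest; drop the packet */
		return NULL;
	}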

Do you have any other comments on the other patches? I'm considering sending
a new version soon, maybe tomorrow.

	--yliu