Date: Mon, 7 Mar 2016 10:44:06 +0800
From: Yuanhan Liu
To: "Xie, Huawei"
Cc: "Michael S. Tsirkin", "dev@dpdk.org", Victor Kaplansky
Subject: Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst
Message-ID: <20160307024406.GX14300@yliu-dev.sh.intel.com>

On Mon, Mar 07, 2016 at 02:19:54AM +0000, Xie, Huawei wrote:
> On 3/4/2016 10:19 AM, Yuanhan Liu wrote:
> > On Thu, Mar 03, 2016 at 04:21:19PM +0000, Xie, Huawei wrote:
> >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> >>> The current rte_vhost_dequeue_burst() implementation is a bit messy
> >>> and its logic is twisted. You can see repeated code here and there:
> >>> it invokes rte_pktmbuf_alloc() three times, at three different
> >>> places!
> >>>
> >>> However, rte_vhost_dequeue_burst() actually does a simple job: copy
> >>> the packet data from the vring descs to mbufs. What's tricky here
> >>> is:
> >>>
> >>> - a desc buf could be chained (by the desc->next field), so you
> >>>   need to fetch the next desc once the current one is wholly
> >>>   drained.
> >>>
> >>> - one mbuf may not be big enough to hold all the desc buf data,
> >>>   hence you need to chain mbufs as well, by the mbuf->next field.
> >>>
> >>> Even so, the logic can be simple. Here is the pseudo code:
> >>>
> >>> 	while (this_desc_is_not_drained_totally || has_next_desc) {
> >>> 		if (this_desc_has_drained_totally) {
> >>> 			this_desc = next_desc();
> >>> 		}
> >>>
> >>> 		if (mbuf_has_no_room) {
> >>> 			mbuf = allocate_a_new_mbuf();
> >>> 		}
> >>>
> >>> 		COPY(mbuf, desc);
> >>> 	}
> >>>
> >>> And this is how I refactored rte_vhost_dequeue_burst.
> >>>
> >>> Note that the old code does special handling to skip the virtio
> >>> header. However, that can be done simply by adjusting the
> >>> desc_avail and desc_offset vars:
> >>>
> >>> 	desc_avail  = desc->len - vq->vhost_hlen;
> >>> 	desc_offset = vq->vhost_hlen;
> >>>
> >>> This refactoring makes the code much more readable (IMO), yet it
> >>> reduces the binary code size by nearly 2K.
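
To make the pseudo code above concrete, here is a condensed sketch of
the refactored copy loop. Variable names follow the patch hunk quoted
below, but error handling, virtio-header offload and the mbuf length
bookkeeping are trimmed, so read it as an illustration of the technique
rather than the exact patch body:

	/*
	 * Assumes desc/desc_addr/desc_avail/desc_offset and
	 * cur/prev/mbuf_avail/mbuf_offset were set up as in the
	 * patch hunk quoted below.
	 */
	while (desc_avail != 0 || (desc->flags & VRING_DESC_F_NEXT)) {
		/* This desc is fully drained: move to the next one. */
		if (desc_avail == 0) {
			desc        = &vq->desc[desc->next];
			desc_addr   = gpa_to_vva(dev, desc->addr);
			desc_avail  = desc->len;
			desc_offset = 0;
		}

		/* This mbuf is full: allocate a new one and chain it. */
		if (mbuf_avail == 0) {
			cur = rte_pktmbuf_alloc(mbuf_pool);
			if (unlikely(cur == NULL))
				break;	/* allocation failure */
			prev->next  = cur;
			prev        = cur;
			mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
			mbuf_offset = 0;
		}

		/* Copy as much as both the desc and the mbuf allow. */
		cpy_len = RTE_MIN(desc_avail, mbuf_avail);
		rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
			   (void *)(uintptr_t)(desc_addr + desc_offset),
			   cpy_len);

		mbuf_avail  -= cpy_len;
		mbuf_offset += cpy_len;
		desc_avail  -= cpy_len;
		desc_offset += cpy_len;
	}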
> >>>
> >>> Signed-off-by: Yuanhan Liu
> >>> ---
> >>>
> >>> v2: - fix potential NULL dereference bug of vars "prev" and "head"
> >>> ---
> >>>  lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++-------------------
> >>>  1 file changed, 116 insertions(+), 181 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> >>> index 5e7e5b1..d5cd0fa 100644
> >>> --- a/lib/librte_vhost/vhost_rxtx.c
> >>> +++ b/lib/librte_vhost/vhost_rxtx.c
> >>> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> >>>  	}
> >>>  }
> >>>
> >>> +static inline struct rte_mbuf *
> >>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >>> +		  uint16_t desc_idx, struct rte_mempool *mbuf_pool)
> >>> +{
> >>> +	struct vring_desc *desc;
> >>> +	uint64_t desc_addr;
> >>> +	uint32_t desc_avail, desc_offset;
> >>> +	uint32_t mbuf_avail, mbuf_offset;
> >>> +	uint32_t cpy_len;
> >>> +	struct rte_mbuf *head = NULL;
> >>> +	struct rte_mbuf *cur = NULL, *prev = NULL;
> >>> +	struct virtio_net_hdr *hdr;
> >>> +
> >>> +	desc = &vq->desc[desc_idx];
> >>> +	desc_addr = gpa_to_vva(dev, desc->addr);
> >>> +	rte_prefetch0((void *)(uintptr_t)desc_addr);
> >>> +
> >>> +	/* Retrieve virtio net header */
> >>> +	hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> >>> +	desc_avail = desc->len - vq->vhost_hlen;
> >>
> >> There is a serious bug here: desc->len - vq->vhost_hlen could
> >> overflow, and a VM could easily create this case. Let us fix it
> >> here.
> >
> > Nope, this issue has been there since the beginning, and this patch
> > is a refactoring: we should not bring in any functional changes.
> > Therefore, we should not fix it here.
>
> No, I don't mean fixing it exactly in this patch, but in this series.
>
> Besides, from a refactoring perspective, we could actually make things
> much simpler and more readable: both the desc chains and the mbufs
> could be converted into iovecs, and then both dequeue
> (copy_desc_to_mbuf) and enqueue (copy_mbuf_to_desc) could use a common
> iovec copying algorithm. But as the data path is performance oriented,
> let us stop here.

Agreed. I had the same performance concern about further
simplification, which is why I didn't go further.

> > And actually, it's been fixed in the 6th patch in this series:
>
> Will check that.

Do you have other comments on the other patches? I'm considering
sending a new version soon, maybe tomorrow.

	--yliu
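
A note on the overflow Huawei points out above: desc->len and
vq->vhost_hlen are unsigned, so a guest that posts a descriptor shorter
than the virtio net header makes "desc->len - vq->vhost_hlen" wrap
around to a huge value. A minimal sketch of the kind of guard meant
here, assuming the same variables as the patch hunk above (illustrative
only; per the discussion, the actual fix lands in the 6th patch of the
series):

	/*
	 * Illustrative guard, not the exact fix from the series: bail
	 * out if the first desc cannot even hold the virtio net header,
	 * so the unsigned subtraction below cannot wrap around.
	 */
	if (unlikely(desc->len < vq->vhost_hlen))
		return NULL;

	desc_avail  = desc->len - vq->vhost_hlen;
	desc_offset = vq->vhost_hlen;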
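
For the record, Huawei's iovec suggestion would look roughly like the
sketch below: gather a desc chain and an mbuf chain each into a
struct iovec array, then drive both enqueue and dequeue through one
generic copy routine. The helper name and shape are hypothetical, not
code from this series, and the extra gather pass is exactly the
data-path cost being weighed above:

	#include <sys/uio.h>		/* struct iovec */
	#include <rte_common.h>		/* RTE_MIN */
	#include <rte_memcpy.h>		/* rte_memcpy */

	/*
	 * Hypothetical generic copy: drain the 'src' iovec array into
	 * the 'dst' iovec array. Dequeue would gather descs as src and
	 * mbufs as dst; enqueue would do the reverse.
	 */
	static void
	copy_iovec(struct iovec *dst, int dst_cnt,
		   struct iovec *src, int src_cnt)
	{
		size_t d_off = 0, s_off = 0;

		while (dst_cnt > 0 && src_cnt > 0) {
			size_t len = RTE_MIN(dst->iov_len - d_off,
					     src->iov_len - s_off);

			rte_memcpy((char *)dst->iov_base + d_off,
				   (const char *)src->iov_base + s_off,
				   len);

			d_off += len;
			s_off += len;
			if (d_off == dst->iov_len) {
				dst++; dst_cnt--; d_off = 0;
			}
			if (s_off == src->iov_len) {
				src++; src_cnt--; s_off = 0;
			}
		}
	}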