From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 by dpdk.org (Postfix) with ESMTP id 5414D2931
 for ; Tue, 23 Aug 2016 16:22:02 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga104.jf.intel.com with ESMTP; 23 Aug 2016 07:22:01 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.28,566,1464678000";
 d="scan'208";a="752473883"
Received: from yliu-dev.sh.intel.com (HELO yliu-dev) ([10.239.67.162])
 by FMSMGA003.fm.intel.com with ESMTP; 23 Aug 2016 07:22:01 -0700
Date: Tue, 23 Aug 2016 22:31:47 +0800
From: Yuanhan Liu
To: Maxime Coquelin
Cc: dev@dpdk.org
Message-ID: <20160823143147.GO30752@yliu-dev.sh.intel.com>
References: <1471939839-29778-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1471939839-29778-5-git-send-email-yuanhan.liu@linux.intel.com>
 <8043aa6f-baa6-3680-dc07-0e535f1b9b2b@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8043aa6f-baa6-3680-dc07-0e535f1b9b2b@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Subject: Re: [dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 23 Aug 2016 14:22:02 -0000

On Tue, Aug 23, 2016 at 04:04:30PM +0200, Maxime Coquelin wrote:
>
>
> On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
> >The basic idea of Tx zero copy is, instead of copying data from the
> >desc buf, here we let the mbuf reference the desc buf addr directly.
> >
> >Doing so, however, has one major issue: we can't update the used ring
> >at the end of rte_vhost_dequeue_burst. Because we don't do the copy
> >here, an update of the used ring would let the driver to reclaim the
> >desc buf. As a result, DPDK might reference a stale memory region.
> >
> >To update the used ring properly, this patch does several tricks:
> >
> >- when mbuf references a desc buf, refcnt is added by 1.
> >
> >  This is to pin lock the mbuf, so that a mbuf free from the DPDK
> >  won't actually free it, instead, refcnt is subtracted by 1.
> >
> >- We chain all those mbuf together (by tailq)
> >
> >  And we check it every time on the rte_vhost_dequeue_burst entrance,
> >  to see if the mbuf is freed (when refcnt equals to 1). If that
> >  happens, it means we are the last user of this mbuf and we are
> >  safe to update the used ring.
> >
> >- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
> >  right desc idx.
> >
> >Tx zero copy is introduced for performance reason, and some rough tests
> >show about 40% performance boost for packet size 1400B. For small packets,
> >(e.g. 64B), it actually slows a bit down. That is expected because this
> >patch introduces some extra works, and it outweighs the benefit from
> >saving few bytes copy.
> >
> >Signed-off-by: Yuanhan Liu
> >---
> > lib/librte_vhost/vhost.c      |   2 +
> > lib/librte_vhost/vhost.h      |  21 ++++++
> > lib/librte_vhost/vhost_user.c |  41 +++++++++-
> > lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
> > 4 files changed, 214 insertions(+), 19 deletions(-)
> >
> ...
> >
> > rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> >     struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> >@@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> >     if (unlikely(vq->enabled == 0))
> >         return 0;
> >
> >+    if (dev->tx_zero_copy) {
> >+        struct zcopy_mbuf *zmbuf, *next;
> >+        int nr_updated = 0;
> >+
> >+        for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> >+             zmbuf != NULL; zmbuf = next) {
> >+            next = TAILQ_NEXT(zmbuf, next);
> >+
> >+            if (mbuf_is_consumed(zmbuf->mbuf)) {
> >+                used_idx = vq->last_used_idx++ & (vq->size - 1);
> >+                update_used_ring(dev, vq, used_idx,
> >+                                 zmbuf->desc_idx);
> >+                nr_updated += 1;
> >+
> >+                TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> >+                rte_pktmbuf_free(zmbuf->mbuf);
> >+                put_zmbuf(zmbuf);
> >+                vq->nr_zmbuf -= 1;
> >+            }
> Shouldn't you break the loop here as soon as a mbuf is not consumed?

I have thought of that as well, as a micro optimization. But what if
the mbuf at the head of the list stays pin locked by the DPDK
application? Then the whole chain would be blocked behind it. This
should be rare, but I think we should consider the worst case.

Besides that, the performance boost I got is already quite decent, so
I think we could drop this micro optimization.

> Indeed, they might not be consumed sequentially, and would cause
> last_used_idx to be incremented whereas it shouldn't.

I think the out-of-order used vring update won't be an issue here.

Well, there might be some problems for reconnect. The trick introduced
by commit 0823c1cb0a73 ("vhost: workaround stale vring base") assumes
that the used vring is always updated in order.

    --yliu
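
To make the trade-off above concrete, here is a minimal standalone
sketch of the two scan policies. It is not the patch code: fake_mbuf,
zc_entry, is_consumed() and reclaim() are hypothetical names made up
for illustration, the mbuf is reduced to a bare refcnt, and the used
ring is modelled as a plain array of desc indexes. It assumes a
BSD-style <sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_mbuf: only the refcnt matters here. */
struct fake_mbuf {
    int refcnt;
};

/* Plays the role of "struct zcopy_mbuf": ties an mbuf to its desc idx. */
struct zc_entry {
    struct fake_mbuf *mbuf;
    unsigned desc_idx;
    TAILQ_ENTRY(zc_entry) next;
};

TAILQ_HEAD(zc_list, zc_entry);

/* refcnt == 1 means only the vhost-side pin is left, so the app is done. */
static int is_consumed(const struct fake_mbuf *m)
{
    return m->refcnt == 1;
}

/*
 * Walk the list, unlink every consumed entry and append its desc idx to
 * used[] (a stand-in for the used ring). With stop_at_first_busy set, the
 * walk breaks at the first entry still held by the application, which
 * keeps used[] in list order but lets a pinned head block everything.
 * Returns the number of entries reclaimed.
 */
static size_t reclaim(struct zc_list *list, int stop_at_first_busy,
                      unsigned *used, size_t cap)
{
    struct zc_entry *e, *nxt;
    size_t n = 0;

    assert(list != NULL && used != NULL);

    for (e = TAILQ_FIRST(list); e != NULL && n < cap; e = nxt) {
        /* Fetch the successor first: e may be unlinked below. */
        nxt = TAILQ_NEXT(e, next);

        if (!is_consumed(e->mbuf)) {
            if (stop_at_first_busy)
                break;
            continue;
        }

        used[n++] = e->desc_idx;
        TAILQ_REMOVE(list, e, next);
    }

    return n;
}
```

With a list whose head is still pinned (refcnt 2) and whose later
entries are free, the breaking variant reclaims nothing, while the
full scan reclaims the later entries first. That second behaviour is
exactly the out-of-order used ring update mentioned above, which is
why the reconnect workaround needs a second look.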