Subject: Re: [dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy
Date: Tue, 23 Aug 2016 17:40:20 +0200
From: Maxime Coquelin
To: Yuanhan Liu
Cc: dev@dpdk.org
Message-ID: <5a9c099c-0a64-6864-9761-649e0837a54e@redhat.com>
In-Reply-To: <20160823143147.GO30752@yliu-dev.sh.intel.com>
References: <1471939839-29778-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1471939839-29778-5-git-send-email-yuanhan.liu@linux.intel.com>
 <8043aa6f-baa6-3680-dc07-0e535f1b9b2b@redhat.com>
 <20160823143147.GO30752@yliu-dev.sh.intel.com>

On 08/23/2016 04:31 PM, Yuanhan Liu wrote:
> On Tue, Aug 23, 2016 at 04:04:30PM +0200, Maxime Coquelin wrote:
>>
>> On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
>>> The basic idea of Tx zero copy is: instead of copying data from the
>>> desc buf, we let the mbuf reference the desc buf addr directly.
>>>
>>> Doing so, however, has one major issue: we can't update the used ring
>>> at the end of rte_vhost_dequeue_burst. Because we don't do the copy
>>> here, an update of the used ring would let the driver reclaim the
>>> desc buf, and DPDK might then reference a stale memory region.
>>>
>>> To update the used ring properly, this patch does several tricks:
>>>
>>> - When an mbuf references a desc buf, its refcnt is incremented by 1.
>>>
>>>   This pin-locks the mbuf, so that a free from the DPDK app won't
>>>   actually free it; the refcnt is merely decremented by 1.
>>>
>>> - We chain all those mbufs together (by tailq).
>>>
>>>   We check the chain at every rte_vhost_dequeue_burst entrance, to
>>>   see whether an mbuf has been freed (its refcnt equals 1). If so,
>>>   we are the last user of that mbuf and it is safe to update the
>>>   used ring.
>>>
>>> - "struct zcopy_mbuf" is introduced, to associate an mbuf with the
>>>   right desc idx.
>>>
>>> Tx zero copy is introduced for performance reasons, and some rough
>>> tests show about a 40% performance boost for 1400B packets. For small
>>> packets (e.g. 64B) it actually slows things down a bit. That is
>>> expected: this patch introduces some extra work, which outweighs the
>>> benefit of saving a copy of a few bytes.
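
Just to check my reading of the description above: the bookkeeping
seems to boil down to something like the sketch below. This is only a
sketch to confirm my understanding; the struct and helper names are
mine (the refcnt helpers are the stock rte_mbuf ones), so the actual
patch may differ:

#include <stdbool.h>
#include <sys/queue.h>
#include <rte_mbuf.h>

/* One entry per zero-copy mbuf handed to the application; it ties the
 * mbuf to the desc index whose used-ring update must be deferred. */
struct zcopy_mbuf {
	TAILQ_ENTRY(zcopy_mbuf) next;
	struct rte_mbuf *mbuf;
	uint32_t desc_idx;	/* used-ring entry to fill once consumed */
};

/* Pin the mbuf: take one extra reference, so that rte_pktmbuf_free()
 * from the application only drops the refcnt back to 1 instead of
 * actually releasing the buffer (and hence the desc buf) too early. */
static inline void
pin_mbuf(struct rte_mbuf *m)
{
	rte_mbuf_refcnt_update(m, 1);
}

/* vhost is the last user once the refcnt is back to 1; only then is
 * it safe to hand the desc back to the guest via the used ring. */
static inline bool
mbuf_is_consumed(struct rte_mbuf *m)
{
	return rte_mbuf_refcnt_read(m) == 1;
}

In other words, the app's free becomes a plain refcnt decrement, and
the used ring is only ever touched from the dequeue path.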
>>>
>>> Signed-off-by: Yuanhan Liu
>>> ---
>>>  lib/librte_vhost/vhost.c      |   2 +
>>>  lib/librte_vhost/vhost.h      |  21 ++++++
>>>  lib/librte_vhost/vhost_user.c |  41 +++++++++-
>>>  lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
>>>  4 files changed, 214 insertions(+), 19 deletions(-)
>>>
>> ...
>>
>>>  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>>  	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
>>> @@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>>  	if (unlikely(vq->enabled == 0))
>>>  		return 0;
>>>
>>> +	if (dev->tx_zero_copy) {
>>> +		struct zcopy_mbuf *zmbuf, *next;
>>> +		int nr_updated = 0;
>>> +
>>> +		for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
>>> +		     zmbuf != NULL; zmbuf = next) {
>>> +			next = TAILQ_NEXT(zmbuf, next);
>>> +
>>> +			if (mbuf_is_consumed(zmbuf->mbuf)) {
>>> +				used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> +				update_used_ring(dev, vq, used_idx,
>>> +						zmbuf->desc_idx);
>>> +				nr_updated += 1;
>>> +
>>> +				TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
>>> +				rte_pktmbuf_free(zmbuf->mbuf);
>>> +				put_zmbuf(zmbuf);
>>> +				vq->nr_zmbuf -= 1;
>>> +			}
>> Shouldn't you break the loop here as soon as an mbuf is not consumed?
>
> I have thought of that as well, as a micro optimization. But I was
> wondering: what if a heading mbuf is pin-locked by the DPDK app? Then
> the whole chain would be blocked. This should be rare, but I think
> we should think of the worst case.
>
> Besides that, the performance boost I got is quite decent, so I think
> we could drop this micro optimization.

Forget my comment, this was a misunderstanding of the code on my side.

Regards,
Maxime
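
P.S. To make the trade-off discussed above concrete for the archives:
the early-break variant I was asking about would look roughly like the
sketch below (a sketch only, reusing the helpers from the patch):

	for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
	     zmbuf != NULL; zmbuf = next) {
		next = TAILQ_NEXT(zmbuf, next);

		/* Stop at the first mbuf the app still holds. */
		if (!mbuf_is_consumed(zmbuf->mbuf))
			break;

		used_idx = vq->last_used_idx++ & (vq->size - 1);
		update_used_ring(dev, vq, used_idx, zmbuf->desc_idx);

		TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
		rte_pktmbuf_free(zmbuf->mbuf);
		put_zmbuf(zmbuf);
		vq->nr_zmbuf -= 1;
	}

If the head zmbuf stays pinned by the app, every already-consumed
entry behind it keeps waiting for its used-ring update: exactly the
worst case Yuanhan describes. The full scan in the patch avoids that,
at the cost of walking the whole list on each dequeue call.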