From: Maxime Coquelin
To: Yuanhan Liu, dev@dpdk.org
Date: Mon, 26 Sep 2016 22:45:46 +0200
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
Message-ID: <038cdf17-511f-c582-ef8f-46c81f51d161@redhat.com>
In-Reply-To: <1474604007-5221-5-git-send-email-yuanhan.liu@linux.intel.com>
References: <1471939839-29778-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1474604007-5221-1-git-send-email-yuanhan.liu@linux.intel.com>
 <1474604007-5221-5-git-send-email-yuanhan.liu@linux.intel.com>

On 09/23/2016 06:13 AM, Yuanhan Liu wrote:
> The basic idea of dequeue zero copy is, instead of
> copying data from the desc buf, we let the mbuf reference the desc buf
> address directly.
>
> Doing so, however, has one major issue: we can't update the used ring
> at the end of rte_vhost_dequeue_burst. Because we don't do the copy
> here, an update of the used ring would let the driver reclaim the
> desc buf. As a result, DPDK might reference a stale memory region.
>
> To update the used ring properly, this patch does several tricks:
>
> - When an mbuf references a desc buf, its refcnt is incremented by 1.
>
>   This pins the mbuf, so that an mbuf free from DPDK won't actually
>   free it; instead, refcnt is decremented by 1.
>
> - We chain all those mbufs together (by tailq).
>
>   We check the chain every time on entry to rte_vhost_dequeue_burst,
>   to see if an mbuf has been freed (when refcnt equals 1). If so,
>   we are the last user of this mbuf and it is safe to update the
>   used ring.
>
> - "struct zcopy_mbuf" is introduced, to associate an mbuf with the
>   right desc idx.
>
> Dequeue zero copy is introduced for performance reasons, and some rough
> tests show about a 50% performance boost for a packet size of 1500B. For
> small packets (e.g. 64B), it actually slows things down a bit (by up to
> 15%). That is expected because this patch introduces some extra work,
> which outweighs the benefit of saving a few bytes of copying.
>
> Signed-off-by: Yuanhan Liu
> ---
>
> v2: - use unlikely/likely for the dequeue_zero_copy check, as it's not
>       enabled by default and has some limitations in the vm2nic case.
>
>     - handle the case where a desc buf might cross 2 host phys pages
>
>     - reset nr_zmbuf to 0 at set_vring_num
>
>     - set the zmbuf_size to vq->size, not double that.
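
For readers following the thread, the three tricks described in the commit message can be sketched in plain C. This is only a toy model under stated assumptions, not the patch's code: toy_mbuf, toy_zmbuf, toy_pin, toy_app_free and toy_reclaim are hypothetical names, the mbuf is reduced to a bare reference count, and the <sys/queue.h> TAILQ macros stand in for whatever list handling the patch actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an mbuf: only the reference count is modelled. */
struct toy_mbuf {
    uint16_t refcnt;
};

/* Plays the role the commit message gives "struct zcopy_mbuf":
 * it ties an mbuf to the descriptor index it borrows its buffer from. */
struct toy_zmbuf {
    struct toy_mbuf *mbuf;
    uint16_t desc_idx;
    TAILQ_ENTRY(toy_zmbuf) next;
};

TAILQ_HEAD(toy_zmbuf_list, toy_zmbuf);

/* Pin an mbuf that now points at a guest desc buf and chain it. */
static void
toy_pin(struct toy_zmbuf_list *list, struct toy_zmbuf *z,
        struct toy_mbuf *m, uint16_t desc_idx)
{
    m->refcnt += 1;              /* extra reference held by the vhost side */
    z->mbuf = m;
    z->desc_idx = desc_idx;
    TAILQ_INSERT_TAIL(list, z, next);
}

/* Application-side free of a pinned mbuf: only drops one reference. */
static void
toy_app_free(struct toy_mbuf *m)
{
    assert(m->refcnt > 1);       /* the pin guarantees we never hit zero */
    m->refcnt -= 1;
}

/* Run on dequeue entry: every chained mbuf whose refcnt is back to 1 is
 * held only by us, so its desc idx may go to the used ring. Writes up to
 * max indexes into used[] and returns how many were written. */
static size_t
toy_reclaim(struct toy_zmbuf_list *list, uint16_t *used, size_t max)
{
    size_t n = 0;
    struct toy_zmbuf *z = TAILQ_FIRST(list);

    while (z != NULL && n < max) {
        struct toy_zmbuf *nxt = TAILQ_NEXT(z, next); /* save before unlink */

        if (z->mbuf->refcnt == 1) {
            used[n++] = z->desc_idx;
            TAILQ_REMOVE(list, z, next);
            z->mbuf->refcnt = 0; /* model dropping our own reference */
        }
        z = nxt;
    }
    return n;
}
```

In this sketch, if two mbufs are pinned to desc indexes 7 and 9 and the application frees only the second, toy_reclaim reports index 9 alone; the first stays chained until its own free brings refcnt back to 1.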
> ---
>  lib/librte_vhost/vhost.c      |   2 +
>  lib/librte_vhost/vhost.h      |  22 +++++-
>  lib/librte_vhost/vhost_user.c |  42 +++++++++-
>  lib/librte_vhost/virtio_net.c | 173 +++++++++++++++++++++++++++++++++++++-----
>  4 files changed, 219 insertions(+), 20 deletions(-)

Reviewed-by: Maxime Coquelin

Thanks,
Maxime