From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy
Date: Tue, 23 Aug 2016 17:40:20 +0200 [thread overview]
Message-ID: <5a9c099c-0a64-6864-9761-649e0837a54e@redhat.com> (raw)
In-Reply-To: <20160823143147.GO30752@yliu-dev.sh.intel.com>
On 08/23/2016 04:31 PM, Yuanhan Liu wrote:
> On Tue, Aug 23, 2016 at 04:04:30PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
>>> The basic idea of Tx zero copy is, instead of copying data from the
>>> desc buf, here we let the mbuf reference the desc buf addr directly.
>>>
>>> Doing so, however, has one major issue: we can't update the used ring
>>> at the end of rte_vhost_dequeue_burst. Because we don't do the copy
>>> here, an update of the used ring would let the driver to reclaim the
>>> desc buf. As a result, DPDK might reference a stale memory region.
>>>
>>> To update the used ring properly, this patch does several tricks:
>>>
>>> - when mbuf references a desc buf, refcnt is added by 1.
>>>
>>> This is to pin lock the mbuf, so that a mbuf free from the DPDK
>>> won't actually free it, instead, refcnt is subtracted by 1.
>>>
>>> - We chain all those mbuf together (by tailq)
>>>
>>> And we check it every time on the rte_vhost_dequeue_burst entrance,
>>> to see if the mbuf is freed (when refcnt equals to 1). If that
>>> happens, it means we are the last user of this mbuf and we are
>>> safe to update the used ring.
>>>
>>> - "struct zcopy_mbuf" is introduced, to associate an mbuf with the
>>> right desc idx.
>>>
>>> Tx zero copy is introduced for performance reason, and some rough tests
>>> show about 40% perfomance boost for packet size 1400B. FOr small packets,
>>> (e.g. 64B), it actually slows a bit down. That is expected because this
>>> patch introduces some extra works, and it outweighs the benefit from
>>> saving few bytes copy.
>>>
>>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> ---
>>> lib/librte_vhost/vhost.c | 2 +
>>> lib/librte_vhost/vhost.h | 21 ++++++
>>> lib/librte_vhost/vhost_user.c | 41 +++++++++-
>>> lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
>>> 4 files changed, 214 insertions(+), 19 deletions(-)
>>>
>> ...
>>
>>> rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>> struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
>>> @@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>> if (unlikely(vq->enabled == 0))
>>> return 0;
>>>
>>> + if (dev->tx_zero_copy) {
>>> + struct zcopy_mbuf *zmbuf, *next;
>>> + int nr_updated = 0;
>>> +
>>> + for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
>>> + zmbuf != NULL; zmbuf = next) {
>>> + next = TAILQ_NEXT(zmbuf, next);
>>> +
>>> + if (mbuf_is_consumed(zmbuf->mbuf)) {
>>> + used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> + update_used_ring(dev, vq, used_idx,
>>> + zmbuf->desc_idx);
>>> + nr_updated += 1;
>>> +
>>> + TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
>>> + rte_pktmbuf_free(zmbuf->mbuf);
>>> + put_zmbuf(zmbuf);
>>> + vq->nr_zmbuf -= 1;
>>> + }
>> Shouldn't you break the loop here as soon as a mbuf is not consumed?
>
> I have thought of that as well, as a micro optimization. But I was
> wondering what if a heading mbuf is pin locked by the DPDK APP? Then
> the whole chain would be blocked. This should be rare, but I think
> we should think of the worst case.
>
> Besides that, the performance boost I got is quite decent, that I think
> we could drop this micro optimization.
Forget my comment, this was a misunderstanding of the code on my side.
Regards,
Maxime
next prev parent reply other threads:[~2016-08-23 15:40 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-23 8:10 [dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support Yuanhan Liu
2016-08-23 8:10 ` [dpdk-dev] [PATCH 1/6] vhost: simplify memory regions handling Yuanhan Liu
2016-08-23 9:17 ` Maxime Coquelin
2016-08-24 7:26 ` Xu, Qian Q
2016-08-24 7:40 ` Yuanhan Liu
2016-08-24 7:36 ` Xu, Qian Q
2016-08-23 8:10 ` [dpdk-dev] [PATCH 2/6] vhost: get guest/host physical address mappings Yuanhan Liu
2016-08-23 9:58 ` Maxime Coquelin
2016-08-23 12:32 ` Yuanhan Liu
2016-08-23 13:25 ` Maxime Coquelin
2016-08-23 13:49 ` Yuanhan Liu
2016-08-23 14:05 ` Maxime Coquelin
2016-08-23 8:10 ` [dpdk-dev] [PATCH 3/6] vhost: introduce last avail idx for Tx Yuanhan Liu
2016-08-23 12:27 ` Maxime Coquelin
2016-08-23 8:10 ` [dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy Yuanhan Liu
2016-08-23 14:04 ` Maxime Coquelin
2016-08-23 14:31 ` Yuanhan Liu
2016-08-23 15:40 ` Maxime Coquelin [this message]
2016-08-23 8:10 ` [dpdk-dev] [PATCH 5/6] vhost: add a flag to enable " Yuanhan Liu
2016-09-06 9:00 ` Xu, Qian Q
2016-09-06 9:42 ` Xu, Qian Q
2016-09-06 10:02 ` Yuanhan Liu
2016-09-07 2:43 ` Xu, Qian Q
2016-09-06 9:55 ` Yuanhan Liu
2016-09-07 16:00 ` Thomas Monjalon
2016-09-08 7:21 ` Yuanhan Liu
2016-09-08 7:57 ` Thomas Monjalon
2016-08-23 8:10 ` [dpdk-dev] [PATCH 6/6] examples/vhost: add an option " Yuanhan Liu
2016-08-23 9:31 ` Thomas Monjalon
2016-08-23 12:33 ` Yuanhan Liu
2016-08-23 14:14 ` Maxime Coquelin
2016-08-23 14:45 ` Yuanhan Liu
2016-08-23 14:18 ` [dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support Maxime Coquelin
2016-08-23 14:42 ` Yuanhan Liu
2016-08-23 14:53 ` Yuanhan Liu
2016-08-23 16:41 ` Maxime Coquelin
2016-08-29 8:32 ` Xu, Qian Q
2016-08-29 8:57 ` Xu, Qian Q
2016-09-23 4:11 ` Yuanhan Liu
2016-10-09 15:20 ` Yuanhan Liu
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 0/7] vhost: add dequeue " Yuanhan Liu
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 1/7] vhost: simplify memory regions handling Yuanhan Liu
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 2/7] vhost: get guest/host physical address mappings Yuanhan Liu
2016-09-26 20:17 ` Maxime Coquelin
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 3/7] vhost: introduce last avail idx for dequeue Yuanhan Liu
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy Yuanhan Liu
2016-09-26 20:45 ` Maxime Coquelin
2016-10-06 14:37 ` Xu, Qian Q
2016-10-09 2:03 ` Yuanhan Liu
2016-10-10 10:12 ` Xu, Qian Q
2016-10-10 10:14 ` Maxime Coquelin
2016-10-10 10:22 ` Xu, Qian Q
2016-10-10 10:40 ` Xu, Qian Q
2016-10-10 11:48 ` Maxime Coquelin
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 5/7] vhost: add a flag to enable " Yuanhan Liu
2016-09-26 20:57 ` Maxime Coquelin
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 6/7] examples/vhost: add an option " Yuanhan Liu
2016-09-26 21:05 ` Maxime Coquelin
2016-09-23 4:13 ` [dpdk-dev] [PATCH v2 7/7] net/vhost: " Yuanhan Liu
2016-09-26 21:05 ` Maxime Coquelin
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support Yuanhan Liu
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 1/7] vhost: simplify memory regions handling Yuanhan Liu
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 2/7] vhost: get guest/host physical address mappings Yuanhan Liu
2016-11-29 3:10 ` linhaifeng
2016-11-29 13:14 ` linhaifeng
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 3/7] vhost: introduce last avail idx for dequeue Yuanhan Liu
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 4/7] vhost: add dequeue zero copy Yuanhan Liu
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 5/7] vhost: add a flag to enable " Yuanhan Liu
2016-10-09 7:27 ` [dpdk-dev] [PATCH v3 6/7] examples/vhost: add an option " Yuanhan Liu
2016-10-09 7:28 ` [dpdk-dev] [PATCH v3 7/7] net/vhost: " Yuanhan Liu
2016-10-11 13:04 ` [dpdk-dev] [PATCH v3 0/7] vhost: add dequeue zero copy support Xu, Qian Q
2016-10-12 7:48 ` Yuanhan Liu
2016-10-09 10:46 ` [dpdk-dev] [PATCH 0/6] vhost: add Tx " linhaifeng
2016-10-10 8:03 ` Yuanhan Liu
2016-10-14 7:30 ` linhaifeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5a9c099c-0a64-6864-9761-649e0837a54e@redhat.com \
--to=maxime.coquelin@redhat.com \
--cc=dev@dpdk.org \
--cc=yuanhan.liu@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).