From: "Fu, Patrick" <patrick.fu@intel.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Xia, Chenbo" <chenbo.xia@intel.com>
Subject: Re: [dpdk-dev] [PATCH v4] vhost: fix async copy fail on multi-page buffers
Date: Wed, 29 Jul 2020 02:05:48 +0000
Message-ID: <DM5PR1101MB216916391F84F09603100CAB84700@DM5PR1101MB2169.namprd11.prod.outlook.com>
In-Reply-To: <DM5PR1101MB21696FD2CFADBB948600155A84700@DM5PR1101MB2169.namprd11.prod.outlook.com>
Hi Maxime,
> -----Original Message-----
> From: Fu, Patrick
> Sent: Wednesday, July 29, 2020 9:40 AM
> To: 'Maxime Coquelin' <maxime.coquelin@redhat.com>; dev@dpdk.org; Xia,
> Chenbo <Chenbo.Xia@intel.com>
> Subject: RE: [PATCH v4] vhost: fix async copy fail on multi-page buffers
>
> Hi Maxime,
>
> > -----Original Message-----
> > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Sent: Tuesday, July 28, 2020 9:55 PM
> > To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
> > <chenbo.xia@intel.com>
> > Subject: Re: [PATCH v4] vhost: fix async copy fail on multi-page
> > buffers
> >
> >
> >
> > On 7/28/20 5:28 AM, patrick.fu@intel.com wrote:
> > > From: Patrick Fu <patrick.fu@intel.com>
> > >
> > > > Async copy fails when a single ring buffer vector is split across
> > > > multiple physical pages. This happens because the current hpa address
> > > > translation function doesn't handle multi-page buffers. This patch
> > > > implements a new gpa to hpa address conversion function, which returns
> > > > the hpa within the first host page hit. The async data path calls this
> > > > new function repeatedly to construct a multi-segment async copy
> > > > descriptor for ring buffers crossing physical page boundaries.
> > >
> > > > Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> > >
> > > Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> > > ---
> > > v2:
> > > - change commit message and title
> > > - v1 patch used CPU to copy multi-page buffers; v2 patch split the
> > > copy into multiple async copy segments whenever possible
> > >
> > > v3:
> > > - added fixline
> > >
> > > v4:
> > > > - fix mistranslation of a gpa range whose length equals the host
> > > > page size
> > >
> > > > lib/librte_vhost/vhost.h | 50 +++++++++++++++++++++++++++++++++++
> > > lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
> > > 2 files changed, 75 insertions(+), 15 deletions(-)
> > >
> > > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > > > index 95a0bc19f..124a33a10 100644
> > > --- a/lib/librte_vhost/virtio_net.c
> > > +++ b/lib/librte_vhost/virtio_net.c
> > > @@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev,
> > > struct
> > vhost_virtqueue *vq,
> > > struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
> > > struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
> > > int error = 0;
> > > + uint64_t mapped_len;
> > >
> > > uint32_t tlen = 0;
> > > int tvec_idx = 0;
> > > @@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev,
> > > struct vhost_virtqueue *vq,
> > >
> > > cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> > >
> > > - if (unlikely(cpy_len >= cpy_threshold)) {
> > > - hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> > > - buf_iova + buf_offset, cpy_len);
> > > + while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
> > > + hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
> > > + buf_iova + buf_offset,
> > > + cpy_len, &mapped_len);
> > >
> > > - if (unlikely(!hpa)) {
> > > - error = -1;
> > > - goto out;
> > > - }
> > > + if (unlikely(!hpa || mapped_len < cpy_threshold))
> > > + break;
> > >
> > > async_fill_vec(src_iovec + tvec_idx,
> > > (void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> > > - mbuf_offset), cpy_len);
> > > + mbuf_offset), (size_t)mapped_len);
> > >
> > > - async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
> > > + async_fill_vec(dst_iovec + tvec_idx,
> > > + hpa, (size_t)mapped_len);
> > >
> > > - tlen += cpy_len;
> > > + tlen += (uint32_t)mapped_len;
> > > + cpy_len -= (uint32_t)mapped_len;
> > > + mbuf_avail -= (uint32_t)mapped_len;
> > > + mbuf_offset += (uint32_t)mapped_len;
> > > + buf_avail -= (uint32_t)mapped_len;
> > > + buf_offset += (uint32_t)mapped_len;
> > > tvec_idx++;
> > > - } else {
> > > + }
> > > +
> > > + if (likely(cpy_len)) {
> > > if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
> > > rte_memcpy(
> > > (void *)((uintptr_t)(buf_addr + buf_offset)),
> > > > @@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > > }
> > > }
> > >
> > > - mbuf_avail -= cpy_len;
> > > - mbuf_offset += cpy_len;
> > > - buf_avail -= cpy_len;
> > > - buf_offset += cpy_len;
> > > + if (cpy_len) {
> > > + mbuf_avail -= cpy_len;
> > > + mbuf_offset += cpy_len;
> > > + buf_avail -= cpy_len;
> > > + buf_offset += cpy_len;
> > > + }
> >
> > Is that really necessary to check if copy length is not 0?
> >
> The intention is to optimize for the case where ring buffers are NOT split
> (which should be the most common case). In that case, cpy_len will be zero
> and this "if" statement saves a couple of cycles. That said, the actual
> difference is minor. I'm open to either adding an "unlikely" to the "if" or
> removing the "if". I would like to hear your opinion before submitting a
> modified patch.
>
I have a better way to handle this case: combine this "if" logic with the previous one.
Please review my v5 patch for the code change.
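
To make the idea concrete, here is a minimal standalone sketch of the splitting loop discussed above. It is not the v5 code and does not use the real vhost structures: the names toy_guest_page, toy_seg, toy_gpa_to_first_hpa and toy_split_copy are hypothetical, the page table is a plain array, and an hpa of 0 is assumed to mean "no mapping". The sketch advances the offsets in a single place and reports whatever is left over for the CPU copy path, so no separate zero-length check is needed afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one guest-physical range that is contiguous
 * in host-physical memory. Not the real vhost guest page struct. */
struct toy_guest_page {
	uint64_t guest_phys_addr;
	uint64_t host_phys_addr;
	uint64_t size;
};

/* One async copy segment (destination side only, for brevity). */
struct toy_seg {
	uint64_t hpa;
	uint64_t len;
};

/* Translate gpa and report how many of the requested bytes are
 * contiguous inside the first matching page. Returns 0 if unmapped. */
static uint64_t
toy_gpa_to_first_hpa(const struct toy_guest_page *pages, size_t nr_pages,
		uint64_t gpa, uint64_t len, uint64_t *mapped_len)
{
	for (size_t i = 0; i < nr_pages; i++) {
		const struct toy_guest_page *p = &pages[i];

		if (gpa >= p->guest_phys_addr &&
				gpa < p->guest_phys_addr + p->size) {
			uint64_t room = p->guest_phys_addr + p->size - gpa;

			*mapped_len = len < room ? len : room;
			return p->host_phys_addr + (gpa - p->guest_phys_addr);
		}
	}
	*mapped_len = 0;
	return 0;
}

/* Split [gpa, gpa + len) into per-page segments. Stops when a piece
 * cannot be translated or is smaller than the threshold; the bytes
 * left over are returned through *remaining for a CPU copy. */
static size_t
toy_split_copy(const struct toy_guest_page *pages, size_t nr_pages,
		uint64_t gpa, uint64_t len, uint64_t threshold,
		struct toy_seg *segs, size_t max_segs, uint64_t *remaining)
{
	size_t n = 0;

	assert(segs != NULL && remaining != NULL);

	while (len && len >= threshold && n < max_segs) {
		uint64_t mapped_len;
		uint64_t hpa = toy_gpa_to_first_hpa(pages, nr_pages,
				gpa, len, &mapped_len);

		if (!hpa || mapped_len < threshold)
			break;

		segs[n].hpa = hpa;
		segs[n].len = mapped_len;
		n++;

		/* Offsets advance here only, once per segment. */
		gpa += mapped_len;
		len -= mapped_len;
	}

	*remaining = len;
	return n;
}
```

With two adjacent 4 KB guest pages mapped to non-adjacent host pages, a 4 KB buffer starting in the middle of the first page comes out as two 2 KB segments with nothing left for the CPU path. A buffer whose first piece is below the threshold is left entirely to the CPU copy.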
Thanks,
Patrick