* [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path
@ 2020-07-20  2:52 patrick.fu
  2020-07-20 16:39 ` Maxime Coquelin
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: patrick.fu @ 2020-07-20  2:52 UTC (permalink / raw)
  To: dev, maxime.coquelin, chenbo.xia; +Cc: patrick.fu
From: Patrick Fu <patrick.fu@intel.com>
Async copy fails when a ring buffer crosses two physical pages. This patch
fixes the failure by letting copies occur in sync mode when page-crossing
buffers are given.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
---
 lib/librte_vhost/virtio_net.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 1d0be3dd4..44b22a8ad 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1071,16 +1071,10 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		}
 
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
+		hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
+				buf_iova + buf_offset, cpy_len);
 
-		if (unlikely(cpy_len >= cpy_threshold)) {
-			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
-					buf_iova + buf_offset, cpy_len);
-
-			if (unlikely(!hpa)) {
-				error = -1;
-				goto out;
-			}
-
+		if (unlikely(cpy_len >= cpy_threshold && hpa)) {
 			async_fill_vec(src_iovec + tvec_idx,
 				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
 						mbuf_offset), cpy_len);
-- 
2.18.4
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path
  2020-07-20  2:52 [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path patrick.fu
@ 2020-07-20 16:39 ` Maxime Coquelin
  2020-07-21  2:57   ` Fu, Patrick
  2020-07-24 13:49 ` [dpdk-dev] [PATCH v2] vhost: fix async copy fail on multi-page buffers patrick.fu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2020-07-20 16:39 UTC (permalink / raw)
  To: patrick.fu, dev, chenbo.xia
The title could be improved, it is not very clear IMHO.
On 7/20/20 4:52 AM, patrick.fu@intel.com wrote:
> From: Patrick Fu <patrick.fu@intel.com>
> 
> Async copy fails when ring buffer cross two physical pages. This patch
> fix the failure by letting copies occur in sync mode if crossing page
> buffers are given.
Wouldn't it be possible to have the buffer split into two iovecs?
> Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> 
> Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> ---
>  lib/librte_vhost/virtio_net.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 1d0be3dd4..44b22a8ad 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1071,16 +1071,10 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  		}
>  
>  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> +		hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> +				buf_iova + buf_offset, cpy_len);
>  
> -		if (unlikely(cpy_len >= cpy_threshold)) {
> -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> -					buf_iova + buf_offset, cpy_len);
> -
> -			if (unlikely(!hpa)) {
> -				error = -1;
> -				goto out;
> -			}
> -
> +		if (unlikely(cpy_len >= cpy_threshold && hpa)) {
>  			async_fill_vec(src_iovec + tvec_idx,
>  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
>  						mbuf_offset), cpy_len);
> 
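For illustration, the two-iovec idea raised above can be sketched as below. This is a minimal sketch, not the vhost implementation: struct iov_seg, split_at_pages() and the fixed 4 KiB SKETCH_PAGE_SIZE are hypothetical names and assumptions introduced here, and the sketch only splits an address range at page boundaries without doing any gpa to hpa translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: fixed 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical segment descriptor; not a DPDK type. */
struct iov_seg {
	uint64_t addr;
	size_t len;
};

/*
 * Split the range [addr, addr + len) so that no segment crosses a
 * page boundary. Returns the number of segments written to out,
 * at most max.
 */
static size_t
split_at_pages(uint64_t addr, size_t len, struct iov_seg *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	while (len > 0 && n < max) {
		/* First address of the page following the one holding addr. */
		uint64_t page_end = (addr | (SKETCH_PAGE_SIZE - 1)) + 1;
		size_t chunk = (size_t)(page_end - addr);

		if (chunk > len)
			chunk = len;

		out[n].addr = addr;
		out[n].len = chunk;
		n++;

		addr += chunk;
		len -= chunk;
	}

	return n;
}
```

A 200-byte buffer starting 96 bytes before a page boundary would come out as two segments of 96 and 104 bytes.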
* Re: [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path
  2020-07-20 16:39 ` Maxime Coquelin
@ 2020-07-21  2:57   ` Fu, Patrick
  2020-07-21  8:35     ` Maxime Coquelin
  0 siblings, 1 reply; 16+ messages in thread
From: Fu, Patrick @ 2020-07-21  2:57 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Xia, Chenbo
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, July 21, 2020 12:40 AM
> To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
> <chenbo.xia@intel.com>
> Subject: Re: [PATCH v1] vhost: support cross page buf in async data path
> 
> The title could be improved, it is not very clear IMHO.
How about: 
vhost: fix async copy failure on buffers cross page boundary
> On 7/20/20 4:52 AM, patrick.fu@intel.com wrote:
> > From: Patrick Fu <patrick.fu@intel.com>
> >
> > Async copy fails when ring buffer cross two physical pages. This patch
> > fix the failure by letting copies occur in sync mode if crossing page
> > buffers are given.
> 
> Wouldn't it be possible to have the buffer split into two iovecs?
Technically we can do that; however, it would also introduce significant overhead:
 - overhead from the additional logic in the vhost async data path to handle this case
 - overhead from the DMA device consuming 2 iovecs
On average, I don't think DMA copy benefits much for buffers that are split across multiple pages.
CPU copy should be a more suitable method.
 
> > Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> >
> > Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> > ---
> >  lib/librte_vhost/virtio_net.c | 12 +++---------
> >  1 file changed, 3 insertions(+), 9 deletions(-)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c
> > b/lib/librte_vhost/virtio_net.c index 1d0be3dd4..44b22a8ad 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -1071,16 +1071,10 @@ async_mbuf_to_desc(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> >  		}
> >
> >  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> > +		hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> > +				buf_iova + buf_offset, cpy_len);
> >
> > -		if (unlikely(cpy_len >= cpy_threshold)) {
> > -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> > -					buf_iova + buf_offset, cpy_len);
> > -
> > -			if (unlikely(!hpa)) {
> > -				error = -1;
> > -				goto out;
> > -			}
> > -
> > +		if (unlikely(cpy_len >= cpy_threshold && hpa)) {
> >  			async_fill_vec(src_iovec + tvec_idx,
> >  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> >  						mbuf_offset), cpy_len);
> >
* Re: [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path
  2020-07-21  2:57   ` Fu, Patrick
@ 2020-07-21  8:35     ` Maxime Coquelin
  2020-07-21  9:01       ` Fu, Patrick
  0 siblings, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2020-07-21  8:35 UTC (permalink / raw)
  To: Fu, Patrick, dev, Xia, Chenbo
Hi Patrick,
On 7/21/20 4:57 AM, Fu, Patrick wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, July 21, 2020 12:40 AM
>> To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
>> <chenbo.xia@intel.com>
>> Subject: Re: [PATCH v1] vhost: support cross page buf in async data path
>>
>> The title could be improved, it is not very clear IMHO.
> How about: 
> vhost: fix async copy failure on buffers cross page boundary
> 
>> On 7/20/20 4:52 AM, patrick.fu@intel.com wrote:
>>> From: Patrick Fu <patrick.fu@intel.com>
>>>
>>> Async copy fails when ring buffer cross two physical pages. This patch
>>> fix the failure by letting copies occur in sync mode if crossing page
>>> buffers are given.
>>
>> Wouldn't it be possible to have the buffer split into two iovecs?
> Technically we can do that, however, it will also introduce significant overhead:
>  - overhead from adding additional logic in vhost async data path to handle the case
>  - overhead from dma device to consume 2 iovecs
> In average, I don't think dma copy can benefit too much for the buffer which are split into multiple pages. 
> CPU copy shall be a more suitable method.
I think we should try, as that would make for a cleaner implementation. I
don't think having to fall back to sync mode is a good idea because it adds
overhead on the CPU, which is what we try to avoid with this async mode.
Also, I am not convinced the overhead would be that significant, or at
least I hope not; otherwise it would mean this new path is just
performing better because it takes a lot of shortcuts, like the vector
path in Virtio PMD.
Regards,
Maxime
>  
>>> Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
>>>
>>> Signed-off-by: Patrick Fu <patrick.fu@intel.com>
>>> ---
>>>  lib/librte_vhost/virtio_net.c | 12 +++---------
>>>  1 file changed, 3 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c
>>> b/lib/librte_vhost/virtio_net.c index 1d0be3dd4..44b22a8ad 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -1071,16 +1071,10 @@ async_mbuf_to_desc(struct virtio_net *dev,
>> struct vhost_virtqueue *vq,
>>>  		}
>>>
>>>  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
>>> +		hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
>>> +				buf_iova + buf_offset, cpy_len);
>>>
>>> -		if (unlikely(cpy_len >= cpy_threshold)) {
>>> -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
>>> -					buf_iova + buf_offset, cpy_len);
>>> -
>>> -			if (unlikely(!hpa)) {
>>> -				error = -1;
>>> -				goto out;
>>> -			}
>>> -
>>> +		if (unlikely(cpy_len >= cpy_threshold && hpa)) {
>>>  			async_fill_vec(src_iovec + tvec_idx,
>>>  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
>>>  						mbuf_offset), cpy_len);
>>>
> 
* Re: [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path
  2020-07-21  8:35     ` Maxime Coquelin
@ 2020-07-21  9:01       ` Fu, Patrick
  0 siblings, 0 replies; 16+ messages in thread
From: Fu, Patrick @ 2020-07-21  9:01 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Xia, Chenbo
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, July 21, 2020 4:35 PM
> To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
> <chenbo.xia@intel.com>
> Subject: Re: [PATCH v1] vhost: support cross page buf in async data path
> 
> Hi Patrick,
> 
> On 7/21/20 4:57 AM, Fu, Patrick wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, July 21, 2020 12:40 AM
> >> To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
> >> <chenbo.xia@intel.com>
> >> Subject: Re: [PATCH v1] vhost: support cross page buf in async data
> >> path
> >>
> >> The title could be improved, it is not very clear IMHO.
> > How about:
> > vhost: fix async copy failure on buffers cross page boundary
> >
> >> On 7/20/20 4:52 AM, patrick.fu@intel.com wrote:
> >>> From: Patrick Fu <patrick.fu@intel.com>
> >>>
> >>> Async copy fails when ring buffer cross two physical pages. This
> >>> patch fix the failure by letting copies occur in sync mode if
> >>> crossing page buffers are given.
> >>
> >> Wouldn't it be possible to have the buffer split into two iovecs?
> > Technically we can do that, however, it will also introduce significant
> overhead:
> >  - overhead from adding additional logic in vhost async data path to
> > handle the case
> >  - overhead from dma device to consume 2 iovecs In average, I don't
> > think dma copy can benefit too much for the buffer which are split into
> multiple pages.
> > CPU copy shall be a more suitable method.
> 
> I think we should try, that would make a cleaner implementation. I don't
> think having to fallback to sync mode is a good idea because it adds an
> overhead on the CPU, which is what we try to avoid with this async mode.
> 
> Also, I am not convinced the overhead would be that significant, at least I
> hope so, otherwise it would mean this new path is just performing better
> because it takes a lot of shortcuts, like the vector path in Virtio PMD.
I can make a trial patch and do some comparisons. I will try to report the results by this weekend.
Thanks,
Patrick
* [dpdk-dev] [PATCH v2] vhost: fix async copy fail on multi-page buffers
  2020-07-20  2:52 [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path patrick.fu
  2020-07-20 16:39 ` Maxime Coquelin
@ 2020-07-24 13:49 ` patrick.fu
  2020-07-27  6:33 ` [dpdk-dev] [PATCH v3] " patrick.fu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: patrick.fu @ 2020-07-24 13:49 UTC (permalink / raw)
  To: dev, maxime.coquelin, chenbo.xia; +Cc: Patrick Fu
From: Patrick Fu <patrick.fu@intel.com>
Async copy fails when a single ring buffer vector is split across multiple
physical pages. This happens because the current hpa address translation
function doesn't handle multi-page buffers. This patch implements a new
gpa to hpa address conversion function, which returns the hpa within the
first host page hit. The async data path calls this new function
repeatedly to construct a multi-segment async copy descriptor for ring
buffers crossing physical page boundaries.
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
---
v2:
 - change commit message and title
 - v1 patch used CPU to copy multi-page buffers; v2 patch split the
copy into multiple async copy segments whenever possible
 lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
 2 files changed, 75 insertions(+), 15 deletions(-)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 0f7212f88..05c202a57 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -616,6 +616,56 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 	return 0;
 }
 
+static __rte_always_inline rte_iova_t
+gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa,
+	uint64_t gpa_size, uint64_t *hpa_size)
+{
+	uint32_t i;
+	struct guest_page *page;
+	struct guest_page key;
+
+	*hpa_size = gpa_size;
+	if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
+		key.guest_phys_addr = gpa & ~(dev->guest_pages[0].size - 1);
+		page = bsearch(&key, dev->guest_pages, dev->nr_guest_pages,
+			       sizeof(struct guest_page), guest_page_addrcmp);
+		if (page) {
+			if (gpa + gpa_size <=
+					page->guest_phys_addr + page->size) {
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			} else if (gpa < page->guest_phys_addr +
+						page->size) {
+				*hpa_size = page->guest_phys_addr +
+					page->size - gpa;
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			}
+		}
+	} else {
+		for (i = 0; i < dev->nr_guest_pages; i++) {
+			page = &dev->guest_pages[i];
+
+			if (gpa >= page->guest_phys_addr) {
+				if (gpa + gpa_size <
+					page->guest_phys_addr + page->size) {
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				} else if (gpa < page->guest_phys_addr +
+							page->size) {
+					*hpa_size = page->guest_phys_addr +
+						page->size - gpa;
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				}
+			}
+		}
+	}
+
+	*hpa_size = 0;
+	return 0;
+}
+
 static __rte_always_inline uint64_t
 hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)
 {
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 95a0bc19f..124a33a10 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
 	int error = 0;
+	uint64_t mapped_len;
 
 	uint32_t tlen = 0;
 	int tvec_idx = 0;
@@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (unlikely(cpy_len >= cpy_threshold)) {
-			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
-					buf_iova + buf_offset, cpy_len);
+		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
+			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+					buf_iova + buf_offset,
+					cpy_len, &mapped_len);
 
-			if (unlikely(!hpa)) {
-				error = -1;
-				goto out;
-			}
+			if (unlikely(!hpa || mapped_len < cpy_threshold))
+				break;
 
 			async_fill_vec(src_iovec + tvec_idx,
 				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-						mbuf_offset), cpy_len);
+				mbuf_offset), (size_t)mapped_len);
 
-			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
+			async_fill_vec(dst_iovec + tvec_idx,
+					hpa, (size_t)mapped_len);
 
-			tlen += cpy_len;
+			tlen += (uint32_t)mapped_len;
+			cpy_len -= (uint32_t)mapped_len;
+			mbuf_avail  -= (uint32_t)mapped_len;
+			mbuf_offset += (uint32_t)mapped_len;
+			buf_avail  -= (uint32_t)mapped_len;
+			buf_offset += (uint32_t)mapped_len;
 			tvec_idx++;
-		} else {
+		}
+
+		if (likely(cpy_len)) {
 			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
 				rte_memcpy(
 				(void *)((uintptr_t)(buf_addr + buf_offset)),
@@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			}
 		}
 
-		mbuf_avail  -= cpy_len;
-		mbuf_offset += cpy_len;
-		buf_avail  -= cpy_len;
-		buf_offset += cpy_len;
+		if (cpy_len) {
+			mbuf_avail  -= cpy_len;
+			mbuf_offset += cpy_len;
+			buf_avail  -= cpy_len;
+			buf_offset += cpy_len;
+		}
 	}
 
 out:
-- 
2.18.4
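For illustration, the approach described in the commit message above can be modelled outside of DPDK as follows. This is a simplified sketch under stated assumptions: struct page_map, struct hpa_seg, first_hpa() and build_segments() are hypothetical names, only a linear search is modelled, and the cpy_threshold fallback to CPU copy is omitted. As in the patch, a returned hpa of 0 is treated as a failed translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest page mapping. */
struct page_map {
	uint64_t gpa;  /* guest physical start address */
	uint64_t hpa;  /* host physical start address */
	uint64_t size; /* page size in bytes */
};

/* Hypothetical output segment: one host-contiguous piece. */
struct hpa_seg {
	uint64_t hpa;
	uint64_t len;
};

/*
 * Translate gpa and report in *got how many of the want bytes are
 * contiguous inside the page that contains gpa. Returns 0 and sets
 * *got to 0 when no page contains gpa.
 */
static uint64_t
first_hpa(const struct page_map *pages, size_t nr, uint64_t gpa,
	  uint64_t want, uint64_t *got)
{
	size_t i;

	assert(pages != NULL && got != NULL);

	for (i = 0; i < nr; i++) {
		const struct page_map *p = &pages[i];
		uint64_t end = p->gpa + p->size;

		if (gpa >= p->gpa && gpa < end) {
			uint64_t room = end - gpa;

			*got = want < room ? want : room;
			return gpa - p->gpa + p->hpa;
		}
	}

	*got = 0;
	return 0;
}

/*
 * Call first_hpa() repeatedly until the whole range is covered,
 * the translation fails, or out is full. Returns the segment count.
 */
static size_t
build_segments(const struct page_map *pages, size_t nr, uint64_t gpa,
	       uint64_t len, struct hpa_seg *out, size_t max)
{
	size_t n = 0;

	while (len > 0 && n < max) {
		uint64_t got;
		uint64_t hpa = first_hpa(pages, nr, gpa, len, &got);

		if (hpa == 0 || got == 0)
			break;

		out[n].hpa = hpa;
		out[n].len = got;
		n++;

		gpa += got;
		len -= got;
	}

	return n;
}
```

With two guest-adjacent pages that map to unrelated host pages, a buffer straddling the boundary yields two segments, one per host page.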
* [dpdk-dev] [PATCH v3] vhost: fix async copy fail on multi-page buffers
  2020-07-20  2:52 [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path patrick.fu
  2020-07-20 16:39 ` Maxime Coquelin
  2020-07-24 13:49 ` [dpdk-dev] [PATCH v2] vhost: fix async copy fail on multi-page buffers patrick.fu
@ 2020-07-27  6:33 ` patrick.fu
  2020-07-27 13:14   ` Xia, Chenbo
  2020-07-28  3:28 ` [dpdk-dev] [PATCH v4] " patrick.fu
  2020-07-29  2:04 ` [dpdk-dev] [PATCH v5] " Patrick Fu
  4 siblings, 1 reply; 16+ messages in thread
From: patrick.fu @ 2020-07-27  6:33 UTC (permalink / raw)
  To: dev, maxime.coquelin, chenbo.xia; +Cc: Patrick Fu
From: Patrick Fu <patrick.fu@intel.com>
Async copy fails when a single ring buffer vector is split across multiple
physical pages. This happens because the current hpa address translation
function doesn't handle multi-page buffers. This patch implements a new
gpa to hpa address conversion function, which returns the hpa within the
first host page hit. The async data path calls this new function
repeatedly to construct a multi-segment async copy descriptor for ring
buffers crossing physical page boundaries.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
---
v2:
 - change commit message and title
 - v1 patch used CPU to copy multi-page buffers; v2 patch split the
copy into multiple async copy segments whenever possible
v3:
 - added fixline
 lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
 2 files changed, 75 insertions(+), 15 deletions(-)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 0f7212f88..05c202a57 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -616,6 +616,56 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 	return 0;
 }
 
+static __rte_always_inline rte_iova_t
+gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa,
+	uint64_t gpa_size, uint64_t *hpa_size)
+{
+	uint32_t i;
+	struct guest_page *page;
+	struct guest_page key;
+
+	*hpa_size = gpa_size;
+	if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
+		key.guest_phys_addr = gpa & ~(dev->guest_pages[0].size - 1);
+		page = bsearch(&key, dev->guest_pages, dev->nr_guest_pages,
+			       sizeof(struct guest_page), guest_page_addrcmp);
+		if (page) {
+			if (gpa + gpa_size <=
+					page->guest_phys_addr + page->size) {
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			} else if (gpa < page->guest_phys_addr +
+						page->size) {
+				*hpa_size = page->guest_phys_addr +
+					page->size - gpa;
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			}
+		}
+	} else {
+		for (i = 0; i < dev->nr_guest_pages; i++) {
+			page = &dev->guest_pages[i];
+
+			if (gpa >= page->guest_phys_addr) {
+				if (gpa + gpa_size <
+					page->guest_phys_addr + page->size) {
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				} else if (gpa < page->guest_phys_addr +
+							page->size) {
+					*hpa_size = page->guest_phys_addr +
+						page->size - gpa;
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				}
+			}
+		}
+	}
+
+	*hpa_size = 0;
+	return 0;
+}
+
 static __rte_always_inline uint64_t
 hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)
 {
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 95a0bc19f..124a33a10 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
 	int error = 0;
+	uint64_t mapped_len;
 
 	uint32_t tlen = 0;
 	int tvec_idx = 0;
@@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (unlikely(cpy_len >= cpy_threshold)) {
-			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
-					buf_iova + buf_offset, cpy_len);
+		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
+			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+					buf_iova + buf_offset,
+					cpy_len, &mapped_len);
 
-			if (unlikely(!hpa)) {
-				error = -1;
-				goto out;
-			}
+			if (unlikely(!hpa || mapped_len < cpy_threshold))
+				break;
 
 			async_fill_vec(src_iovec + tvec_idx,
 				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-						mbuf_offset), cpy_len);
+				mbuf_offset), (size_t)mapped_len);
 
-			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
+			async_fill_vec(dst_iovec + tvec_idx,
+					hpa, (size_t)mapped_len);
 
-			tlen += cpy_len;
+			tlen += (uint32_t)mapped_len;
+			cpy_len -= (uint32_t)mapped_len;
+			mbuf_avail  -= (uint32_t)mapped_len;
+			mbuf_offset += (uint32_t)mapped_len;
+			buf_avail  -= (uint32_t)mapped_len;
+			buf_offset += (uint32_t)mapped_len;
 			tvec_idx++;
-		} else {
+		}
+
+		if (likely(cpy_len)) {
 			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
 				rte_memcpy(
 				(void *)((uintptr_t)(buf_addr + buf_offset)),
@@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			}
 		}
 
-		mbuf_avail  -= cpy_len;
-		mbuf_offset += cpy_len;
-		buf_avail  -= cpy_len;
-		buf_offset += cpy_len;
+		if (cpy_len) {
+			mbuf_avail  -= cpy_len;
+			mbuf_offset += cpy_len;
+			buf_avail  -= cpy_len;
+			buf_offset += cpy_len;
+		}
 	}
 
 out:
-- 
2.18.4
* Re: [dpdk-dev] [PATCH v3] vhost: fix async copy fail on multi-page buffers
  2020-07-27  6:33 ` [dpdk-dev] [PATCH v3] " patrick.fu
@ 2020-07-27 13:14   ` Xia, Chenbo
  2020-07-28  3:09     ` Fu, Patrick
  0 siblings, 1 reply; 16+ messages in thread
From: Xia, Chenbo @ 2020-07-27 13:14 UTC (permalink / raw)
  To: Fu, Patrick, dev, maxime.coquelin
Hi Patrick,
> -----Original Message-----
> From: Fu, Patrick <patrick.fu@intel.com>
> Sent: Monday, July 27, 2020 2:33 PM
> To: dev@dpdk.org; maxime.coquelin@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>
> Cc: Fu, Patrick <patrick.fu@intel.com>
> Subject: [PATCH v3] vhost: fix async copy fail on multi-page buffers
> 
> From: Patrick Fu <patrick.fu@intel.com>
> 
> Async copy fails when single ring buffer vector is splited on multiple physical
> pages. This happens because current hpa address translation function doesn't
> handle multi-page buffers. A new gpa to hpa address conversion function, which
> returns the hpa on the first hitting host pages, is implemented in this patch.
> Async data path recursively calls this new function to construct a multi-segments
> async copy descriptor for ring buffers crossing physical page boundaries.
> 
> Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> 
> Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> ---
> v2:
>  - change commit message and title
>  - v1 patch used CPU to copy multi-page buffers; v2 patch split the copy into
> multiple async copy segments whenever possible
> 
> v3:
>  - added fixline
> 
>  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
>  2 files changed, 75 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index
> 0f7212f88..05c202a57 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -616,6 +616,56 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa,
> uint64_t size)
>  	return 0;
>  }
> 
> +static __rte_always_inline rte_iova_t
> +gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa,
> +	uint64_t gpa_size, uint64_t *hpa_size) {
> +	uint32_t i;
> +	struct guest_page *page;
> +	struct guest_page key;
> +
> +	*hpa_size = gpa_size;
> +	if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
> +		key.guest_phys_addr = gpa & ~(dev->guest_pages[0].size - 1);
> +		page = bsearch(&key, dev->guest_pages, dev->nr_guest_pages,
> +			       sizeof(struct guest_page), guest_page_addrcmp);
> +		if (page) {
> +			if (gpa + gpa_size <=
> +					page->guest_phys_addr + page->size) {
> +				return gpa - page->guest_phys_addr +
> +					page->host_phys_addr;
> +			} else if (gpa < page->guest_phys_addr +
> +						page->size) {
> +				*hpa_size = page->guest_phys_addr +
> +					page->size - gpa;
> +				return gpa - page->guest_phys_addr +
> +					page->host_phys_addr;
> +			}
> +		}
> +	} else {
> +		for (i = 0; i < dev->nr_guest_pages; i++) {
> +			page = &dev->guest_pages[i];
> +
> +			if (gpa >= page->guest_phys_addr) {
> +				if (gpa + gpa_size <
Should the '<' be '<=' here?
> +					page->guest_phys_addr + page->size) {
> +					return gpa - page->guest_phys_addr +
> +						page->host_phys_addr;
> +				} else if (gpa < page->guest_phys_addr +
> +							page->size) {
> +					*hpa_size = page->guest_phys_addr +
> +						page->size - gpa;
> +					return gpa - page->guest_phys_addr +
> +						page->host_phys_addr;
> +				}
> +			}
> +		}
> +	}
> +
> +	*hpa_size = 0;
> +	return 0;
> +}
> +
>  static __rte_always_inline uint64_t
>  hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)  { diff --git
> a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index
> 95a0bc19f..124a33a10 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>  	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
>  	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
>  	int error = 0;
> +	uint64_t mapped_len;
> 
>  	uint32_t tlen = 0;
>  	int tvec_idx = 0;
> @@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> 
>  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> 
> -		if (unlikely(cpy_len >= cpy_threshold)) {
> -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> -					buf_iova + buf_offset, cpy_len);
> +		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
> +			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
> +					buf_iova + buf_offset,
> +					cpy_len, &mapped_len);
> 
> -			if (unlikely(!hpa)) {
> -				error = -1;
> -				goto out;
> -			}
> +			if (unlikely(!hpa || mapped_len < cpy_threshold))
> +				break;
> 
>  			async_fill_vec(src_iovec + tvec_idx,
>  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> -						mbuf_offset), cpy_len);
> +				mbuf_offset), (size_t)mapped_len);
> 
> -			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
> +			async_fill_vec(dst_iovec + tvec_idx,
> +					hpa, (size_t)mapped_len);
> 
> -			tlen += cpy_len;
> +			tlen += (uint32_t)mapped_len;
> +			cpy_len -= (uint32_t)mapped_len;
> +			mbuf_avail  -= (uint32_t)mapped_len;
> +			mbuf_offset += (uint32_t)mapped_len;
> +			buf_avail  -= (uint32_t)mapped_len;
> +			buf_offset += (uint32_t)mapped_len;
Is it OK to simply cast the uint64_t to uint32_t here?
What if mapped_len > UINT32_MAX?
Thanks!
Chenbo
>  			tvec_idx++;
> -		} else {
> +		}
> +
> +		if (likely(cpy_len)) {
>  			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
>  				rte_memcpy(
>  				(void *)((uintptr_t)(buf_addr + buf_offset)),
> @@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>  			}
>  		}
> 
> -		mbuf_avail  -= cpy_len;
> -		mbuf_offset += cpy_len;
> -		buf_avail  -= cpy_len;
> -		buf_offset += cpy_len;
> +		if (cpy_len) {
> +			mbuf_avail  -= cpy_len;
> +			mbuf_offset += cpy_len;
> +			buf_avail  -= cpy_len;
> +			buf_offset += cpy_len;
> +		}
>  	}
> 
>  out:
> --
> 2.18.4
* Re: [dpdk-dev] [PATCH v3] vhost: fix async copy fail on multi-page buffers
  2020-07-27 13:14   ` Xia, Chenbo
@ 2020-07-28  3:09     ` Fu, Patrick
  0 siblings, 0 replies; 16+ messages in thread
From: Fu, Patrick @ 2020-07-28  3:09 UTC (permalink / raw)
  To: Xia, Chenbo, dev, maxime.coquelin
Hi,
> -----Original Message-----
> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Monday, July 27, 2020 9:14 PM
> To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org;
> maxime.coquelin@redhat.com
> Subject: RE: [PATCH v3] vhost: fix async copy fail on multi-page buffers
> 
> Hi Patrick,
> 
> > -----Original Message-----
> > From: Fu, Patrick <patrick.fu@intel.com>
> > Sent: Monday, July 27, 2020 2:33 PM
> > To: dev@dpdk.org; maxime.coquelin@redhat.com; Xia, Chenbo
> > <chenbo.xia@intel.com>
> > Cc: Fu, Patrick <patrick.fu@intel.com>
> > Subject: [PATCH v3] vhost: fix async copy fail on multi-page buffers
> >
> >  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
> >  lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
> >  2 files changed, 75 insertions(+), 15 deletions(-)
> >
> > +	} else {
> > +		for (i = 0; i < dev->nr_guest_pages; i++) {
> > +			page = &dev->guest_pages[i];
> > +
> > +			if (gpa >= page->guest_phys_addr) {
> > +				if (gpa + gpa_size <
> 
> Should the '<' be '<=' here?
> 
Yes, it should be. Will update in v4 patch
> >  static __rte_always_inline uint64_t
> >  hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)  {
> > diff --git a/lib/librte_vhost/virtio_net.c
> > b/lib/librte_vhost/virtio_net.c index
> > 95a0bc19f..124a33a10 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> >  	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
> >  	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
> >  	int error = 0;
> > +	uint64_t mapped_len;
> >
> >  	uint32_t tlen = 0;
> >  	int tvec_idx = 0;
> > @@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev,
> > struct vhost_virtqueue *vq,
> >
> >  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> >
> > -		if (unlikely(cpy_len >= cpy_threshold)) {
> > -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> > -					buf_iova + buf_offset, cpy_len);
> > +		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
> > +			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
> > +					buf_iova + buf_offset,
> > +					cpy_len, &mapped_len);
> >
> > -			if (unlikely(!hpa)) {
> > -				error = -1;
> > -				goto out;
> > -			}
> > +			if (unlikely(!hpa || mapped_len < cpy_threshold))
> > +				break;
> >
> >  			async_fill_vec(src_iovec + tvec_idx,
> >  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> > -						mbuf_offset), cpy_len);
> > +				mbuf_offset), (size_t)mapped_len);
> >
> > -			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
> > +			async_fill_vec(dst_iovec + tvec_idx,
> > +					hpa, (size_t)mapped_len);
> >
> > -			tlen += cpy_len;
> > +			tlen += (uint32_t)mapped_len;
> > +			cpy_len -= (uint32_t)mapped_len;
> > +			mbuf_avail  -= (uint32_t)mapped_len;
> > +			mbuf_offset += (uint32_t)mapped_len;
> > +			buf_avail  -= (uint32_t)mapped_len;
> > +			buf_offset += (uint32_t)mapped_len;
> 
> Is it OK to simply cast the uint64_t to uint32_t here?
> What if mapped_len > UINT32_MAX?
> 
"mapped_len" cannot exceed "cpy_len" (per the gpa_to_first_hpa() logic). Since cpy_len is a uint32_t, it is safe to downcast mapped_len.
Thanks,
Patrick
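The invariant behind this answer can be sketched in isolation. The helper below is hypothetical (its name and parameters are not part of the patch) and assumes the caller has already found the page that contains gpa; it mirrors how gpa_to_first_hpa() sets *hpa_size to either the full request or the bytes left in the page.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length that can be mapped starting at gpa inside
 * the page [page_start, page_start + page_size). The caller guarantees
 * that gpa lies inside that page.
 */
static uint64_t
mapped_len_in_page(uint64_t page_start, uint64_t page_size,
		uint64_t gpa, uint32_t cpy_len)
{
	uint64_t left = page_start + page_size - gpa;
	uint64_t mapped = (uint64_t)cpy_len <= left ? cpy_len : left;

	/* mapped <= cpy_len <= UINT32_MAX, so narrowing loses no bits. */
	assert(mapped <= UINT32_MAX);
	return mapped;
}
```

Even with a page larger than 4 GiB, the result is bounded by the 32-bit request length, so `(uint32_t)mapped_len` keeps the same value. The not-found path in the patch sets the length to 0, which is also within range.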
^ permalink raw reply	[flat|nested] 16+ messages in thread
* [dpdk-dev] [PATCH v4] vhost: fix async copy fail on multi-page buffers
  2020-07-20  2:52 [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path patrick.fu
                   ` (2 preceding siblings ...)
  2020-07-27  6:33 ` [dpdk-dev] [PATCH v3] " patrick.fu
@ 2020-07-28  3:28 ` patrick.fu
  2020-07-28 13:55   ` Maxime Coquelin
  2020-07-29  2:04 ` [dpdk-dev] [PATCH v5] " Patrick Fu
  4 siblings, 1 reply; 16+ messages in thread
From: patrick.fu @ 2020-07-28  3:28 UTC (permalink / raw)
  To: dev, maxime.coquelin, chenbo.xia; +Cc: Patrick Fu
From: Patrick Fu <patrick.fu@intel.com>
Async copy fails when a single ring buffer vector is split across
multiple physical pages. This happens because the current hpa address
translation function doesn't handle multi-page buffers. This patch
implements a new gpa to hpa address conversion function, which returns
the hpa of the first host page hit. The async data path calls this new
function repeatedly to construct a multi-segment async copy descriptor
for ring buffers crossing physical page boundaries.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
---
v2:
 - change commit message and title
 - v1 patch used CPU to copy multi-page buffers; v2 patch split the
copy into multiple async copy segments whenever possible
v3:
 - added fixline
v4:
 - fix mistranslation of a gpa buffer whose length is the same as the
   host page size
 lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
 2 files changed, 75 insertions(+), 15 deletions(-)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 0f7212f88..05c202a57 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -616,6 +616,56 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 	return 0;
 }
 
+static __rte_always_inline rte_iova_t
+gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa,
+	uint64_t gpa_size, uint64_t *hpa_size)
+{
+	uint32_t i;
+	struct guest_page *page;
+	struct guest_page key;
+
+	*hpa_size = gpa_size;
+	if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
+		key.guest_phys_addr = gpa & ~(dev->guest_pages[0].size - 1);
+		page = bsearch(&key, dev->guest_pages, dev->nr_guest_pages,
+			       sizeof(struct guest_page), guest_page_addrcmp);
+		if (page) {
+			if (gpa + gpa_size <=
+					page->guest_phys_addr + page->size) {
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			} else if (gpa < page->guest_phys_addr +
+						page->size) {
+				*hpa_size = page->guest_phys_addr +
+					page->size - gpa;
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			}
+		}
+	} else {
+		for (i = 0; i < dev->nr_guest_pages; i++) {
+			page = &dev->guest_pages[i];
+
+			if (gpa >= page->guest_phys_addr) {
+				if (gpa + gpa_size <=
+					page->guest_phys_addr + page->size) {
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				} else if (gpa < page->guest_phys_addr +
+							page->size) {
+					*hpa_size = page->guest_phys_addr +
+						page->size - gpa;
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				}
+			}
+		}
+	}
+
+	*hpa_size = 0;
+	return 0;
+}
+
 static __rte_always_inline uint64_t
 hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)
 {
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 95a0bc19f..124a33a10 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
 	int error = 0;
+	uint64_t mapped_len;
 
 	uint32_t tlen = 0;
 	int tvec_idx = 0;
@@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (unlikely(cpy_len >= cpy_threshold)) {
-			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
-					buf_iova + buf_offset, cpy_len);
+		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
+			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+					buf_iova + buf_offset,
+					cpy_len, &mapped_len);
 
-			if (unlikely(!hpa)) {
-				error = -1;
-				goto out;
-			}
+			if (unlikely(!hpa || mapped_len < cpy_threshold))
+				break;
 
 			async_fill_vec(src_iovec + tvec_idx,
 				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-						mbuf_offset), cpy_len);
+				mbuf_offset), (size_t)mapped_len);
 
-			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
+			async_fill_vec(dst_iovec + tvec_idx,
+					hpa, (size_t)mapped_len);
 
-			tlen += cpy_len;
+			tlen += (uint32_t)mapped_len;
+			cpy_len -= (uint32_t)mapped_len;
+			mbuf_avail  -= (uint32_t)mapped_len;
+			mbuf_offset += (uint32_t)mapped_len;
+			buf_avail  -= (uint32_t)mapped_len;
+			buf_offset += (uint32_t)mapped_len;
 			tvec_idx++;
-		} else {
+		}
+
+		if (likely(cpy_len)) {
 			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
 				rte_memcpy(
 				(void *)((uintptr_t)(buf_addr + buf_offset)),
@@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			}
 		}
 
-		mbuf_avail  -= cpy_len;
-		mbuf_offset += cpy_len;
-		buf_avail  -= cpy_len;
-		buf_offset += cpy_len;
+		if (cpy_len) {
+			mbuf_avail  -= cpy_len;
+			mbuf_offset += cpy_len;
+			buf_avail  -= cpy_len;
+			buf_offset += cpy_len;
+		}
 	}
 
 out:
-- 
2.18.4
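The translate-and-split idea in the patch above can be tried outside DPDK. The following is a minimal sketch under stated assumptions: `struct guest_page` is a simplified stand-in for the vhost structure, only the linear-search branch is modelled, and `first_hpa()`, `split_range()` and `struct segment` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the vhost guest page table entry. */
struct guest_page {
	uint64_t guest_phys_addr;
	uint64_t host_phys_addr;
	uint64_t size;
};

/* One host-contiguous piece of a guest buffer (hypothetical type). */
struct segment {
	uint64_t hpa;
	uint64_t len;
};

/* Linear search only; returns 0 and sets *hpa_size = 0 if gpa is unmapped. */
static uint64_t
first_hpa(const struct guest_page *pages, uint32_t nr_pages,
	uint64_t gpa, uint64_t gpa_size, uint64_t *hpa_size)
{
	uint32_t i;

	for (i = 0; i < nr_pages; i++) {
		const struct guest_page *page = &pages[i];
		uint64_t end = page->guest_phys_addr + page->size;

		if (gpa < page->guest_phys_addr || gpa >= end)
			continue;

		/* Clamp the mapped length to what is left in this page. */
		*hpa_size = (gpa + gpa_size <= end) ? gpa_size : end - gpa;
		return gpa - page->guest_phys_addr + page->host_phys_addr;
	}

	*hpa_size = 0;
	return 0;
}

/*
 * Split [gpa, gpa + len) into host-contiguous segments. Returns the
 * segment count, or -1 if part of the range is unmapped or segs[] is
 * too small.
 */
static int
split_range(const struct guest_page *pages, uint32_t nr_pages,
	uint64_t gpa, uint64_t len, struct segment *segs, int max_segs)
{
	int n = 0;

	while (len > 0) {
		uint64_t mapped;
		uint64_t hpa = first_hpa(pages, nr_pages, gpa, len, &mapped);

		if (mapped == 0 || n == max_segs)
			return -1;

		assert(mapped <= len);
		segs[n].hpa = hpa;
		segs[n].len = mapped;
		n++;
		gpa += mapped;
		len -= mapped;
	}

	return n;
}
```

With two 4 KiB guest pages that are adjacent in guest space but not in host space, a 4 KiB buffer starting in the middle of the first page comes back as two 2 KiB segments, one per host page. Unlike the patch, this sketch ignores the copy threshold and treats an unmapped range as an error rather than falling back to a CPU copy.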
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v4] vhost: fix async copy fail on multi-page buffers
  2020-07-28  3:28 ` [dpdk-dev] [PATCH v4] " patrick.fu
@ 2020-07-28 13:55   ` Maxime Coquelin
  2020-07-29  1:40     ` Fu, Patrick
  0 siblings, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2020-07-28 13:55 UTC (permalink / raw)
  To: patrick.fu, dev, chenbo.xia
On 7/28/20 5:28 AM, patrick.fu@intel.com wrote:
> From: Patrick Fu <patrick.fu@intel.com>
> 
> Async copy fails when a single ring buffer vector is split across
> multiple physical pages. This happens because the current hpa address
> translation function doesn't handle multi-page buffers. This patch
> implements a new gpa to hpa address conversion function, which returns
> the hpa of the first host page hit. The async data path calls this new
> function repeatedly to construct a multi-segment async copy descriptor
> for ring buffers crossing physical page boundaries.
> 
> Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> 
> Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> ---
> v2:
>  - change commit message and title
>  - v1 patch used CPU to copy multi-page buffers; v2 patch split the
> copy into multiple async copy segments whenever possible
> 
> v3:
>  - added fixline
> 
> v4:
>  - fix mistranslation of a gpa buffer whose length is the same as the
>    host page size
> 
>  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
>  2 files changed, 75 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 0f7212f88..05c202a57 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -616,6 +616,56 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
>  	return 0;
>  }
>  
> +static __rte_always_inline rte_iova_t
> +gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa,
> +	uint64_t gpa_size, uint64_t *hpa_size)
> +{
> +	uint32_t i;
> +	struct guest_page *page;
> +	struct guest_page key;
> +
> +	*hpa_size = gpa_size;
> +	if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
> +		key.guest_phys_addr = gpa & ~(dev->guest_pages[0].size - 1);
> +		page = bsearch(&key, dev->guest_pages, dev->nr_guest_pages,
> +			       sizeof(struct guest_page), guest_page_addrcmp);
> +		if (page) {
> +			if (gpa + gpa_size <=
> +					page->guest_phys_addr + page->size) {
> +				return gpa - page->guest_phys_addr +
> +					page->host_phys_addr;
> +			} else if (gpa < page->guest_phys_addr +
> +						page->size) {
> +				*hpa_size = page->guest_phys_addr +
> +					page->size - gpa;
> +				return gpa - page->guest_phys_addr +
> +					page->host_phys_addr;
> +			}
> +		}
> +	} else {
> +		for (i = 0; i < dev->nr_guest_pages; i++) {
> +			page = &dev->guest_pages[i];
> +
> +			if (gpa >= page->guest_phys_addr) {
> +				if (gpa + gpa_size <=
> +					page->guest_phys_addr + page->size) {
> +					return gpa - page->guest_phys_addr +
> +						page->host_phys_addr;
> +				} else if (gpa < page->guest_phys_addr +
> +							page->size) {
> +					*hpa_size = page->guest_phys_addr +
> +						page->size - gpa;
> +					return gpa - page->guest_phys_addr +
> +						page->host_phys_addr;
> +				}
> +			}
> +		}
> +	}
> +
> +	*hpa_size = 0;
> +	return 0;
> +}
> +
>  static __rte_always_inline uint64_t
>  hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)
>  {
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 95a0bc19f..124a33a10 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
>  	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
>  	int error = 0;
> +	uint64_t mapped_len;
>  
>  	uint32_t tlen = 0;
>  	int tvec_idx = 0;
> @@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  
>  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
>  
> -		if (unlikely(cpy_len >= cpy_threshold)) {
> -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> -					buf_iova + buf_offset, cpy_len);
> +		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
> +			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
> +					buf_iova + buf_offset,
> +					cpy_len, &mapped_len);
>  
> -			if (unlikely(!hpa)) {
> -				error = -1;
> -				goto out;
> -			}
> +			if (unlikely(!hpa || mapped_len < cpy_threshold))
> +				break;
>  
>  			async_fill_vec(src_iovec + tvec_idx,
>  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> -						mbuf_offset), cpy_len);
> +				mbuf_offset), (size_t)mapped_len);
>  
> -			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
> +			async_fill_vec(dst_iovec + tvec_idx,
> +					hpa, (size_t)mapped_len);
>  
> -			tlen += cpy_len;
> +			tlen += (uint32_t)mapped_len;
> +			cpy_len -= (uint32_t)mapped_len;
> +			mbuf_avail  -= (uint32_t)mapped_len;
> +			mbuf_offset += (uint32_t)mapped_len;
> +			buf_avail  -= (uint32_t)mapped_len;
> +			buf_offset += (uint32_t)mapped_len;
>  			tvec_idx++;
> -		} else {
> +		}
> +
> +		if (likely(cpy_len)) {
>  			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
>  				rte_memcpy(
>  				(void *)((uintptr_t)(buf_addr + buf_offset)),
> @@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  			}
>  		}
>  
> -		mbuf_avail  -= cpy_len;
> -		mbuf_offset += cpy_len;
> -		buf_avail  -= cpy_len;
> -		buf_offset += cpy_len;
> +		if (cpy_len) {
> +			mbuf_avail  -= cpy_len;
> +			mbuf_offset += cpy_len;
> +			buf_avail  -= cpy_len;
> +			buf_offset += cpy_len;
> +		}
Is it really necessary to check that the copy length is not 0?
Thanks,
Maxime
>  	}
>  
>  out:
> 
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v4] vhost: fix async copy fail on multi-page buffers
  2020-07-28 13:55   ` Maxime Coquelin
@ 2020-07-29  1:40     ` Fu, Patrick
  2020-07-29  2:05       ` Fu, Patrick
  0 siblings, 1 reply; 16+ messages in thread
From: Fu, Patrick @ 2020-07-29  1:40 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Xia, Chenbo
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, July 28, 2020 9:55 PM
> To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
> <chenbo.xia@intel.com>
> Subject: Re: [PATCH v4] vhost: fix async copy fail on multi-page buffers
> 
> 
> 
> On 7/28/20 5:28 AM, patrick.fu@intel.com wrote:
> > From: Patrick Fu <patrick.fu@intel.com>
> >
> > Async copy fails when a single ring buffer vector is split across
> > multiple physical pages. This happens because the current hpa address
> > translation function doesn't handle multi-page buffers. This patch
> > implements a new gpa to hpa address conversion function, which returns
> > the hpa of the first host page hit. The async data path calls this new
> > function repeatedly to construct a multi-segment async copy descriptor
> > for ring buffers crossing physical page boundaries.
> >
> > Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> >
> > Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> > ---
> > v2:
> >  - change commit message and title
> >  - v1 patch used CPU to copy multi-page buffers; v2 patch split the
> > copy into multiple async copy segments whenever possible
> >
> > v3:
> >  - added fixline
> >
> > v4:
> >  - fix mistranslation of a gpa buffer whose length is the same as the
> >    host page size
> >
> >  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
> >  lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
> >  2 files changed, 75 insertions(+), 15 deletions(-)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 95a0bc19f..124a33a10 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >  	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
> >  	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
> >  	int error = 0;
> > +	uint64_t mapped_len;
> >
> >  	uint32_t tlen = 0;
> >  	int tvec_idx = 0;
> > @@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >
> >  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> >
> > -		if (unlikely(cpy_len >= cpy_threshold)) {
> > -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> > -					buf_iova + buf_offset, cpy_len);
> > +		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
> > +			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
> > +					buf_iova + buf_offset,
> > +					cpy_len, &mapped_len);
> >
> > -			if (unlikely(!hpa)) {
> > -				error = -1;
> > -				goto out;
> > -			}
> > +			if (unlikely(!hpa || mapped_len < cpy_threshold))
> > +				break;
> >
> >  			async_fill_vec(src_iovec + tvec_idx,
> >  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> > -						mbuf_offset), cpy_len);
> > +				mbuf_offset), (size_t)mapped_len);
> >
> > -			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
> > +			async_fill_vec(dst_iovec + tvec_idx,
> > +					hpa, (size_t)mapped_len);
> >
> > -			tlen += cpy_len;
> > +			tlen += (uint32_t)mapped_len;
> > +			cpy_len -= (uint32_t)mapped_len;
> > +			mbuf_avail  -= (uint32_t)mapped_len;
> > +			mbuf_offset += (uint32_t)mapped_len;
> > +			buf_avail  -= (uint32_t)mapped_len;
> > +			buf_offset += (uint32_t)mapped_len;
> >  			tvec_idx++;
> > -		} else {
> > +		}
> > +
> > +		if (likely(cpy_len)) {
> >  			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
> >  				rte_memcpy(
> >  				(void *)((uintptr_t)(buf_addr + buf_offset)),
> > @@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >  			}
> >  		}
> >
> > -		mbuf_avail  -= cpy_len;
> > -		mbuf_offset += cpy_len;
> > -		buf_avail  -= cpy_len;
> > -		buf_offset += cpy_len;
> > +		if (cpy_len) {
> > +			mbuf_avail  -= cpy_len;
> > +			mbuf_offset += cpy_len;
> > +			buf_avail  -= cpy_len;
> > +			buf_offset += cpy_len;
> > +		}
> 
> Is it really necessary to check that the copy length is not 0?
> 
The intention is to optimize for the case where ring buffers are NOT split (which should be the most common case). In that case, cpy_len will be zero, and this "if" statement saves a couple of cycles. That said, the actual difference is minor. I'm open to either adding an "unlikely" to the "if" or removing the "if". I'd like to hear your opinion before submitting a modified patch.
> Thanks,
> Maxime
> 
> >  	}
> >
> >  out:
> >
^ permalink raw reply	[flat|nested] 16+ messages in thread
* [dpdk-dev] [PATCH v5] vhost: fix async copy fail on multi-page buffers
  2020-07-20  2:52 [dpdk-dev] [PATCH v1] vhost: support cross page buf in async data path patrick.fu
                   ` (3 preceding siblings ...)
  2020-07-28  3:28 ` [dpdk-dev] [PATCH v4] " patrick.fu
@ 2020-07-29  2:04 ` Patrick Fu
  2020-07-29 14:24   ` Maxime Coquelin
  2020-07-29 14:55   ` Maxime Coquelin
  4 siblings, 2 replies; 16+ messages in thread
From: Patrick Fu @ 2020-07-29  2:04 UTC (permalink / raw)
  To: dev, maxime.coquelin, chenbo.xia; +Cc: Patrick Fu
Async copy fails when a single ring buffer vector is split across
multiple physical pages. This happens because the current hpa address
translation function doesn't handle multi-page buffers. This patch
implements a new gpa to hpa address conversion function, which returns
the hpa of the first host page hit. The async data path calls this new
function repeatedly to construct a multi-segment async copy descriptor
for ring buffers crossing physical page boundaries.
Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
Signed-off-by: Patrick Fu <patrick.fu@intel.com>
---
v2:
 - change commit message and title
 - v1 patch used CPU to copy multi-page buffers; v2 patch split the
copy into multiple async copy segments whenever possible
v3:
 - added fixline
v4:
 - fix mistranslation of a gpa buffer whose length is the same as the
   host page size
v5:
 - combine redundant "if" statement in async_mbuf_to_desc()
 lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 39 ++++++++++++++++-----------
 2 files changed, 74 insertions(+), 15 deletions(-)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 0f7212f88..070eb960f 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -616,6 +616,56 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 	return 0;
 }
 
+static __rte_always_inline rte_iova_t
+gpa_to_first_hpa(struct virtio_net *dev, uint64_t gpa,
+	uint64_t gpa_size, uint64_t *hpa_size)
+{
+	uint32_t i;
+	struct guest_page *page;
+	struct guest_page key;
+
+	*hpa_size = gpa_size;
+	if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
+		key.guest_phys_addr = gpa & ~(dev->guest_pages[0].size - 1);
+		page = bsearch(&key, dev->guest_pages, dev->nr_guest_pages,
+			       sizeof(struct guest_page), guest_page_addrcmp);
+		if (page) {
+			if (gpa + gpa_size <=
+					page->guest_phys_addr + page->size) {
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			} else if (gpa < page->guest_phys_addr +
+						page->size) {
+				*hpa_size = page->guest_phys_addr +
+					page->size - gpa;
+				return gpa - page->guest_phys_addr +
+					page->host_phys_addr;
+			}
+		}
+	} else {
+		for (i = 0; i < dev->nr_guest_pages; i++) {
+			page = &dev->guest_pages[i];
+
+			if (gpa >= page->guest_phys_addr) {
+				if (gpa + gpa_size <=
+					page->guest_phys_addr + page->size) {
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				} else if (gpa < page->guest_phys_addr +
+							page->size) {
+					*hpa_size = page->guest_phys_addr +
+						page->size - gpa;
+					return gpa - page->guest_phys_addr +
+						page->host_phys_addr;
+				}
+			}
+		}
+	}
+
+	*hpa_size = 0;
+	return 0;
+}
+
 static __rte_always_inline uint64_t
 hva_to_gpa(struct virtio_net *dev, uint64_t vva, uint64_t len)
 {
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 95a0bc19f..bd9303c8a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
 	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
 	int error = 0;
+	uint64_t mapped_len;
 
 	uint32_t tlen = 0;
 	int tvec_idx = 0;
@@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
 
-		if (unlikely(cpy_len >= cpy_threshold)) {
-			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
-					buf_iova + buf_offset, cpy_len);
+		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
+			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
+					buf_iova + buf_offset,
+					cpy_len, &mapped_len);
 
-			if (unlikely(!hpa)) {
-				error = -1;
-				goto out;
-			}
+			if (unlikely(!hpa || mapped_len < cpy_threshold))
+				break;
 
 			async_fill_vec(src_iovec + tvec_idx,
 				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
-						mbuf_offset), cpy_len);
+				mbuf_offset), (size_t)mapped_len);
 
-			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
+			async_fill_vec(dst_iovec + tvec_idx,
+					hpa, (size_t)mapped_len);
 
-			tlen += cpy_len;
+			tlen += (uint32_t)mapped_len;
+			cpy_len -= (uint32_t)mapped_len;
+			mbuf_avail  -= (uint32_t)mapped_len;
+			mbuf_offset += (uint32_t)mapped_len;
+			buf_avail  -= (uint32_t)mapped_len;
+			buf_offset += (uint32_t)mapped_len;
 			tvec_idx++;
-		} else {
+		}
+
+		if (likely(cpy_len)) {
 			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
 				rte_memcpy(
 				(void *)((uintptr_t)(buf_addr + buf_offset)),
@@ -1110,12 +1118,13 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 					cpy_len;
 				vq->batch_copy_nb_elems++;
 			}
+
+			mbuf_avail  -= cpy_len;
+			mbuf_offset += cpy_len;
+			buf_avail  -= cpy_len;
+			buf_offset += cpy_len;
 		}
 
-		mbuf_avail  -= cpy_len;
-		mbuf_offset += cpy_len;
-		buf_avail  -= cpy_len;
-		buf_offset += cpy_len;
 	}
 
 out:
-- 
2.18.4
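The per-buffer control flow that v5 ends up with can be modelled without DPDK. This is a minimal sketch, assuming fixed-size power-of-two pages that are all mapped; `struct copy_plan`, `plan_copy()` and `bytes_left_in_page()` are hypothetical names that only count bytes instead of filling iovecs or copying.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of planning one buffer copy (hypothetical type). */
struct copy_plan {
	uint32_t async_bytes;	/* handed to the async copy engine */
	uint32_t cpu_bytes;	/* left for the CPU (sync) copy */
	uint32_t nr_segs;	/* async segments built */
};

/* Bytes from off to the end of its page; page_size is a power of two. */
static uint32_t
bytes_left_in_page(uint64_t off, uint32_t page_size)
{
	return page_size - (uint32_t)(off & (page_size - 1));
}

static struct copy_plan
plan_copy(uint64_t buf_off, uint32_t cpy_len, uint32_t threshold,
	uint32_t page_size)
{
	struct copy_plan plan = { 0, 0, 0 };

	assert(page_size && (page_size & (page_size - 1)) == 0);

	/* Async part: one segment per page while segments stay large. */
	while (cpy_len && cpy_len >= threshold) {
		uint32_t left = bytes_left_in_page(buf_off, page_size);
		uint32_t mapped = cpy_len < left ? cpy_len : left;

		if (mapped < threshold)
			break;

		plan.async_bytes += mapped;
		plan.nr_segs++;
		cpy_len -= mapped;
		buf_off += mapped;
	}

	/*
	 * CPU part: whatever is left. In v5 the offset bookkeeping for
	 * this remainder sits inside the same "if".
	 */
	if (cpy_len)
		plan.cpu_bytes = cpy_len;

	return plan;
}
```

For example, with 4096-byte pages and a threshold of 256, a 2048-byte copy starting at offset 3072 becomes two async segments of 1024 bytes with nothing left for the CPU, while a 1000-byte copy starting at offset 4000 goes entirely to the CPU path because the first in-page piece (96 bytes) is below the threshold.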
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v4] vhost: fix async copy fail on multi-page buffers
  2020-07-29  1:40     ` Fu, Patrick
@ 2020-07-29  2:05       ` Fu, Patrick
  0 siblings, 0 replies; 16+ messages in thread
From: Fu, Patrick @ 2020-07-29  2:05 UTC (permalink / raw)
  To: Maxime Coquelin, dev, Xia, Chenbo
Hi Maxime,
> -----Original Message-----
> From: Fu, Patrick
> Sent: Wednesday, July 29, 2020 9:40 AM
> To: 'Maxime Coquelin' <maxime.coquelin@redhat.com>; dev@dpdk.org; Xia,
> Chenbo <Chenbo.Xia@intel.com>
> Subject: RE: [PATCH v4] vhost: fix async copy fail on multi-page buffers
> 
> Hi Maxime,
> 
> > -----Original Message-----
> > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Sent: Tuesday, July 28, 2020 9:55 PM
> > To: Fu, Patrick <patrick.fu@intel.com>; dev@dpdk.org; Xia, Chenbo
> > <chenbo.xia@intel.com>
> > Subject: Re: [PATCH v4] vhost: fix async copy fail on multi-page
> > buffers
> >
> >
> >
> > On 7/28/20 5:28 AM, patrick.fu@intel.com wrote:
> > > From: Patrick Fu <patrick.fu@intel.com>
> > >
> > > > Async copy fails when a single ring buffer vector is split across
> > > > multiple physical pages. This happens because the current hpa
> > > > address translation function doesn't handle multi-page buffers.
> > > > This patch implements a new gpa to hpa address conversion
> > > > function, which returns the hpa of the first host page hit. The
> > > > async data path calls this new function repeatedly to construct a
> > > > multi-segment async copy descriptor for ring buffers crossing
> > > > physical page boundaries.
> > > >
> > > > Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> > >
> > > Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> > > ---
> > > v2:
> > >  - change commit message and title
> > >  - v1 patch used CPU to copy multi-page buffers; v2 patch split the
> > > copy into multiple async copy segments whenever possible
> > >
> > > v3:
> > >  - added fixline
> > >
> > > v4:
> > > >  - fix mistranslation of a gpa buffer whose length is the same as the
> > > >    host page size
> > > >
> > > >  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
> > >  lib/librte_vhost/virtio_net.c | 40 +++++++++++++++++-----------
> > >  2 files changed, 75 insertions(+), 15 deletions(-)
> > >
> > > > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > > > index 95a0bc19f..124a33a10 100644
> > > > --- a/lib/librte_vhost/virtio_net.c
> > > > +++ b/lib/librte_vhost/virtio_net.c
> > > > @@ -980,6 +980,7 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > >  	struct batch_copy_elem *batch_copy = vq->batch_copy_elems;
> > >  	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
> > >  	int error = 0;
> > > +	uint64_t mapped_len;
> > >
> > >  	uint32_t tlen = 0;
> > >  	int tvec_idx = 0;
> > > @@ -1072,24 +1073,31 @@ async_mbuf_to_desc(struct virtio_net *dev,
> > > struct vhost_virtqueue *vq,
> > >
> > >  		cpy_len = RTE_MIN(buf_avail, mbuf_avail);
> > >
> > > -		if (unlikely(cpy_len >= cpy_threshold)) {
> > > -			hpa = (void *)(uintptr_t)gpa_to_hpa(dev,
> > > -					buf_iova + buf_offset, cpy_len);
> > > +		while (unlikely(cpy_len && cpy_len >= cpy_threshold)) {
> > > +			hpa = (void *)(uintptr_t)gpa_to_first_hpa(dev,
> > > +					buf_iova + buf_offset,
> > > +					cpy_len, &mapped_len);
> > >
> > > -			if (unlikely(!hpa)) {
> > > -				error = -1;
> > > -				goto out;
> > > -			}
> > > +			if (unlikely(!hpa || mapped_len < cpy_threshold))
> > > +				break;
> > >
> > >  			async_fill_vec(src_iovec + tvec_idx,
> > >  				(void *)(uintptr_t)rte_pktmbuf_iova_offset(m,
> > > -						mbuf_offset), cpy_len);
> > > +				mbuf_offset), (size_t)mapped_len);
> > >
> > > -			async_fill_vec(dst_iovec + tvec_idx, hpa, cpy_len);
> > > +			async_fill_vec(dst_iovec + tvec_idx,
> > > +					hpa, (size_t)mapped_len);
> > >
> > > -			tlen += cpy_len;
> > > +			tlen += (uint32_t)mapped_len;
> > > +			cpy_len -= (uint32_t)mapped_len;
> > > +			mbuf_avail  -= (uint32_t)mapped_len;
> > > +			mbuf_offset += (uint32_t)mapped_len;
> > > +			buf_avail  -= (uint32_t)mapped_len;
> > > +			buf_offset += (uint32_t)mapped_len;
> > >  			tvec_idx++;
> > > -		} else {
> > > +		}
> > > +
> > > +		if (likely(cpy_len)) {
> > >  			if (unlikely(vq->batch_copy_nb_elems >= vq->size)) {
> > >  				rte_memcpy(
> > >  				(void *)((uintptr_t)(buf_addr + buf_offset)),
> > > > @@ -1112,10 +1120,12 @@ async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > >  			}
> > >  		}
> > >
> > > -		mbuf_avail  -= cpy_len;
> > > -		mbuf_offset += cpy_len;
> > > -		buf_avail  -= cpy_len;
> > > -		buf_offset += cpy_len;
> > > +		if (cpy_len) {
> > > +			mbuf_avail  -= cpy_len;
> > > +			mbuf_offset += cpy_len;
> > > +			buf_avail  -= cpy_len;
> > > +			buf_offset += cpy_len;
> > > +		}
> >
> > Is it really necessary to check that the copy length is not 0?
> >
> The intention is to optimize for the case where ring buffers are NOT split
> (which should be the most common case). In that case, cpy_len will be zero,
> and this "if" statement saves a couple of cycles. That said, the actual
> difference is minor. I'm open to either adding an "unlikely" to the "if" or
> removing the "if". I'd like to hear your opinion before submitting a modified
> patch.
> 
I have a better way to handle the case (combine this "if" logic with the previous one). 
Please review my v5 patch for the code change.
Thanks,
Patrick
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v5] vhost: fix async copy fail on multi-page buffers
  2020-07-29  2:04 ` [dpdk-dev] [PATCH v5] " Patrick Fu
@ 2020-07-29 14:24   ` Maxime Coquelin
  2020-07-29 14:55   ` Maxime Coquelin
  1 sibling, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2020-07-29 14:24 UTC (permalink / raw)
  To: Patrick Fu, dev, chenbo.xia
On 7/29/20 4:04 AM, Patrick Fu wrote:
> Async copy fails when a single ring buffer vector is split across
> multiple physical pages. This happens because the current hpa address
> translation function doesn't handle multi-page buffers. This patch
> implements a new gpa to hpa address conversion function, which returns
> the hpa of the first host page hit. The async data path calls this new
> function repeatedly to construct a multi-segment async copy descriptor
> for ring buffers crossing physical page boundaries.
> 
> Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> 
> Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> ---
> v2:
>  - change commit message and title
>  - v1 patch used CPU to copy multi-page buffers; v2 patch split the
> copy into multiple async copy segments whenever possible
> 
> v3:
>  - added fixline
> 
> v4:
>  - fix mistranslation of a gpa buffer whose length is the same as the
>    host page size
> v5:
>  - combine redundant "if" statement in async_mbuf_to_desc()
> 
>  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/virtio_net.c | 39 ++++++++++++++++-----------
>  2 files changed, 74 insertions(+), 15 deletions(-)
Thanks Patrick, it looks better to me now:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Maxime
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v5] vhost: fix async copy fail on multi-page buffers
  2020-07-29  2:04 ` [dpdk-dev] [PATCH v5] " Patrick Fu
  2020-07-29 14:24   ` Maxime Coquelin
@ 2020-07-29 14:55   ` Maxime Coquelin
  1 sibling, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2020-07-29 14:55 UTC (permalink / raw)
  To: Patrick Fu, dev, chenbo.xia
On 7/29/20 4:04 AM, Patrick Fu wrote:
> Async copy fails when a single ring buffer vector is split across
> multiple physical pages. This happens because the current hpa address
> translation function doesn't handle multi-page buffers. This patch
> implements a new gpa to hpa address conversion function, which returns
> the hpa of the first host page hit. The async data path calls this new
> function repeatedly to construct a multi-segment async copy descriptor
> for ring buffers crossing physical page boundaries.
> 
> Fixes: cd6760da1076 ("vhost: introduce async enqueue for split ring")
> 
> Signed-off-by: Patrick Fu <patrick.fu@intel.com>
> ---
> v2:
>  - change commit message and title
>  - v1 patch used CPU to copy multi-page buffers; v2 patch split the
> copy into multiple async copy segments whenever possible
> 
> v3:
>  - added fixline
> 
> v4:
>  - fix mistranslation of a gpa buffer whose length is the same as the
>    host page size
> v5:
>  - combine redundant "if" statement in async_mbuf_to_desc()
> 
>  lib/librte_vhost/vhost.h      | 50 +++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/virtio_net.c | 39 ++++++++++++++++-----------
>  2 files changed, 74 insertions(+), 15 deletions(-)
Applied to dpdk-next-virtio/master.
Thanks,
Maxime
^ permalink raw reply	[flat|nested] 16+ messages in thread