DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
@ 2018-12-19  9:29 Maxime Coquelin
  2018-12-19 16:43 ` Michael S. Tsirkin
  2018-12-20  4:44 ` Tiwei Bie
  0 siblings, 2 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-12-19  9:29 UTC (permalink / raw)
  To: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman, mst; +Cc: Maxime Coquelin

Instead of writing back descriptors chains in order, let's
write the first chain flags last in order to improve batching.

With Kernel's pktgen benchmark, ~3% performance gain is measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---

V2:
Revert back to initial implementation to have a write
barrier before every descs flags store, but still
store first desc flags last. (Missing barrier reported
by Ilya)


 lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8c657a101..de436af79 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 {
 	int i;
 	uint16_t used_idx = vq->last_used_idx;
+	uint16_t head_idx = vq->last_used_idx;
+	uint16_t head_flags = 0;
 
 	/* Split loop in two to save memory barriers */
 	for (i = 0; i < vq->shadow_used_idx; i++) {
@@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			flags &= ~VRING_DESC_F_AVAIL;
 		}
 
-		vq->desc_packed[vq->last_used_idx].flags = flags;
+		if (i > 0) {
+			vq->desc_packed[vq->last_used_idx].flags = flags;
 
-		vhost_log_cache_used_vring(dev, vq,
+			vhost_log_cache_used_vring(dev, vq,
 					vq->last_used_idx *
 					sizeof(struct vring_packed_desc),
 					sizeof(struct vring_packed_desc));
+		} else {
+			head_idx = vq->last_used_idx;
+			head_flags = flags;
+		}
 
 		vq->last_used_idx += vq->shadow_used_packed[i].count;
 		if (vq->last_used_idx >= vq->size) {
@@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 		}
 	}
 
-	rte_smp_wmb();
+	vq->desc_packed[head_idx].flags = head_flags;
+
+	vhost_log_cache_used_vring(dev, vq,
+				vq->last_used_idx *
+				sizeof(struct vring_packed_desc),
+				sizeof(struct vring_packed_desc));
+
 	vq->shadow_used_idx = 0;
 	vhost_log_cache_sync(dev, vq);
 }
-- 
2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
  2018-12-19  9:29 [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring Maxime Coquelin
@ 2018-12-19 16:43 ` Michael S. Tsirkin
  2018-12-20  4:44 ` Tiwei Bie
  1 sibling, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 16:43 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman

On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
> 
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> 
> V2:
> Revert back to initial implementation to have a write
> barrier before every descs flags store, but still
> store first desc flags last. (Missing barrier reported
> by Ilya)
> 
> 
>  lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8c657a101..de436af79 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  {
>  	int i;
>  	uint16_t used_idx = vq->last_used_idx;
> +	uint16_t head_idx = vq->last_used_idx;
> +	uint16_t head_flags = 0;
>  
>  	/* Split loop in two to save memory barriers */
>  	for (i = 0; i < vq->shadow_used_idx; i++) {
> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  			flags &= ~VRING_DESC_F_AVAIL;
>  		}
>  
> -		vq->desc_packed[vq->last_used_idx].flags = flags;
> +		if (i > 0) {
> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>  
> -		vhost_log_cache_used_vring(dev, vq,
> +			vhost_log_cache_used_vring(dev, vq,
>  					vq->last_used_idx *
>  					sizeof(struct vring_packed_desc),
>  					sizeof(struct vring_packed_desc));
> +		} else {
> +			head_idx = vq->last_used_idx;
> +			head_flags = flags;
> +		}
>  
>  		vq->last_used_idx += vq->shadow_used_packed[i].count;
>  		if (vq->last_used_idx >= vq->size) {
> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  		}
>  	}
>  
> -	rte_smp_wmb();
> +	vq->desc_packed[head_idx].flags = head_flags;
> +
> +	vhost_log_cache_used_vring(dev, vq,
> +				vq->last_used_idx *
> +				sizeof(struct vring_packed_desc),
> +				sizeof(struct vring_packed_desc));
> +
>  	vq->shadow_used_idx = 0;
>  	vhost_log_cache_sync(dev, vq);
>  }
> -- 
> 2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
  2018-12-19  9:29 [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring Maxime Coquelin
  2018-12-19 16:43 ` Michael S. Tsirkin
@ 2018-12-20  4:44 ` Tiwei Bie
  2018-12-20  8:49   ` Maxime Coquelin
  2018-12-20 14:03   ` Michael S. Tsirkin
  1 sibling, 2 replies; 6+ messages in thread
From: Tiwei Bie @ 2018-12-20  4:44 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst

On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
> 
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> 
> V2:
> Revert back to initial implementation to have a write
> barrier before every descs flags store, but still
> store first desc flags last. (Missing barrier reported
> by Ilya)
> 
> 
>  lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8c657a101..de436af79 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  {
>  	int i;
>  	uint16_t used_idx = vq->last_used_idx;
> +	uint16_t head_idx = vq->last_used_idx;
> +	uint16_t head_flags = 0;
>  
>  	/* Split loop in two to save memory barriers */
>  	for (i = 0; i < vq->shadow_used_idx; i++) {
> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  			flags &= ~VRING_DESC_F_AVAIL;
>  		}
>  
> -		vq->desc_packed[vq->last_used_idx].flags = flags;
> +		if (i > 0) {
> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>  
> -		vhost_log_cache_used_vring(dev, vq,
> +			vhost_log_cache_used_vring(dev, vq,
>  					vq->last_used_idx *
>  					sizeof(struct vring_packed_desc),
>  					sizeof(struct vring_packed_desc));
> +		} else {
> +			head_idx = vq->last_used_idx;
> +			head_flags = flags;
> +		}
>  
>  		vq->last_used_idx += vq->shadow_used_packed[i].count;
>  		if (vq->last_used_idx >= vq->size) {
> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  		}
>  	}
>  
> -	rte_smp_wmb();
> +	vq->desc_packed[head_idx].flags = head_flags;
> +
> +	vhost_log_cache_used_vring(dev, vq,
> +				vq->last_used_idx *

Should be head_idx.

> +				sizeof(struct vring_packed_desc),
> +				sizeof(struct vring_packed_desc));
> +
>  	vq->shadow_used_idx = 0;

A wmb() is needed before log_cache_sync?

>  	vhost_log_cache_sync(dev, vq);
>  }
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
  2018-12-20  4:44 ` Tiwei Bie
@ 2018-12-20  8:49   ` Maxime Coquelin
  2018-12-20  9:27     ` Maxime Coquelin
  2018-12-20 14:03   ` Michael S. Tsirkin
  1 sibling, 1 reply; 6+ messages in thread
From: Maxime Coquelin @ 2018-12-20  8:49 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst



On 12/20/18 5:44 AM, Tiwei Bie wrote:
> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>
>> V2:
>> Revert back to initial implementation to have a write
>> barrier before every descs flags store, but still
>> store first desc flags last. (Missing barrier reported
>> by Ilya)
>>
>>
>>   lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8c657a101..de436af79 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   {
>>   	int i;
>>   	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_idx = vq->last_used_idx;
>> +	uint16_t head_flags = 0;
>>   
>>   	/* Split loop in two to save memory barriers */
>>   	for (i = 0; i < vq->shadow_used_idx; i++) {
>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   			flags &= ~VRING_DESC_F_AVAIL;
>>   		}
>>   
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>>   
>> -		vhost_log_cache_used_vring(dev, vq,
>> +			vhost_log_cache_used_vring(dev, vq,
>>   					vq->last_used_idx *
>>   					sizeof(struct vring_packed_desc),
>>   					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>   
>>   		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>   		if (vq->last_used_idx >= vq->size) {
>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   		}
>>   	}
>>   
>> -	rte_smp_wmb();
>> +	vq->desc_packed[head_idx].flags = head_flags;
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				vq->last_used_idx *
> 
> Should be head_idx.

Oh yes, thanks for spotting this.

> 
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>   	vq->shadow_used_idx = 0;
> 
> A wmb() is needed before log_cache_sync?

I think you're right, I was wrong but thought we had a barrier in cache
sync function.
That's not very important for x86, but I think it should be preferable 
to do it in vhost_log_cache_sync(), if logging is enabled.

What do you think?

>>   	vhost_log_cache_sync(dev, vq);
>>   }
>> -- 
>> 2.17.2
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
  2018-12-20  8:49   ` Maxime Coquelin
@ 2018-12-20  9:27     ` Maxime Coquelin
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-12-20  9:27 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst



On 12/20/18 9:49 AM, Maxime Coquelin wrote:
> 
> 
> On 12/20/18 5:44 AM, Tiwei Bie wrote:
>> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>>> Instead of writing back descriptors chains in order, let's
>>> write the first chain flags last in order to improve batching.
>>>
>>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>
>>> V2:
>>> Revert back to initial implementation to have a write
>>> barrier before every descs flags store, but still
>>> store first desc flags last. (Missing barrier reported
>>> by Ilya)
>>>
>>>
>>>   lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c 
>>> b/lib/librte_vhost/virtio_net.c
>>> index 8c657a101..de436af79 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>>   {
>>>       int i;
>>>       uint16_t used_idx = vq->last_used_idx;
>>> +    uint16_t head_idx = vq->last_used_idx;
>>> +    uint16_t head_flags = 0;
>>>       /* Split loop in two to save memory barriers */
>>>       for (i = 0; i < vq->shadow_used_idx; i++) {
>>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net 
>>> *dev,
>>>               flags &= ~VRING_DESC_F_AVAIL;
>>>           }
>>> -        vq->desc_packed[vq->last_used_idx].flags = flags;
>>> +        if (i > 0) {
>>> +            vq->desc_packed[vq->last_used_idx].flags = flags;
>>> -        vhost_log_cache_used_vring(dev, vq,
>>> +            vhost_log_cache_used_vring(dev, vq,
>>>                       vq->last_used_idx *
>>>                       sizeof(struct vring_packed_desc),
>>>                       sizeof(struct vring_packed_desc));
>>> +        } else {
>>> +            head_idx = vq->last_used_idx;
>>> +            head_flags = flags;
>>> +        }
>>>           vq->last_used_idx += vq->shadow_used_packed[i].count;
>>>           if (vq->last_used_idx >= vq->size) {
>>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net 
>>> *dev,
>>>           }
>>>       }
>>> -    rte_smp_wmb();
>>> +    vq->desc_packed[head_idx].flags = head_flags;
>>> +
>>> +    vhost_log_cache_used_vring(dev, vq,
>>> +                vq->last_used_idx *
>>
>> Should be head_idx.
> 
> Oh yes, thanks for spotting this.
> 
>>
>>> +                sizeof(struct vring_packed_desc),
>>> +                sizeof(struct vring_packed_desc));
>>> +
>>>       vq->shadow_used_idx = 0;
>>
>> A wmb() is needed before log_cache_sync?
> 
> I think you're right, I was wrong but thought we had a barrier in cache
> sync function.
> That's not very important for x86, but I think it should be preferable 
> to do it in vhost_log_cache_sync(), if logging is enabled.
> 
> What do you think?

I'll keep it in this function for now, as I think we cannot remove the
one in the split variant so it would mean having two barriers in that
case.

>>>       vhost_log_cache_sync(dev, vq);
>>>   }
>>> -- 
>>> 2.17.2
>>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
  2018-12-20  4:44 ` Tiwei Bie
  2018-12-20  8:49   ` Maxime Coquelin
@ 2018-12-20 14:03   ` Michael S. Tsirkin
  1 sibling, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2018-12-20 14:03 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: Maxime Coquelin, dev, i.maximets, zhihong.wang, jfreiman

On Thu, Dec 20, 2018 at 12:44:46PM +0800, Tiwei Bie wrote:
> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
> > Instead of writing back descriptors chains in order, let's
> > write the first chain flags last in order to improve batching.
> > 
> > With Kernel's pktgen benchmark, ~3% performance gain is measured.
> > 
> > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > ---
> > 
> > V2:
> > Revert back to initial implementation to have a write
> > barrier before every descs flags store, but still
> > store first desc flags last. (Missing barrier reported
> > by Ilya)
> > 
> > 
> >  lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 8c657a101..de436af79 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> >  {
> >  	int i;
> >  	uint16_t used_idx = vq->last_used_idx;
> > +	uint16_t head_idx = vq->last_used_idx;
> > +	uint16_t head_flags = 0;
> >  
> >  	/* Split loop in two to save memory barriers */
> >  	for (i = 0; i < vq->shadow_used_idx; i++) {
> > @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> >  			flags &= ~VRING_DESC_F_AVAIL;
> >  		}
> >  
> > -		vq->desc_packed[vq->last_used_idx].flags = flags;
> > +		if (i > 0) {
> > +			vq->desc_packed[vq->last_used_idx].flags = flags;
> >  
> > -		vhost_log_cache_used_vring(dev, vq,
> > +			vhost_log_cache_used_vring(dev, vq,
> >  					vq->last_used_idx *
> >  					sizeof(struct vring_packed_desc),
> >  					sizeof(struct vring_packed_desc));
> > +		} else {
> > +			head_idx = vq->last_used_idx;
> > +			head_flags = flags;
> > +		}
> >  
> >  		vq->last_used_idx += vq->shadow_used_packed[i].count;
> >  		if (vq->last_used_idx >= vq->size) {
> > @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> >  		}
> >  	}
> >  
> > -	rte_smp_wmb();
> > +	vq->desc_packed[head_idx].flags = head_flags;
> > +
> > +	vhost_log_cache_used_vring(dev, vq,
> > +				vq->last_used_idx *
> 
> Should be head_idx.
> 
> > +				sizeof(struct vring_packed_desc),
> > +				sizeof(struct vring_packed_desc));
> > +
> >  	vq->shadow_used_idx = 0;
> 
> A wmb() is needed before log_cache_sync?
> 
> >  	vhost_log_cache_sync(dev, vq);


Probably smarter to have the wmb in there.
This way we can skip it if not logging.

> >  }
> > -- 
> > 2.17.2
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-12-20 14:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-19  9:29 [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring Maxime Coquelin
2018-12-19 16:43 ` Michael S. Tsirkin
2018-12-20  4:44 ` Tiwei Bie
2018-12-20  8:49   ` Maxime Coquelin
2018-12-20  9:27     ` Maxime Coquelin
2018-12-20 14:03   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).