From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Ilya Maximets <i.maximets@samsung.com>,
	dev@dpdk.org, tiwei.bie@intel.com, zhihong.wang@intel.com,
	jfreimann@redhat.com, mst@redhat.com
Subject: Re: [dpdk-dev] [PATCH] vhost: batch used descriptors chains write-back with packed ring
Date: Wed, 12 Dec 2018 17:34:31 +0100
Message-ID: <e91c5b6f-de71-08fd-73a0-c1bde731b26e@redhat.com>
In-Reply-To: <e5d2fc9b-5d3e-95fc-3365-75a210770323@samsung.com>

Hi Ilya,

On 12/12/18 4:23 PM, Ilya Maximets wrote:
> On 12.12.2018 11:24, Maxime Coquelin wrote:
>> Instead of writing back descriptor chains in order, let's
>> write the first chain's flags last in order to improve batching.
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/virtio_net.c | 39 +++++++++++++++++++++--------------
>>   1 file changed, 24 insertions(+), 15 deletions(-)
>>
> 
> Hi.
> I did some rough testing on my ARMv8 system with this patch and with its v1.
> Here is the performance difference with current master:
>      v1: +1.1 %
>      v2: -3.6 %
> 
> So, write barriers are quite heavy in practice.

Thanks for testing it on ARM. Indeed, rte_smp_wmb() is much heavier there.
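
For reference, this is roughly what rte_smp_wmb() expands to on each
architecture (paraphrased from the DPDK arch headers, so double-check
the exact definitions there):

  /* x86: stores are not reordered with other stores, so a
   * compiler barrier is enough for SMP store ordering. */
  #define rte_smp_wmb() rte_compiler_barrier()

  /* ARMv8: a real store-store barrier instruction is emitted,
   * which is why a barrier per descriptor costs so much there. */
  #define rte_smp_wmb() asm volatile("dmb ishst" : : : "memory")

That makes the per-chain barrier in v2 almost free on x86, but one dmb
per descriptor chain on ARM.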

To reduce the number of write barriers, I propose reverting to the
original implementation: first write all the .len and .id fields, then
issue the write barrier, and finally write all the flags, writing the
first chain's flags last:

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..58a277c53 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -136,6 +136,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
  {
         int i;
         uint16_t used_idx = vq->last_used_idx;
+       uint16_t head_idx = vq->last_used_idx;
+       uint16_t head_flags = 0;

         /* Split loop in two to save memory barriers */
         for (i = 0; i < vq->shadow_used_idx; i++) {
@@ -165,12 +167,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
                         flags &= ~VRING_DESC_F_AVAIL;
                 }

-               vq->desc_packed[vq->last_used_idx].flags = flags;
+               if (i > 0) {
+                       vq->desc_packed[vq->last_used_idx].flags = flags;

-               vhost_log_cache_used_vring(dev, vq,
+                       vhost_log_cache_used_vring(dev, vq,
                                         vq->last_used_idx *
                                         sizeof(struct vring_packed_desc),
                                         sizeof(struct vring_packed_desc));
+               } else {
+                       head_idx = vq->last_used_idx;
+                       head_flags = flags;
+               }

                 vq->last_used_idx += vq->shadow_used_packed[i].count;
                 if (vq->last_used_idx >= vq->size) {
@@ -179,7 +186,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
                 }
         }

-       rte_smp_wmb();
+       vq->desc_packed[head_idx].flags = head_flags;
+
+       vhost_log_cache_used_vring(dev, vq,
+                               head_idx *
+                               sizeof(struct vring_packed_desc),
+                               sizeof(struct vring_packed_desc));
+
         vq->shadow_used_idx = 0;
         vhost_log_cache_sync(dev, vq);
  }
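
Deferring the head flags is safe because the driver consumes the used
ring in order: it polls the first descriptor of the batch and will not
look past it until its AVAIL/USED bits match the expected wrap counter.
A sketch of that driver-side check, modelled on the packed ring layout
from the virtio spec (the function name is illustrative, not taken from
a particular driver):

  /* A descriptor is used once both AVAIL and USED bits match the
   * driver's used wrap counter. Until the head flags are written,
   * the ids/lens stored earlier stay invisible to the driver. */
  static inline int
  desc_is_used(struct vring_packed_desc *desc, int wrap_counter)
  {
          uint16_t flags = desc->flags;
          int avail = !!(flags & VRING_DESC_F_AVAIL);
          int used = !!(flags & VRING_DESC_F_USED);

          return avail == used && used == wrap_counter;
  }

So a single barrier before the flag stores, with the head flags written
last, publishes the whole batch at once.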


> My test case is three instances of testpmd on the same host (with v11 from Jens):
> 
>      txonly (virtio_user0) --> fwd mode io (vhost0, vhost1) --> rxonly (virtio_user1)
> 
> Best regards, Ilya Maximets.
> 
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 5e1a1a727..c0b3d1137 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   			struct vhost_virtqueue *vq)
>>   {
>>   	int i;
>> -	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_flags, head_idx = vq->last_used_idx;
>>   
>> -	/* Split loop in two to save memory barriers */
>> -	for (i = 0; i < vq->shadow_used_idx; i++) {
>> -		vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
>> -		vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
>> -
>> -		used_idx += vq->shadow_used_packed[i].count;
>> -		if (used_idx >= vq->size)
>> -			used_idx -= vq->size;
>> -	}
>> -
>> -	rte_smp_wmb();
>> +	if (unlikely(vq->shadow_used_idx == 0))
>> +		return;
>>   
>>   	for (i = 0; i < vq->shadow_used_idx; i++) {
>>   		uint16_t flags;
>> @@ -165,12 +156,24 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   			flags &= ~VRING_DESC_F_AVAIL;
>>   		}
>>   
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		vq->desc_packed[vq->last_used_idx].id =
>> +			vq->shadow_used_packed[i].id;
>> +		vq->desc_packed[vq->last_used_idx].len =
>> +			vq->shadow_used_packed[i].len;
>> +
>> +		rte_smp_wmb();
>>   
>> -		vhost_log_cache_used_vring(dev, vq,
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>> +
>> +			vhost_log_cache_used_vring(dev, vq,
>>   					vq->last_used_idx *
>>   					sizeof(struct vring_packed_desc),
>>   					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>   
>>   		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>   		if (vq->last_used_idx >= vq->size) {
>> @@ -179,8 +182,14 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   		}
>>   	}
>>   
>> -	rte_smp_wmb();
>> +	vq->desc_packed[head_idx].flags = head_flags;
>>   	vq->shadow_used_idx = 0;
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				head_idx *
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>   	vhost_log_cache_sync(dev, vq);
>>   }
>>   
>>
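
To summarize the structural difference between the two layouts, here is
a condensed sketch (write_id_len() and write_flags() are placeholders
for the desc_packed stores above, n for the number of chains):

  /* v2 above: one barrier per descriptor chain. */
  for (i = 0; i < n; i++) {
          write_id_len(i);
          rte_smp_wmb();          /* n barriers per batch */
          if (i > 0)
                  write_flags(i);
  }
  write_flags(0);                 /* head last */

  /* Proposal: a single barrier for the whole batch. */
  for (i = 0; i < n; i++)
          write_id_len(i);
  rte_smp_wmb();                  /* one barrier per batch */
  for (i = 1; i < n; i++)
          write_flags(i);
  write_flags(0);                 /* head last publishes the batch */

On x86 the difference is noise, since the barrier is only a compiler
barrier; on ARMv8 it removes n-1 dmb instructions per batch.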

