* [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
@ 2018-12-19 9:29 Maxime Coquelin
2018-12-19 16:43 ` Michael S. Tsirkin
2018-12-20 4:44 ` Tiwei Bie
0 siblings, 2 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-12-19 9:29 UTC
To: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman, mst; +Cc: Maxime Coquelin
Instead of writing back descriptor chains in order, write the
flags of the first chain last in order to improve batching.
With the kernel's pktgen benchmark, a ~3% performance gain is measured.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
V2:
Revert to the initial implementation, which has a write
barrier before every descriptor flags store, but still
store the first descriptor's flags last. (Missing barrier
reported by Ilya)
lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8c657a101..de436af79 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 {
 	int i;
 	uint16_t used_idx = vq->last_used_idx;
+	uint16_t head_idx = vq->last_used_idx;
+	uint16_t head_flags = 0;
 
 	/* Split loop in two to save memory barriers */
 	for (i = 0; i < vq->shadow_used_idx; i++) {
@@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			flags &= ~VRING_DESC_F_AVAIL;
 		}
 
-		vq->desc_packed[vq->last_used_idx].flags = flags;
+		if (i > 0) {
+			vq->desc_packed[vq->last_used_idx].flags = flags;
 
-		vhost_log_cache_used_vring(dev, vq,
+			vhost_log_cache_used_vring(dev, vq,
 					vq->last_used_idx *
 					sizeof(struct vring_packed_desc),
 					sizeof(struct vring_packed_desc));
+		} else {
+			head_idx = vq->last_used_idx;
+			head_flags = flags;
+		}
 
 		vq->last_used_idx += vq->shadow_used_packed[i].count;
 		if (vq->last_used_idx >= vq->size) {
@@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 		}
 	}
 
-	rte_smp_wmb();
+	vq->desc_packed[head_idx].flags = head_flags;
+
+	vhost_log_cache_used_vring(dev, vq,
+				   vq->last_used_idx *
+				   sizeof(struct vring_packed_desc),
+				   sizeof(struct vring_packed_desc));
+
 	vq->shadow_used_idx = 0;
 	vhost_log_cache_sync(dev, vq);
 }
--
2.17.2
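
The ordering this patch describes (publish every chain except the first, then publish the first chain's flags last) can be illustrated outside DPDK. The following is only a minimal sketch under stated assumptions: `struct ring`, `struct desc`, `RING_SIZE` and `flush_used()` are hypothetical names, and a C11 release store stands in for the `rte_smp_wmb()` plus plain store pair used in vhost. The flag bit positions follow the virtio 1.1 packed ring layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u          /* hypothetical ring size for this sketch */
#define F_AVAIL   (1u << 7)   /* bit position of VRING_DESC_F_AVAIL */
#define F_USED    (1u << 15)  /* bit position of VRING_DESC_F_USED */

/* Simplified descriptor: only the fields this sketch needs. */
struct desc {
	uint16_t id;
	_Atomic uint16_t flags;
};

struct ring {
	struct desc d[RING_SIZE];
	uint16_t last_used_idx;
	int used_wrap;            /* used wrap counter, 0 or 1 */
};

/*
 * Mark n chains as used; chain i occupies count[i] descriptors.
 * The flags of the first (head) chain are stored last, so a driver
 * polling the head descriptor only sees the batch once the flags of
 * all later chains have been written.
 */
static void flush_used(struct ring *r, const uint16_t *count, int n)
{
	uint16_t head_idx = r->last_used_idx;
	uint16_t head_flags = 0;

	assert(n > 0);

	for (int i = 0; i < n; i++) {
		/* A used descriptor has AVAIL == USED == used wrap counter. */
		uint16_t flags = r->used_wrap ? (F_AVAIL | F_USED) : 0;

		if (i > 0) {
			/* Release store: earlier writes become visible first. */
			atomic_store_explicit(&r->d[r->last_used_idx].flags,
					      flags, memory_order_release);
		} else {
			/* Defer the head chain until the loop is done. */
			head_idx = r->last_used_idx;
			head_flags = flags;
		}

		r->last_used_idx += count[i];
		if (r->last_used_idx >= RING_SIZE) {
			r->last_used_idx -= RING_SIZE;
			r->used_wrap ^= 1;
		}
	}

	/* Publish the whole batch by writing the head flags last. */
	atomic_store_explicit(&r->d[head_idx].flags, head_flags,
			      memory_order_release);
}
```

With three chains of 2, 1 and 3 descriptors starting at index 0, the flags stores land on descriptors 2 and 3 first and on descriptor 0 last, and `last_used_idx` ends at 6.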
* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
2018-12-19 9:29 [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring Maxime Coquelin
@ 2018-12-19 16:43 ` Michael S. Tsirkin
2018-12-20 4:44 ` Tiwei Bie
1 sibling, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2018-12-19 16:43 UTC
To: Maxime Coquelin; +Cc: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman
On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
>
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> V2:
> Revert back to initial implementation to have a write
> barrier before every descs flags store, but still
> store first desc flags last. (Missing barrier reported
> by Ilya)
>
>
> lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8c657a101..de436af79 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> {
> int i;
> uint16_t used_idx = vq->last_used_idx;
> + uint16_t head_idx = vq->last_used_idx;
> + uint16_t head_flags = 0;
>
> /* Split loop in two to save memory barriers */
> for (i = 0; i < vq->shadow_used_idx; i++) {
> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> flags &= ~VRING_DESC_F_AVAIL;
> }
>
> - vq->desc_packed[vq->last_used_idx].flags = flags;
> + if (i > 0) {
> + vq->desc_packed[vq->last_used_idx].flags = flags;
>
> - vhost_log_cache_used_vring(dev, vq,
> + vhost_log_cache_used_vring(dev, vq,
> vq->last_used_idx *
> sizeof(struct vring_packed_desc),
> sizeof(struct vring_packed_desc));
> + } else {
> + head_idx = vq->last_used_idx;
> + head_flags = flags;
> + }
>
> vq->last_used_idx += vq->shadow_used_packed[i].count;
> if (vq->last_used_idx >= vq->size) {
> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> }
> }
>
> - rte_smp_wmb();
> + vq->desc_packed[head_idx].flags = head_flags;
> +
> + vhost_log_cache_used_vring(dev, vq,
> + vq->last_used_idx *
> + sizeof(struct vring_packed_desc),
> + sizeof(struct vring_packed_desc));
> +
> vq->shadow_used_idx = 0;
> vhost_log_cache_sync(dev, vq);
> }
> --
> 2.17.2
* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
2018-12-19 9:29 [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring Maxime Coquelin
2018-12-19 16:43 ` Michael S. Tsirkin
@ 2018-12-20 4:44 ` Tiwei Bie
2018-12-20 8:49 ` Maxime Coquelin
2018-12-20 14:03 ` Michael S. Tsirkin
1 sibling, 2 replies; 6+ messages in thread
From: Tiwei Bie @ 2018-12-20 4:44 UTC
To: Maxime Coquelin; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst
On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
>
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>
> V2:
> Revert back to initial implementation to have a write
> barrier before every descs flags store, but still
> store first desc flags last. (Missing barrier reported
> by Ilya)
>
>
> lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8c657a101..de436af79 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> {
> int i;
> uint16_t used_idx = vq->last_used_idx;
> + uint16_t head_idx = vq->last_used_idx;
> + uint16_t head_flags = 0;
>
> /* Split loop in two to save memory barriers */
> for (i = 0; i < vq->shadow_used_idx; i++) {
> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> flags &= ~VRING_DESC_F_AVAIL;
> }
>
> - vq->desc_packed[vq->last_used_idx].flags = flags;
> + if (i > 0) {
> + vq->desc_packed[vq->last_used_idx].flags = flags;
>
> - vhost_log_cache_used_vring(dev, vq,
> + vhost_log_cache_used_vring(dev, vq,
> vq->last_used_idx *
> sizeof(struct vring_packed_desc),
> sizeof(struct vring_packed_desc));
> + } else {
> + head_idx = vq->last_used_idx;
> + head_flags = flags;
> + }
>
> vq->last_used_idx += vq->shadow_used_packed[i].count;
> if (vq->last_used_idx >= vq->size) {
> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> }
> }
>
> - rte_smp_wmb();
> + vq->desc_packed[head_idx].flags = head_flags;
> +
> + vhost_log_cache_used_vring(dev, vq,
> + vq->last_used_idx *
Should be head_idx.
> + sizeof(struct vring_packed_desc),
> + sizeof(struct vring_packed_desc));
> +
> vq->shadow_used_idx = 0;
A wmb() is needed before log_cache_sync?
> vhost_log_cache_sync(dev, vq);
> }
> --
> 2.17.2
>
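
To illustrate the first remark above: once the loop has finished, `vq->last_used_idx` already points past the flushed batch, so a dirty-log offset computed from it no longer covers the head descriptor. The sketch below is an assumption-laden illustration, not vhost code: `desc_log_offset()` and `ring_advance()` are hypothetical helpers, and `struct packed_desc` only mirrors the 16-byte packed descriptor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte packed-ring descriptor layout (virtio 1.1). */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Byte offset of descriptor idx in the ring: start of the range to log. */
static size_t desc_log_offset(uint16_t idx)
{
	return (size_t)idx * sizeof(struct packed_desc);
}

/* Advance a ring index by count slots, wrapping at size (hypothetical). */
static uint16_t ring_advance(uint16_t idx, uint16_t count, uint16_t size)
{
	assert(idx < size && count <= size);
	idx = (uint16_t)(idx + count);
	return idx >= size ? (uint16_t)(idx - size) : idx;
}
```

For a ring of 8 entries with the head at index 5 and two chains of 2 and 3 descriptors, the index ends at 2: logging at that index would mark byte offset 32 dirty, while the head descriptor that was actually written lives at offset 80.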
* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
2018-12-20 4:44 ` Tiwei Bie
@ 2018-12-20 8:49 ` Maxime Coquelin
2018-12-20 9:27 ` Maxime Coquelin
2018-12-20 14:03 ` Michael S. Tsirkin
1 sibling, 1 reply; 6+ messages in thread
From: Maxime Coquelin @ 2018-12-20 8:49 UTC
To: Tiwei Bie; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst
On 12/20/18 5:44 AM, Tiwei Bie wrote:
> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>
>> V2:
>> Revert back to initial implementation to have a write
>> barrier before every descs flags store, but still
>> store first desc flags last. (Missing barrier reported
>> by Ilya)
>>
>>
>> lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>> 1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8c657a101..de436af79 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>> {
>> int i;
>> uint16_t used_idx = vq->last_used_idx;
>> + uint16_t head_idx = vq->last_used_idx;
>> + uint16_t head_flags = 0;
>>
>> /* Split loop in two to save memory barriers */
>> for (i = 0; i < vq->shadow_used_idx; i++) {
>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>> flags &= ~VRING_DESC_F_AVAIL;
>> }
>>
>> - vq->desc_packed[vq->last_used_idx].flags = flags;
>> + if (i > 0) {
>> + vq->desc_packed[vq->last_used_idx].flags = flags;
>>
>> - vhost_log_cache_used_vring(dev, vq,
>> + vhost_log_cache_used_vring(dev, vq,
>> vq->last_used_idx *
>> sizeof(struct vring_packed_desc),
>> sizeof(struct vring_packed_desc));
>> + } else {
>> + head_idx = vq->last_used_idx;
>> + head_flags = flags;
>> + }
>>
>> vq->last_used_idx += vq->shadow_used_packed[i].count;
>> if (vq->last_used_idx >= vq->size) {
>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>> }
>> }
>>
>> - rte_smp_wmb();
>> + vq->desc_packed[head_idx].flags = head_flags;
>> +
>> + vhost_log_cache_used_vring(dev, vq,
>> + vq->last_used_idx *
>
> Should be head_idx.
Oh yes, thanks for spotting this.
>
>> + sizeof(struct vring_packed_desc),
>> + sizeof(struct vring_packed_desc));
>> +
>> vq->shadow_used_idx = 0;
>
> A wmb() is needed before log_cache_sync?
I think you're right. I wrongly assumed we had a barrier in the cache
sync function.
It doesn't matter much on x86, but I think it would be preferable
to do it in vhost_log_cache_sync(), and only if logging is enabled.
What do you think?
>> vhost_log_cache_sync(dev, vq);
>> }
>> --
>> 2.17.2
>>
* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
2018-12-20 8:49 ` Maxime Coquelin
@ 2018-12-20 9:27 ` Maxime Coquelin
0 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-12-20 9:27 UTC
To: Tiwei Bie; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst
On 12/20/18 9:49 AM, Maxime Coquelin wrote:
>
>
> On 12/20/18 5:44 AM, Tiwei Bie wrote:
>> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>>> Instead of writing back descriptors chains in order, let's
>>> write the first chain flags last in order to improve batching.
>>>
>>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>
>>> V2:
>>> Revert back to initial implementation to have a write
>>> barrier before every descs flags store, but still
>>> store first desc flags last. (Missing barrier reported
>>> by Ilya)
>>>
>>>
>>> lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>>> 1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c
>>> b/lib/librte_vhost/virtio_net.c
>>> index 8c657a101..de436af79 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>> {
>>> int i;
>>> uint16_t used_idx = vq->last_used_idx;
>>> + uint16_t head_idx = vq->last_used_idx;
>>> + uint16_t head_flags = 0;
>>> /* Split loop in two to save memory barriers */
>>> for (i = 0; i < vq->shadow_used_idx; i++) {
>>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net
>>> *dev,
>>> flags &= ~VRING_DESC_F_AVAIL;
>>> }
>>> - vq->desc_packed[vq->last_used_idx].flags = flags;
>>> + if (i > 0) {
>>> + vq->desc_packed[vq->last_used_idx].flags = flags;
>>> - vhost_log_cache_used_vring(dev, vq,
>>> + vhost_log_cache_used_vring(dev, vq,
>>> vq->last_used_idx *
>>> sizeof(struct vring_packed_desc),
>>> sizeof(struct vring_packed_desc));
>>> + } else {
>>> + head_idx = vq->last_used_idx;
>>> + head_flags = flags;
>>> + }
>>> vq->last_used_idx += vq->shadow_used_packed[i].count;
>>> if (vq->last_used_idx >= vq->size) {
>>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net
>>> *dev,
>>> }
>>> }
>>> - rte_smp_wmb();
>>> + vq->desc_packed[head_idx].flags = head_flags;
>>> +
>>> + vhost_log_cache_used_vring(dev, vq,
>>> + vq->last_used_idx *
>>
>> Should be head_idx.
>
> Oh yes, thanks for spotting this.
>
>>
>>> + sizeof(struct vring_packed_desc),
>>> + sizeof(struct vring_packed_desc));
>>> +
>>> vq->shadow_used_idx = 0;
>>
>> A wmb() is needed before log_cache_sync?
>
> I think you're right, I was wrong but thought we had a barrier in cache
> sync function.
> That's not very important for x86, but I think it should be preferable
> to do it in vhost_log_cache_sync(), if logging is enabled.
>
> What do you think?
I'll keep it in this function for now: I think we cannot remove the
one in the split variant, so moving it would mean having two barriers
in that case.
>>> vhost_log_cache_sync(dev, vq);
>>> }
>>> --
>>> 2.17.2
>>>
* Re: [dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring
2018-12-20 4:44 ` Tiwei Bie
2018-12-20 8:49 ` Maxime Coquelin
@ 2018-12-20 14:03 ` Michael S. Tsirkin
1 sibling, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2018-12-20 14:03 UTC
To: Tiwei Bie; +Cc: Maxime Coquelin, dev, i.maximets, zhihong.wang, jfreiman
On Thu, Dec 20, 2018 at 12:44:46PM +0800, Tiwei Bie wrote:
> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
> > Instead of writing back descriptors chains in order, let's
> > write the first chain flags last in order to improve batching.
> >
> > With Kernel's pktgen benchmark, ~3% performance gain is measured.
> >
> > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > ---
> >
> > V2:
> > Revert back to initial implementation to have a write
> > barrier before every descs flags store, but still
> > store first desc flags last. (Missing barrier reported
> > by Ilya)
> >
> >
> > lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
> > 1 file changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 8c657a101..de436af79 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> > {
> > int i;
> > uint16_t used_idx = vq->last_used_idx;
> > + uint16_t head_idx = vq->last_used_idx;
> > + uint16_t head_flags = 0;
> >
> > /* Split loop in two to save memory barriers */
> > for (i = 0; i < vq->shadow_used_idx; i++) {
> > @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> > flags &= ~VRING_DESC_F_AVAIL;
> > }
> >
> > - vq->desc_packed[vq->last_used_idx].flags = flags;
> > + if (i > 0) {
> > + vq->desc_packed[vq->last_used_idx].flags = flags;
> >
> > - vhost_log_cache_used_vring(dev, vq,
> > + vhost_log_cache_used_vring(dev, vq,
> > vq->last_used_idx *
> > sizeof(struct vring_packed_desc),
> > sizeof(struct vring_packed_desc));
> > + } else {
> > + head_idx = vq->last_used_idx;
> > + head_flags = flags;
> > + }
> >
> > vq->last_used_idx += vq->shadow_used_packed[i].count;
> > if (vq->last_used_idx >= vq->size) {
> > @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> > }
> > }
> >
> > - rte_smp_wmb();
> > + vq->desc_packed[head_idx].flags = head_flags;
> > +
> > + vhost_log_cache_used_vring(dev, vq,
> > + vq->last_used_idx *
>
> Should be head_idx.
>
> > + sizeof(struct vring_packed_desc),
> > + sizeof(struct vring_packed_desc));
> > +
> > vq->shadow_used_idx = 0;
>
> A wmb() is needed before log_cache_sync?
>
> > vhost_log_cache_sync(dev, vq);
Probably smarter to have the wmb in there.
This way we can skip it if not logging.
> > }
> > --
> > 2.17.2
> >
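
The suggestion above (issue the write barrier inside the log sync path, so it is skipped when dirty logging is off) might look roughly like the sketch below. This is not the vhost implementation: `struct log_cache`, `log_cache_add()` and `log_cache_sync()` are hypothetical, a C11 release fence stands in for `rte_smp_wmb()`, and the `barriers` counter exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_CACHE_MAX 4       /* hypothetical cache capacity */

struct log_cache {
	bool logging_enabled;     /* dirty logging negotiated (live migration) */
	uint8_t pending[LOG_CACHE_MAX]; /* page numbers waiting to be logged */
	int nb;
	uint64_t dirty_bitmap;    /* one bit per page, 64 pages in this sketch */
	unsigned int barriers;    /* demo only: counts barriers issued */
};

/* Queue a page to be marked dirty at the next sync. */
static void log_cache_add(struct log_cache *c, uint8_t page)
{
	if (!c->logging_enabled)
		return;
	assert(page < 64 && c->nb < LOG_CACHE_MAX);
	c->pending[c->nb++] = page;
}

/* Flush queued pages to the bitmap; the barrier is paid only when logging. */
static void log_cache_sync(struct log_cache *c)
{
	if (!c->logging_enabled)
		return;               /* nothing to publish, so no barrier */

	/* Ring writes must be visible before the pages are reported dirty. */
	atomic_thread_fence(memory_order_release);
	c->barriers++;

	for (int i = 0; i < c->nb; i++)
		c->dirty_bitmap |= 1ULL << c->pending[i];
	c->nb = 0;
}
```

The trade-off discussed in this thread still applies: if a caller such as the split ring path needs its own barrier anyway, placing a second one inside the sync function means paying for two whenever logging is enabled.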