* [dpdk-dev] [PATCH] vhost: batch used descriptors chains write-back with packed ring
@ 2018-11-28 9:47 Maxime Coquelin
2018-11-28 10:05 ` Jens Freimann
[not found] ` <CGME20181205160124eucas1p1470e3dc9afe8e59ceab54a58140cf400@eucas1p1.samsung.com>
0 siblings, 2 replies; 5+ messages in thread
From: Maxime Coquelin @ 2018-11-28 9:47 UTC (permalink / raw)
To: dev, tiwei.bie, zhihong.wang, jfreimann; +Cc: Maxime Coquelin
Instead of writing back descriptors chains in order, let's
write the first chain flags last in order to improve batching.
With Kernel's pktgen benchmark, ~3% performance gain is measured.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 14 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..f54642c2d 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
struct vhost_virtqueue *vq)
{
int i;
- uint16_t used_idx = vq->last_used_idx;
+ uint16_t head_flags, head_idx = vq->last_used_idx;
- /* Split loop in two to save memory barriers */
- for (i = 0; i < vq->shadow_used_idx; i++) {
- vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
- vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
-
- used_idx += vq->shadow_used_packed[i].count;
- if (used_idx >= vq->size)
- used_idx -= vq->size;
- }
-
- rte_smp_wmb();
+ if (unlikely(vq->shadow_used_idx == 0))
+ return;
for (i = 0; i < vq->shadow_used_idx; i++) {
uint16_t flags;
@@ -165,12 +156,22 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
flags &= ~VRING_DESC_F_AVAIL;
}
- vq->desc_packed[vq->last_used_idx].flags = flags;
+ vq->desc_packed[vq->last_used_idx].id =
+ vq->shadow_used_packed[i].id;
+ vq->desc_packed[vq->last_used_idx].len =
+ vq->shadow_used_packed[i].len;
+
+ if (i > 0) {
+ vq->desc_packed[vq->last_used_idx].flags = flags;
- vhost_log_cache_used_vring(dev, vq,
+ vhost_log_cache_used_vring(dev, vq,
vq->last_used_idx *
sizeof(struct vring_packed_desc),
sizeof(struct vring_packed_desc));
+ } else {
+ head_idx = vq->last_used_idx;
+ head_flags = flags;
+ }
vq->last_used_idx += vq->shadow_used_packed[i].count;
if (vq->last_used_idx >= vq->size) {
@@ -180,7 +181,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
}
rte_smp_wmb();
+
+ vq->desc_packed[head_idx].flags = head_flags;
vq->shadow_used_idx = 0;
+
+ vhost_log_cache_used_vring(dev, vq,
+ head_idx *
+ sizeof(struct vring_packed_desc),
+ sizeof(struct vring_packed_desc));
+
vhost_log_cache_sync(dev, vq);
}
--
2.17.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptors chains write-back with packed ring
2018-11-28 9:47 [dpdk-dev] [PATCH] vhost: batch used descriptors chains write-back with packed ring Maxime Coquelin
@ 2018-11-28 10:05 ` Jens Freimann
[not found] ` <CGME20181205160124eucas1p1470e3dc9afe8e59ceab54a58140cf400@eucas1p1.samsung.com>
1 sibling, 0 replies; 5+ messages in thread
From: Jens Freimann @ 2018-11-28 10:05 UTC (permalink / raw)
To: Maxime Coquelin; +Cc: dev, tiwei.bie, zhihong.wang
On Wed, Nov 28, 2018 at 10:47:00AM +0100, Maxime Coquelin wrote:
>Instead of writing back descriptors chains in order, let's
>write the first chain flags last in order to improve batching.
>
>With Kernel's pktgen benchmark, ~3% performance gain is measured.
>
>Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>---
> lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
> 1 file changed, 23 insertions(+), 14 deletions(-)
>
Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <CGME20181205160124eucas1p1470e3dc9afe8e59ceab54a58140cf400@eucas1p1.samsung.com>]
* Re: [dpdk-dev] vhost: batch used descriptors chains write-back with packed ring
[not found] ` <CGME20181205160124eucas1p1470e3dc9afe8e59ceab54a58140cf400@eucas1p1.samsung.com>
@ 2018-12-05 16:01 ` Ilya Maximets
2018-12-06 0:56 ` Michael S. Tsirkin
2018-12-06 17:10 ` Maxime Coquelin
0 siblings, 2 replies; 5+ messages in thread
From: Ilya Maximets @ 2018-12-05 16:01 UTC (permalink / raw)
To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann
Cc: Michael S. Tsirkin
On 28.11.2018 12:47, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
I'm not sure if this fully compliant with virtio spec.
It says that 'each side (driver and device) are only required to poll
(or test) a single location in memory', but it does not forbid to
test other descriptors. So, if the driver will try to check not only
'the next device descriptor after the one they processed previously,
in circular order' but a few descriptors ahead, it could read an
inconsistent memory because there are no more write barriers between
updates for flags and id/len for them.
What do you think ?
>
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Tested-by: Jens Freimann <jfreimann@redhat.com>
> Reviewed-by: Jens Freimann <jfreimann@redhat.com>
> ---
> lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
> 1 file changed, 23 insertions(+), 14 deletions(-)
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 5e1a1a727..f54642c2d 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq)
> {
> int i;
> - uint16_t used_idx = vq->last_used_idx;
> + uint16_t head_flags, head_idx = vq->last_used_idx;
>
> - /* Split loop in two to save memory barriers */
> - for (i = 0; i < vq->shadow_used_idx; i++) {
> - vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
> - vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
> -
> - used_idx += vq->shadow_used_packed[i].count;
> - if (used_idx >= vq->size)
> - used_idx -= vq->size;
> - }
> -
> - rte_smp_wmb();
> + if (unlikely(vq->shadow_used_idx == 0))
> + return;
>
> for (i = 0; i < vq->shadow_used_idx; i++) {
> uint16_t flags;
> @@ -165,12 +156,22 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> flags &= ~VRING_DESC_F_AVAIL;
> }
>
> - vq->desc_packed[vq->last_used_idx].flags = flags;
> + vq->desc_packed[vq->last_used_idx].id =
> + vq->shadow_used_packed[i].id;
> + vq->desc_packed[vq->last_used_idx].len =
> + vq->shadow_used_packed[i].len;
> +
> + if (i > 0) {
> + vq->desc_packed[vq->last_used_idx].flags = flags;
>
> - vhost_log_cache_used_vring(dev, vq,
> + vhost_log_cache_used_vring(dev, vq,
> vq->last_used_idx *
> sizeof(struct vring_packed_desc),
> sizeof(struct vring_packed_desc));
> + } else {
> + head_idx = vq->last_used_idx;
> + head_flags = flags;
> + }
>
> vq->last_used_idx += vq->shadow_used_packed[i].count;
> if (vq->last_used_idx >= vq->size) {
> @@ -180,7 +181,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> }
>
> rte_smp_wmb();
> +
> + vq->desc_packed[head_idx].flags = head_flags;
> vq->shadow_used_idx = 0;
> +
> + vhost_log_cache_used_vring(dev, vq,
> + head_idx *
> + sizeof(struct vring_packed_desc),
> + sizeof(struct vring_packed_desc));
> +
> vhost_log_cache_sync(dev, vq);
> }
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [dpdk-dev] vhost: batch used descriptors chains write-back with packed ring
2018-12-05 16:01 ` [dpdk-dev] " Ilya Maximets
@ 2018-12-06 0:56 ` Michael S. Tsirkin
2018-12-06 17:10 ` Maxime Coquelin
1 sibling, 0 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2018-12-06 0:56 UTC (permalink / raw)
To: Ilya Maximets; +Cc: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann
On Wed, Dec 05, 2018 at 07:01:23PM +0300, Ilya Maximets wrote:
> On 28.11.2018 12:47, Maxime Coquelin wrote:
> > Instead of writing back descriptors chains in order, let's
> > write the first chain flags last in order to improve batching.
>
> I'm not sure if this fully compliant with virtio spec.
> It says that 'each side (driver and device) are only required to poll
> (or test) a single location in memory', but it does not forbid to
> test other descriptors. So, if the driver will try to check not only
> 'the next device descriptor after the one they processed previously,
> in circular order' but a few descriptors ahead, it could read an
> inconsistent memory because there are no more write barriers between
> updates for flags and id/len for them.
>
> What do you think ?
Write barriers for SMP effects are quite cheap on most architectures.
So adding them before each flag write is probably not a big deal.
> >
> > With Kernel's pktgen benchmark, ~3% performance gain is measured.
> >
> > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Tested-by: Jens Freimann <jfreimann@redhat.com>
> > Reviewed-by: Jens Freimann <jfreimann@redhat.com>
> > ---
> > lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
> > 1 file changed, 23 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 5e1a1a727..f54642c2d 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> > struct vhost_virtqueue *vq)
> > {
> > int i;
> > - uint16_t used_idx = vq->last_used_idx;
> > + uint16_t head_flags, head_idx = vq->last_used_idx;
> >
> > - /* Split loop in two to save memory barriers */
> > - for (i = 0; i < vq->shadow_used_idx; i++) {
> > - vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
> > - vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
> > -
> > - used_idx += vq->shadow_used_packed[i].count;
> > - if (used_idx >= vq->size)
> > - used_idx -= vq->size;
> > - }
> > -
> > - rte_smp_wmb();
> > + if (unlikely(vq->shadow_used_idx == 0))
> > + return;
> >
> > for (i = 0; i < vq->shadow_used_idx; i++) {
> > uint16_t flags;
> > @@ -165,12 +156,22 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> > flags &= ~VRING_DESC_F_AVAIL;
> > }
> >
> > - vq->desc_packed[vq->last_used_idx].flags = flags;
> > + vq->desc_packed[vq->last_used_idx].id =
> > + vq->shadow_used_packed[i].id;
> > + vq->desc_packed[vq->last_used_idx].len =
> > + vq->shadow_used_packed[i].len;
> > +
> > + if (i > 0) {
Specifically here?
> > + vq->desc_packed[vq->last_used_idx].flags = flags;
> >
> > - vhost_log_cache_used_vring(dev, vq,
> > + vhost_log_cache_used_vring(dev, vq,
> > vq->last_used_idx *
> > sizeof(struct vring_packed_desc),
> > sizeof(struct vring_packed_desc));
> > + } else {
> > + head_idx = vq->last_used_idx;
> > + head_flags = flags;
> > + }
> >
> > vq->last_used_idx += vq->shadow_used_packed[i].count;
> > if (vq->last_used_idx >= vq->size) {
> > @@ -180,7 +181,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
> > }
> >
> > rte_smp_wmb();
> > +
> > + vq->desc_packed[head_idx].flags = head_flags;
> > vq->shadow_used_idx = 0;
> > +
> > + vhost_log_cache_used_vring(dev, vq,
> > + head_idx *
> > + sizeof(struct vring_packed_desc),
> > + sizeof(struct vring_packed_desc));
> > +
> > vhost_log_cache_sync(dev, vq);
> > }
> >
> >
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [dpdk-dev] vhost: batch used descriptors chains write-back with packed ring
2018-12-05 16:01 ` [dpdk-dev] " Ilya Maximets
2018-12-06 0:56 ` Michael S. Tsirkin
@ 2018-12-06 17:10 ` Maxime Coquelin
1 sibling, 0 replies; 5+ messages in thread
From: Maxime Coquelin @ 2018-12-06 17:10 UTC (permalink / raw)
To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann; +Cc: Michael S. Tsirkin
On 12/5/18 5:01 PM, Ilya Maximets wrote:
> On 28.11.2018 12:47, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>
> I'm not sure if this fully compliant with virtio spec.
> It says that 'each side (driver and device) are only required to poll
> (or test) a single location in memory', but it does not forbid to
> test other descriptors. So, if the driver will try to check not only
> 'the next device descriptor after the one they processed previously,
> in circular order' but a few descriptors ahead, it could read an
> inconsistent memory because there are no more write barriers between
> updates for flags and id/len for them.
>
> What do you think ?
Yes, that makes sense.
It should have no cost on x86 moreover.
I'll fix it in v2.
Thanks,
Maxime
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Tested-by: Jens Freimann <jfreimann@redhat.com>
>> Reviewed-by: Jens Freimann <jfreimann@redhat.com>
>> ---
>> lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
>> 1 file changed, 23 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 5e1a1a727..f54642c2d 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>> struct vhost_virtqueue *vq)
>> {
>> int i;
>> - uint16_t used_idx = vq->last_used_idx;
>> + uint16_t head_flags, head_idx = vq->last_used_idx;
>>
>> - /* Split loop in two to save memory barriers */
>> - for (i = 0; i < vq->shadow_used_idx; i++) {
>> - vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
>> - vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
>> -
>> - used_idx += vq->shadow_used_packed[i].count;
>> - if (used_idx >= vq->size)
>> - used_idx -= vq->size;
>> - }
>> -
>> - rte_smp_wmb();
>> + if (unlikely(vq->shadow_used_idx == 0))
>> + return;
>>
>> for (i = 0; i < vq->shadow_used_idx; i++) {
>> uint16_t flags;
>> @@ -165,12 +156,22 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>> flags &= ~VRING_DESC_F_AVAIL;
>> }
>>
>> - vq->desc_packed[vq->last_used_idx].flags = flags;
>> + vq->desc_packed[vq->last_used_idx].id =
>> + vq->shadow_used_packed[i].id;
>> + vq->desc_packed[vq->last_used_idx].len =
>> + vq->shadow_used_packed[i].len;
>> +
>> + if (i > 0) {
>> + vq->desc_packed[vq->last_used_idx].flags = flags;
>>
>> - vhost_log_cache_used_vring(dev, vq,
>> + vhost_log_cache_used_vring(dev, vq,
>> vq->last_used_idx *
>> sizeof(struct vring_packed_desc),
>> sizeof(struct vring_packed_desc));
>> + } else {
>> + head_idx = vq->last_used_idx;
>> + head_flags = flags;
>> + }
>>
>> vq->last_used_idx += vq->shadow_used_packed[i].count;
>> if (vq->last_used_idx >= vq->size) {
>> @@ -180,7 +181,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>> }
>>
>> rte_smp_wmb();
>> +
>> + vq->desc_packed[head_idx].flags = head_flags;
>> vq->shadow_used_idx = 0;
>> +
>> + vhost_log_cache_used_vring(dev, vq,
>> + head_idx *
>> + sizeof(struct vring_packed_desc),
>> + sizeof(struct vring_packed_desc));
>> +
>> vhost_log_cache_sync(dev, vq);
>> }
>>
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-12-06 17:11 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-28 9:47 [dpdk-dev] [PATCH] vhost: batch used descriptors chains write-back with packed ring Maxime Coquelin
2018-11-28 10:05 ` Jens Freimann
[not found] ` <CGME20181205160124eucas1p1470e3dc9afe8e59ceab54a58140cf400@eucas1p1.samsung.com>
2018-12-05 16:01 ` [dpdk-dev] " Ilya Maximets
2018-12-06 0:56 ` Michael S. Tsirkin
2018-12-06 17:10 ` Maxime Coquelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).