* [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Maxime Coquelin @ 2018-12-12  8:24 UTC
To: dev, i.maximets, tiwei.bie, zhihong.wang, jfreimann, mst
Cc: Maxime Coquelin

Instead of writing back descriptor chains in order, write the
first chain's flags last to improve batching.

With the kernel's pktgen benchmark, a ~3% performance gain is
measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 39 +++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..c0b3d1137 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			struct vhost_virtqueue *vq)
 {
 	int i;
-	uint16_t used_idx = vq->last_used_idx;
+	uint16_t head_flags, head_idx = vq->last_used_idx;
 
-	/* Split loop in two to save memory barriers */
-	for (i = 0; i < vq->shadow_used_idx; i++) {
-		vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
-		vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
-
-		used_idx += vq->shadow_used_packed[i].count;
-		if (used_idx >= vq->size)
-			used_idx -= vq->size;
-	}
-
-	rte_smp_wmb();
+	if (unlikely(vq->shadow_used_idx == 0))
+		return;
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -165,12 +156,24 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			flags &= ~VRING_DESC_F_AVAIL;
 		}
 
-		vq->desc_packed[vq->last_used_idx].flags = flags;
+		vq->desc_packed[vq->last_used_idx].id =
+			vq->shadow_used_packed[i].id;
+		vq->desc_packed[vq->last_used_idx].len =
+			vq->shadow_used_packed[i].len;
+
+		rte_smp_wmb();
 
-		vhost_log_cache_used_vring(dev, vq,
+		if (i > 0) {
+			vq->desc_packed[vq->last_used_idx].flags = flags;
+
+			vhost_log_cache_used_vring(dev, vq,
 					vq->last_used_idx *
 					sizeof(struct vring_packed_desc),
 					sizeof(struct vring_packed_desc));
+		} else {
+			head_idx = vq->last_used_idx;
+			head_flags = flags;
+		}
 
 		vq->last_used_idx += vq->shadow_used_packed[i].count;
 		if (vq->last_used_idx >= vq->size) {
@@ -179,8 +182,14 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 		}
 	}
 
-	rte_smp_wmb();
+	vq->desc_packed[head_idx].flags = head_flags;
 	vq->shadow_used_idx = 0;
+
+	vhost_log_cache_used_vring(dev, vq,
+				head_idx *
+				sizeof(struct vring_packed_desc),
+				sizeof(struct vring_packed_desc));
+
 	vhost_log_cache_sync(dev, vq);
 }
-- 
2.17.2
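Background on why deferring the head flags helps: in a packed virtqueue, the device returns a descriptor chain to the driver by flipping the AVAIL/USED bits of the head descriptor's flags to match the current wrap counter, and the driver polls exactly those flags. Writing every chain's id/len first and the first chain's flags last therefore publishes the whole batch with a single store. A minimal sketch of the flags computation, simplified from the loop in virtio_net.c (the helper name is invented for illustration):

#include <stdbool.h>
#include <stdint.h>

#define VRING_DESC_F_WRITE	(1 << 1)
#define VRING_DESC_F_AVAIL	(1 << 7)
#define VRING_DESC_F_USED	(1 << 15)

/* Flags value marking a descriptor as used, for the ring's current
 * used wrap counter. A non-zero len means the device wrote into the
 * buffer (e.g. a received packet). */
static inline uint16_t
used_desc_flags(uint32_t len, bool used_wrap_counter)
{
	uint16_t flags = len ? VRING_DESC_F_WRITE : 0;

	if (used_wrap_counter)
		flags |= VRING_DESC_F_AVAIL | VRING_DESC_F_USED;

	return flags;
}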
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Maxime Coquelin @ 2018-12-12  9:41 UTC
To: dev, i.maximets, tiwei.bie, zhihong.wang, jfreimann, mst

Sorry, I just noticed that I forgot to add the v2 prefix to the commit
message.

On 12/12/18 9:24 AM, Maxime Coquelin wrote:
> Instead of writing back descriptor chains in order, write the
> first chain's flags last to improve batching.
>
> With the kernel's pktgen benchmark, a ~3% performance gain is
> measured.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/virtio_net.c | 39 +++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 15 deletions(-)
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Ilya Maximets @ 2018-12-12 15:23 UTC
To: Maxime Coquelin, dev, tiwei.bie, zhihong.wang, jfreimann, mst

On 12.12.2018 11:24, Maxime Coquelin wrote:
> Instead of writing back descriptor chains in order, write the
> first chain's flags last to improve batching.
>
> With the kernel's pktgen benchmark, a ~3% performance gain is
> measured.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/virtio_net.c | 39 +++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 15 deletions(-)

Hi.
I did some rough testing on my ARMv8 system with this patch (v2) and
with v1 of it. Here is the performance difference relative to current
master:

    v1: +1.1 %
    v2: -3.6 %

So, write barriers are quite heavy in practice.

My testcase is three instances of testpmd on the same host (with v11
from Jens):

    txonly (virtio_user0) --> fwd mode io (vhost0, vhost1) --> rxonly (virtio_user1)

Best regards, Ilya Maximets.
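The v1/v2 gap tracks the cost of rte_smp_wmb(): v2 issues one barrier per descriptor chain inside the loop, while v1 issues a single barrier per flush. On x86, stores are already ordered against older stores, so the barrier only needs to restrain the compiler; on ARMv8 it expands to a real store-store barrier instruction. A rough sketch of the two expansions (simplified from DPDK's per-architecture rte_atomic headers):

/* x86: only a compiler barrier is needed. */
#define rte_smp_wmb() asm volatile ("" : : : "memory")

/* ARMv8: an inner-shareable store-store barrier instruction,
 * with a real runtime cost on every call. */
#define rte_smp_wmb() asm volatile ("dmb ishst" : : : "memory")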
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Maxime Coquelin @ 2018-12-12 16:34 UTC
To: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann, mst

Hi Ilya,

On 12/12/18 4:23 PM, Ilya Maximets wrote:
> On 12.12.2018 11:24, Maxime Coquelin wrote:
>> Instead of writing back descriptor chains in order, write the
>> first chain's flags last to improve batching.
>> [...]
>
> Hi.
> I did some rough testing on my ARMv8 system with this patch (v2) and
> with v1 of it. Here is the performance difference relative to current
> master:
>
>     v1: +1.1 %
>     v2: -3.6 %
>
> So, write barriers are quite heavy in practice.

Thanks for testing it on ARM. Indeed, rte_smp_wmb() is heavier on ARM.

To reduce the number of write barriers, I propose going back to the
original structure: first write all the .id and .len fields, execute a
single write barrier, then write all the flags, with the first chain's
flags written last:

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..58a277c53 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -136,6 +136,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 {
 	int i;
 	uint16_t used_idx = vq->last_used_idx;
+	uint16_t head_idx = vq->last_used_idx;
+	uint16_t head_flags = 0;
 
 	/* Split loop in two to save memory barriers */
 	for (i = 0; i < vq->shadow_used_idx; i++) {
@@ -165,12 +167,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			flags &= ~VRING_DESC_F_AVAIL;
 		}
 
-		vq->desc_packed[vq->last_used_idx].flags = flags;
+		if (i > 0) {
+			vq->desc_packed[vq->last_used_idx].flags = flags;
 
-		vhost_log_cache_used_vring(dev, vq,
+			vhost_log_cache_used_vring(dev, vq,
 					vq->last_used_idx *
 					sizeof(struct vring_packed_desc),
 					sizeof(struct vring_packed_desc));
+		} else {
+			head_idx = vq->last_used_idx;
+			head_flags = flags;
+		}
 
 		vq->last_used_idx += vq->shadow_used_packed[i].count;
 		if (vq->last_used_idx >= vq->size) {
@@ -179,7 +186,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 		}
 	}
 
-	rte_smp_wmb();
+	vq->desc_packed[head_idx].flags = head_flags;
+
+	vhost_log_cache_used_vring(dev, vq,
+				vq->last_used_idx *
+				sizeof(struct vring_packed_desc),
+				sizeof(struct vring_packed_desc));
+
 	vq->shadow_used_idx = 0;
 	vhost_log_cache_sync(dev, vq);
 }

> My testcase is three instances of testpmd on the same host (with v11
> from Jens):
>
>     txonly (virtio_user0) --> fwd mode io (vhost0, vhost1) --> rxonly (virtio_user1)
>
> Best regards, Ilya Maximets.
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Michael S. Tsirkin @ 2018-12-12 18:53 UTC
To: Maxime Coquelin
Cc: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann, Jason Wang

On Wed, Dec 12, 2018 at 05:34:31PM +0100, Maxime Coquelin wrote:
> Hi Ilya,
>
> On 12/12/18 4:23 PM, Ilya Maximets wrote:
> > I did some rough testing on my ARMv8 system with this patch (v2) and
> > with v1 of it. Here is the performance difference relative to current
> > master:
> >
> >     v1: +1.1 %
> >     v2: -3.6 %
> >
> > So, write barriers are quite heavy in practice.
>
> Thanks for testing it on ARM. Indeed, rte_smp_wmb() is heavier on ARM.

Besides your ideas for improving packed rings, maybe we should switch
to load_acquire/store_release?

See
    virtio: use smp_load_acquire/smp_store_release
which worked fine but, as I only tested on x86, did not result in any
gains.

-- 
MST
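A sketch of what the suggestion could look like on the vhost side, using GCC's C11-style built-ins (illustration only, not an actual patch):

/* Device side: the release store orders all earlier stores to the
 * descriptors' .id/.len fields (and to the other chains' flags)
 * before the head flags store, with no standalone barrier. */
__atomic_store_n(&vq->desc_packed[head_idx].flags, head_flags,
		 __ATOMIC_RELEASE);

/* Driver side: the acquire load pairs with the release store,
 * ordering the subsequent reads of .id/.len after it. */
uint16_t flags = __atomic_load_n(&desc->flags, __ATOMIC_ACQUIRE);

On x86 both compile down to plain loads and stores plus compiler barriers, which is consistent with no gain being visible there; on ARMv8 they map to ldar/stlr instructions rather than full dmb barriers.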
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Maxime Coquelin @ 2018-12-19 9:16 UTC
To: Michael S. Tsirkin
Cc: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann, Jason Wang

On 12/12/18 7:53 PM, Michael S. Tsirkin wrote:
> Besides your ideas for improving packed rings, maybe we should switch
> to load_acquire/store_release?
>
> See
>     virtio: use smp_load_acquire/smp_store_release
> which worked fine but, as I only tested on x86, did not result in any
> gains.

Thanks for the pointer.
We'll look into it for v19.05; -rc1 for v19.02 is planned for the end
of the week, so it is too late to introduce such changes there.

Regards,
Maxime
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Michael S. Tsirkin @ 2018-12-19 16:10 UTC
To: Maxime Coquelin
Cc: Ilya Maximets, dev, tiwei.bie, zhihong.wang, jfreimann, Jason Wang

On Wed, Dec 19, 2018 at 10:16:24AM +0100, Maxime Coquelin wrote:
> Thanks for the pointer.
> We'll look into it for v19.05; -rc1 for v19.02 is planned for the end
> of the week, so it is too late to introduce such changes there.
>
> Regards,
> Maxime

That's not the only option, BTW. For loads, another option is to work
the value into an indirect dependency, which does not need a barrier.
For example:

#define OPTIMIZER_HIDE_VAR(var) \
	__asm__ ("" : "=r" (var) : "0" (var))

unsigned empty = last_used == idx->used;

if (!empty) {
	OPTIMIZER_HIDE_VAR(empty);
	desc = used->ring[last_used + empty];
}

See Linux for the definition of OPTIMIZER_HIDE_VAR.

One side effect is that this also blocks code speculation, which can be
a good or a bad thing for performance, but is a good thing for
security.

-- 
MST
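An annotated copy of the snippet above (illustrative only; `idx->used`, `used->ring`, `last_used` and `desc` stand in for whatever holds the used index and ring in the real code):

#define OPTIMIZER_HIDE_VAR(var) \
	__asm__ ("" : "=r" (var) : "0" (var))

/* empty is derived from the load of the used index. */
unsigned empty = last_used == idx->used;

if (!empty) {
	/* empty is known to be 0 here, but hiding that from the
	 * optimizer forces the ring-entry address below to carry a
	 * data dependency on the index load. Dependent loads are
	 * ordered on all common architectures (Alpha being the
	 * classic exception), so no acquire barrier is needed. */
	OPTIMIZER_HIDE_VAR(empty);
	desc = used->ring[last_used + empty];
}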
* [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Maxime Coquelin @ 2018-11-28 9:47 UTC
To: dev, tiwei.bie, zhihong.wang, jfreimann
Cc: Maxime Coquelin

Instead of writing back descriptor chains in order, write the
first chain's flags last to improve batching.

With the kernel's pktgen benchmark, a ~3% performance gain is
measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..f54642c2d 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			struct vhost_virtqueue *vq)
 {
 	int i;
-	uint16_t used_idx = vq->last_used_idx;
+	uint16_t head_flags, head_idx = vq->last_used_idx;
 
-	/* Split loop in two to save memory barriers */
-	for (i = 0; i < vq->shadow_used_idx; i++) {
-		vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
-		vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
-
-		used_idx += vq->shadow_used_packed[i].count;
-		if (used_idx >= vq->size)
-			used_idx -= vq->size;
-	}
-
-	rte_smp_wmb();
+	if (unlikely(vq->shadow_used_idx == 0))
+		return;
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -165,12 +156,22 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			flags &= ~VRING_DESC_F_AVAIL;
 		}
 
-		vq->desc_packed[vq->last_used_idx].flags = flags;
+		vq->desc_packed[vq->last_used_idx].id =
+			vq->shadow_used_packed[i].id;
+		vq->desc_packed[vq->last_used_idx].len =
+			vq->shadow_used_packed[i].len;
+
+		if (i > 0) {
+			vq->desc_packed[vq->last_used_idx].flags = flags;
 
-		vhost_log_cache_used_vring(dev, vq,
+			vhost_log_cache_used_vring(dev, vq,
 					vq->last_used_idx *
 					sizeof(struct vring_packed_desc),
 					sizeof(struct vring_packed_desc));
+		} else {
+			head_idx = vq->last_used_idx;
+			head_flags = flags;
+		}
 
 		vq->last_used_idx += vq->shadow_used_packed[i].count;
 		if (vq->last_used_idx >= vq->size) {
@@ -180,7 +181,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 	}
 
 	rte_smp_wmb();
+
+	vq->desc_packed[head_idx].flags = head_flags;
 	vq->shadow_used_idx = 0;
+
+	vhost_log_cache_used_vring(dev, vq,
+				head_idx *
+				sizeof(struct vring_packed_desc),
+				sizeof(struct vring_packed_desc));
+
 	vhost_log_cache_sync(dev, vq);
 }
-- 
2.17.2
* Re: [dpdk-dev] [PATCH] vhost: batch used descriptor chains write-back with packed ring

From: Jens Freimann @ 2018-11-28 10:05 UTC
To: Maxime Coquelin
Cc: dev, tiwei.bie, zhihong.wang

On Wed, Nov 28, 2018 at 10:47:00AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptor chains in order, write the
> first chain's flags last to improve batching.
>
> With the kernel's pktgen benchmark, a ~3% performance gain is
> measured.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/virtio_net.c | 37 ++++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 14 deletions(-)

Tested-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Jens Freimann <jfreimann@redhat.com>