From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: dev@dpdk.org, jianfeng.tan@intel.com, tiwei.bie@intel.com,
mst@redhat.com
Cc: stable@dpdk.org, Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: [dpdk-dev] [PATCH] vhost: improve dirty pages logging performance
Date: Mon, 30 Apr 2018 17:59:54 +0200
Message-ID: <20180430155954.9939-1-maxime.coquelin@redhat.com>
This patch caches dirty page logging until the used ring index
is updated. These dirty pages won't be accessed by the guest as
long as the host hasn't given them back by updating the index.
The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map. That fix was valid, as the
previous implementation wasn't safe against concurrent accesses,
but it induced contention.
With this patch, during migration, we have (a simplified sketch of
the caching scheme follows this list):
1. Fewer atomic operations, as a single atomic OR operation covers
   a group of 32 pages.
2. Fewer atomic operations, as during a burst the same page is
   marked dirty only once.
3. Fewer write memory barriers.
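To make the scheme concrete, here is a minimal, self-contained sketch
of the per-virtqueue cache. The names and types below are simplified
stand-ins for illustration only; the actual implementation is in the
vhost.h hunk further down, and a flush sketch appears before the diff.

#include <stdint.h>

#define LOG_CACHE_NR 32			/* mirrors VHOST_LOG_CACHE_NR */

struct log_cache_entry {
	uint32_t offset;	/* index of the 32-bit word in the shared log */
	uint32_t val;		/* dirty bits accumulated for that word */
};

struct log_cache {
	struct log_cache_entry entries[LOG_CACHE_NR];
	uint16_t nb_elem;
};

/*
 * Mark one page dirty in the local cache. The accumulated bits are only
 * OR'ed into the shared log map at flush time, one atomic operation per
 * entry, i.e. per group of 32 pages.
 */
static void
log_cache_page(struct log_cache *c, uint32_t *shared_log, uint64_t page)
{
	uint32_t offset = page / 32;
	uint32_t bit = page % 32;
	uint16_t i;

	for (i = 0; i < c->nb_elem; i++) {
		if (c->entries[i].offset == offset) {
			/* Page group already cached: plain store, no atomic. */
			c->entries[i].val |= 1U << bit;
			return;
		}
	}

	if (i >= LOG_CACHE_NR) {
		/* Cache full: fall back to a direct atomic update. */
		__sync_fetch_and_or(shared_log + offset, 1U << bit);
		return;
	}

	c->entries[i].offset = offset;
	c->entries[i].val = 1U << bit;
	c->nb_elem++;
}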
Fixes: 897f13a1f726 ("vhost: make page logging atomic")
Cc: stable@dpdk.org
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Hi,
This patch was tested by migrating a guest while running a PVP
benchmark at 1Mpps, with both OVS-DPDK and testpmd used as the vswitch.
With this patch, we recover from the packet drop regressions seen since
atomic operations were introduced to log dirty pages.
Some numbers:
A) PVP Live migration using testpmd (Single queue pair)
-------------------------------------------------------
Without patch:
=========================Stream Rate: 1Mpps==============
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
0 1Mpps 134 18966 16 11628963.0
1 1Mpps 125 18790 16 8436300.0
2 1Mpps 122 19171 15 13881342.0
3 1Mpps 132 18913 15 12079492.0
<------------------------Summary------------------------>
Max 1Mpps 134 19171 16 13881342
Min 1Mpps 122 18790 15 8436300
Mean 1Mpps 128 18960 15 11506524
Median 1Mpps 128 18939 15 11854227
Stdev 0 5.68 158.81 0.58 2266371.52
=========================================================
With patch:
=========================Stream Rate: 1Mpps==============
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
0 1Mpps 119 13521 15 478174.0
1 1Mpps 116 14084 14 452018.0
2 1Mpps 122 13908 14 464486.0
3 1Mpps 129 13787 16 478234.0
<------------------------Summary------------------------>
Max 1Mpps 129 14084 16 478234
Min 1Mpps 116 13521 14 452018
Mean 1Mpps 121 13825 14 468228
Median 1Mpps 120 13847 14 471330
Stdev 0 5.57 236.52 0.96 12593.78
=========================================================
B) OVS-DPDK migration with 2 queue pairs
----------------------------------------
Without patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
0 1Mpps 146 20270 116 15937394.0
1 1Mpps 150 25617 16 11120370.0
2 1Mpps 138 40983 115 24779971.0
3 1Mpps 161 20435 17 15363519.0
<------------------------Summary------------------------>
Max 1Mpps 161 40983 116 24779971
Min 1Mpps 138 20270 16 11120370
Mean 1Mpps 148 26826 66 16800313
Median 1Mpps 148 23026 66 15650456
Stdev 0 9.57 9758.9 57.16 5737179.93
=========================================================
With patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
0 1Mpps 155 18915 17 481330.0
1 1Mpps 156 21097 18 370556.0
2 1Mpps 156 49610 15 369291.0
3 1Mpps 144 31124 15 361914.0
<------------------------Summary------------------------>
Max 1Mpps 156 49610 18 481330
Min 1Mpps 144 18915 15 361914
Mean 1Mpps 152 30186 16 395772
Median 1Mpps 155 26110 16 369923
Stdev 0 5.85 13997.82 1.5 57165.33
=========================================================
C) OVS-DPDK migration with single queue pair
--------------------------------------------
Without patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
0 1Mpps 129 17411 15 11105414.0
1 1Mpps 130 16544 15 8028438.0
2 1Mpps 132 15202 15 7491584.0
3 1Mpps 133 18100 15 8385047.0
<------------------------Summary------------------------>
Max 1Mpps 133 18100 15 11105414
Min 1Mpps 129 15202 15 7491584
Mean 1Mpps 131 16814 15 8752620
Median 1Mpps 131 16977 15 8206742
Stdev 0 1.83 1249.22 0.0 1610941.84
=========================================================
With patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
0 1Mpps 117 12966 14 275873.0
1 1Mpps 127 13398 14 460323.0
2 1Mpps 119 13216 14 280831.0
3 1Mpps 131 13544 15 465951.0
<------------------------Summary------------------------>
Max 1Mpps 131 13544 15 465951
Min 1Mpps 117 12966 14 275873
Mean 1Mpps 123 13281 14 370744
Median 1Mpps 123 13307 14 370577
Stdev 0 6.61 249.2 0.5 106729.6
=========================================================
lib/librte_vhost/vhost.h | 96 +++++++++++++++++++++++++++++++++++++++++++
lib/librte_vhost/virtio_net.c | 29 ++++++++-----
2 files changed, 115 insertions(+), 10 deletions(-)
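As the virtio_net.c hunks below show, the cached entries are flushed
right before the used index is exposed to the guest; the index update
itself is still logged with the non-cached helper, since the guest may
read it as soon as it is visible. A minimal sketch of that ordering,
again with simplified stand-in types rather than the DPDK ones:

#include <stdint.h>

struct log_cache_entry {
	uint32_t offset;
	uint32_t val;
};

/*
 * Flush the local cache into the shared dirty log map, then publish
 * the new used index.
 */
static void
flush_log_cache_and_update_idx(uint32_t *shared_log,
			       struct log_cache_entry *cache,
			       uint16_t *nb_elem,
			       volatile uint16_t *used_idx, uint16_t count)
{
	uint16_t i;

	/* Make sure guest memory updates are committed before logging. */
	__sync_synchronize();

	/* One atomic OR per cached entry, i.e. per group of 32 pages. */
	for (i = 0; i < *nb_elem; i++)
		__sync_fetch_and_or(shared_log + cache[i].offset,
				    cache[i].val);
	*nb_elem = 0;

	*used_idx += count;	/* the guest may now consume the buffers */
}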
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 891978131..0d9129504 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -36,6 +36,8 @@
#define BUF_VECTOR_MAX 256
+#define VHOST_LOG_CACHE_NR 32
+
/**
* Structure contains buffer address, length and descriptor index
* from vring to do scatter RX.
@@ -69,6 +71,14 @@ struct batch_copy_elem {
uint64_t log_addr;
};
+/*
+ * Structure that contains the info for batched dirty logging.
+ */
+struct log_cache_entry {
+ uint32_t offset;
+ uint32_t val;
+};
+
/**
* Structure contains variables relevant to RX/TX virtqueues.
*/
@@ -112,6 +122,9 @@ struct vhost_virtqueue {
struct batch_copy_elem *batch_copy_elems;
uint16_t batch_copy_nb_elems;
+ struct log_cache_entry log_cache[VHOST_LOG_CACHE_NR];
+ uint16_t log_cache_nb_elem;
+
rte_rwlock_t iotlb_lock;
rte_rwlock_t iotlb_pending_lock;
struct rte_mempool *iotlb_pool;
@@ -340,6 +353,89 @@ vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
}
}
+static __rte_always_inline void
+vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+ uint32_t *log_base;
+ int i;
+
+ if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
+ !dev->log_base))
+ return;
+
+ log_base = (uint32_t *)(uintptr_t)dev->log_base;
+
+ /* To make sure guest memory updates are committed before logging */
+ rte_smp_wmb();
+
+ for (i = 0; i < vq->log_cache_nb_elem; i++) {
+ struct log_cache_entry *elem = vq->log_cache + i;
+
+ __sync_fetch_and_or(log_base + elem->offset, elem->val);
+ }
+
+ vq->log_cache_nb_elem = 0;
+}
+
+static __rte_always_inline void
+vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq,
+ uint64_t page)
+{
+ uint32_t bit_nr = page % 32;
+ uint32_t offset = page / 32;
+ int i;
+
+ for (i = 0; i < vq->log_cache_nb_elem; i++) {
+ struct log_cache_entry *elem = vq->log_cache + i;
+
+ if (elem->offset == offset) {
+ elem->val |= (1U << bit_nr);
+ return;
+ }
+ }
+
+ if (unlikely(i >= VHOST_LOG_CACHE_NR)) {
+ /*
+ * No more room for a new log cache entry,
+ * so write the dirty log map directly.
+ */
+ rte_smp_wmb();
+ vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
+
+ return;
+ }
+
+ vq->log_cache[i].offset = offset;
+ vq->log_cache[i].val = (1U << bit_nr);
+ vq->log_cache_nb_elem++;
+}
+
+static __rte_always_inline void
+vhost_log_cache_write(struct virtio_net *dev, struct vhost_virtqueue *vq,
+ uint64_t addr, uint64_t len)
+{
+ uint64_t page;
+
+ if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
+ !dev->log_base || !len))
+ return;
+
+ if (unlikely(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8)))
+ return;
+
+ page = addr / VHOST_LOG_PAGE;
+ while (page * VHOST_LOG_PAGE < addr + len) {
+ vhost_log_cache_page(dev, vq, page);
+ page += 1;
+ }
+}
+
+static __rte_always_inline void
+vhost_log_cache_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+ uint64_t offset, uint64_t len)
+{
+ vhost_log_cache_write(dev, vq, vq->log_guest_addr + offset, len);
+}
+
static __rte_always_inline void
vhost_log_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint64_t offset, uint64_t len)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5fdd4172b..ac42cdbd0 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -78,7 +78,7 @@ do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
rte_memcpy(&vq->used->ring[to],
&vq->shadow_used_ring[from],
size * sizeof(struct vring_used_elem));
- vhost_log_used_vring(dev, vq,
+ vhost_log_cache_used_vring(dev, vq,
offsetof(struct vring_used, ring[to]),
size * sizeof(struct vring_used_elem));
}
@@ -106,6 +106,8 @@ flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq)
rte_smp_wmb();
+ vhost_log_cache_sync(dev, vq);
+
*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
sizeof(vq->used->idx));
@@ -130,7 +132,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
for (i = 0; i < count; i++) {
rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
- vhost_log_write(dev, elem[i].log_addr, elem[i].len);
+ vhost_log_cache_write(dev, vq, elem[i].log_addr, elem[i].len);
PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0);
}
}
@@ -251,7 +253,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
virtio_enqueue_offload(m,
(struct virtio_net_hdr *)(uintptr_t)desc_addr);
PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
- vhost_log_write(dev, desc_gaddr, dev->vhost_hlen);
+ vhost_log_cache_write(dev, vq, desc_gaddr, dev->vhost_hlen);
} else {
struct virtio_net_hdr vnet_hdr;
uint64_t remain = dev->vhost_hlen;
@@ -274,7 +276,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
(void *)(uintptr_t)src, len);
PRINT_PACKET(dev, (uintptr_t)dst, (uint32_t)len, 0);
- vhost_log_write(dev, guest_addr, len);
+ vhost_log_cache_write(dev, vq, guest_addr, len);
remain -= len;
guest_addr += len;
dst += len;
@@ -355,7 +357,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
desc_offset)),
rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
cpy_len);
- vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len);
+ vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+ cpy_len);
PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
cpy_len, 0);
} else {
@@ -444,7 +447,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
vq->used->ring[used_idx].id = desc_indexes[i];
vq->used->ring[used_idx].len = pkts[i]->pkt_len +
dev->vhost_hlen;
- vhost_log_used_vring(dev, vq,
+ vhost_log_cache_used_vring(dev, vq,
offsetof(struct vring_used, ring[used_idx]),
sizeof(vq->used->ring[used_idx]));
}
@@ -504,6 +507,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
rte_smp_wmb();
+ vhost_log_cache_sync(dev, vq);
+
*(volatile uint16_t *)&vq->used->idx += count;
vq->last_used_idx += count;
vhost_log_used_vring(dev, vq,
@@ -767,7 +772,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
PRINT_PACKET(dev, (uintptr_t)dst,
(uint32_t)len, 0);
- vhost_log_write(dev, guest_addr, len);
+ vhost_log_cache_write(dev, vq,
+ guest_addr, len);
remain -= len;
guest_addr += len;
@@ -776,7 +782,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
} else {
PRINT_PACKET(dev, (uintptr_t)hdr_addr,
dev->vhost_hlen, 0);
- vhost_log_write(dev, hdr_phys_addr,
+ vhost_log_cache_write(dev, vq, hdr_phys_addr,
dev->vhost_hlen);
}
@@ -790,7 +796,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
desc_offset)),
rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
cpy_len);
- vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len);
+ vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+ cpy_len);
PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
cpy_len, 0);
} else {
@@ -1320,7 +1327,7 @@ update_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
{
vq->used->ring[used_idx].id = desc_idx;
vq->used->ring[used_idx].len = 0;
- vhost_log_used_vring(dev, vq,
+ vhost_log_cache_used_vring(dev, vq,
offsetof(struct vring_used, ring[used_idx]),
sizeof(vq->used->ring[used_idx]));
}
@@ -1335,6 +1342,8 @@ update_used_idx(struct virtio_net *dev, struct vhost_virtqueue *vq,
rte_smp_wmb();
rte_smp_rmb();
+ vhost_log_cache_sync(dev, vq);
+
vq->used->idx += count;
vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
sizeof(vq->used->idx));
--
2.14.3