From mboxrd@z Thu Jan  1 00:00:00 1970
From: luca.boccassi@gmail.com
To: Maxime Coquelin
Cc: "Michael S. Tsirkin", Tiwei Bie, dpdk stable
Date: Wed, 23 May 2018 13:09:13 +0100
Message-Id: <20180523121010.8385-2-luca.boccassi@gmail.com>
X-Mailer: git-send-email 2.14.2
In-Reply-To: <20180523121010.8385-1-luca.boccassi@gmail.com>
References: <20180516101323.2234-2-luca.boccassi@gmail.com>
 <20180523121010.8385-1-luca.boccassi@gmail.com>
Subject: [dpdk-stable] patch 'vhost: improve dirty pages logging performance' has been queued to stable release 18.02.2
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to stable release 18.02.2.

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 05/25/18. So please
shout if anyone has objections.

Thanks.

Luca Boccassi

---
>From e11c7fff31f2cc91fea4e6acb89f51539b7109da Mon Sep 17 00:00:00 2001
From: Maxime Coquelin
Date: Thu, 17 May 2018 13:44:47 +0200
Subject: [PATCH] vhost: improve dirty pages logging performance

[ upstream commit c16915b8710911a75f0fbdb1aa5243f4cdfaf26a ]

This patch caches all dirty pages logging until the used ring index
is updated.

The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map.
While the fix was valid, since the previous implementation was not
safe against concurrent accesses, it induced contention.

With this patch, during migration, we have:
1. Fewer atomic operations, as only a single atomic OR operation
   is needed per 32 or 64 pages (depending on the CPU).
2. Fewer atomic operations, as during a burst the same page will
   be marked dirty only once.
3. Fewer write memory barriers.

Fixes: 897f13a1f726 ("vhost: make page logging atomic")

Suggested-by: Michael S. Tsirkin
Signed-off-by: Maxime Coquelin
Reviewed-by: Tiwei Bie
---
 lib/librte_vhost/vhost.h      | 119 +++++++++++++++++++++++++++++++++++++++++-
 lib/librte_vhost/virtio_net.c |  29 ++++++----
 2 files changed, 137 insertions(+), 11 deletions(-)
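To make the mechanism concrete outside the DPDK sources, here is a
minimal, self-contained sketch of the same batching idea (illustrative
names, no RTE dependencies; the real implementation, with its feature
and log-size checks, is in the vhost.h hunks below):

#include <stdint.h>
#include <stdio.h>

#define LOG_CACHE_NR 32                  /* mirrors VHOST_LOG_CACHE_NR */
#define LONG_BITS (sizeof(unsigned long) * 8)

struct log_cache_entry {
	uint32_t offset;        /* word index into the shared dirty log */
	unsigned long val;      /* dirty bits pending for that word */
};

static struct log_cache_entry cache[LOG_CACHE_NR];
static unsigned int cache_nb;

/* Shared dirty log: one bit per guest page, updated with atomic ORs. */
static unsigned long shared_log[64];

/* Mark one page dirty locally: no atomics, no barriers on this path. */
static void
log_cache_page(uint64_t page)
{
	uint32_t offset = page / LONG_BITS;
	unsigned int i;

	for (i = 0; i < cache_nb; i++) {
		if (cache[i].offset == offset) {
			/* Same word already cached during this burst. */
			cache[i].val |= 1UL << (page % LONG_BITS);
			return;
		}
	}

	if (i == LOG_CACHE_NR) {
		/* Cache full: fall back to a direct atomic update. */
		__atomic_fetch_or(&shared_log[offset],
				1UL << (page % LONG_BITS), __ATOMIC_RELAXED);
		return;
	}

	cache[i].offset = offset;
	cache[i].val = 1UL << (page % LONG_BITS);
	cache_nb++;
}

/* Flush: one atomic OR per distinct word touched during the burst. */
static void
log_cache_sync(void)
{
	unsigned int i;

	for (i = 0; i < cache_nb; i++)
		__atomic_fetch_or(&shared_log[cache[i].offset],
				cache[i].val, __ATOMIC_RELAXED);
	cache_nb = 0;
}

int
main(void)
{
	/* Burst touching pages 3, 4 and 3 again: two bits cached in one
	 * word, flushed with a single atomic OR. */
	log_cache_page(3);
	log_cache_page(4);
	log_cache_page(3);
	log_cache_sync();
	printf("shared_log[0] = 0x%lx\n", shared_log[0]);  /* 0x18 */
	return 0;
}

The real code additionally keeps the cache per virtqueue, issues a
write memory barrier before the direct write on the overflow path, and
drives the flush from the used ring index update.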
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index bab0b456e..b3b8bc2e0 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -32,6 +32,8 @@
 
 #define BUF_VECTOR_MAX 256
 
+#define VHOST_LOG_CACHE_NR 32
+
 /**
  * Structure contains buffer address, length and descriptor index
  * from vring to do scatter RX.
@@ -65,6 +67,14 @@ struct batch_copy_elem {
 	uint64_t log_addr;
 };
 
+/*
+ * Structure that contains the info for batched dirty logging.
+ */
+struct log_cache_entry {
+	uint32_t offset;
+	unsigned long val;
+};
+
 /**
  * Structure contains variables relevant to RX/TX virtqueues.
  */
@@ -108,6 +118,9 @@ struct vhost_virtqueue {
 	struct batch_copy_elem	*batch_copy_elems;
 	uint16_t		batch_copy_nb_elems;
 
+	struct log_cache_entry log_cache[VHOST_LOG_CACHE_NR];
+	uint16_t log_cache_nb_elem;
+
 	rte_rwlock_t	iotlb_lock;
 	rte_rwlock_t	iotlb_pending_lock;
 	struct rte_mempool *iotlb_pool;
@@ -252,7 +265,15 @@ struct virtio_net {
 static __rte_always_inline void
 vhost_set_bit(unsigned int nr, volatile uint8_t *addr)
 {
-	__sync_fetch_and_or_8(addr, (1U << nr));
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
+	/*
+	 * __sync_ built-ins are deprecated, but __atomic_ ones
+	 * are sub-optimized in older GCC versions.
+	 */
+	__sync_fetch_and_or_1(addr, (1U << nr));
+#else
+	__atomic_fetch_or(addr, (1U << nr), __ATOMIC_RELAXED);
+#endif
 }
 
 static __rte_always_inline void
@@ -283,6 +304,102 @@ vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
 	}
 }
 
+static __rte_always_inline void
+vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	unsigned long *log_base;
+	int i;
+
+	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
+				!dev->log_base))
+		return;
+
+	log_base = (unsigned long *)(uintptr_t)dev->log_base;
+
+	/*
+	 * It is expected a write memory barrier has been issued
+	 * before this function is called.
+	 */
+
+	for (i = 0; i < vq->log_cache_nb_elem; i++) {
+		struct log_cache_entry *elem = vq->log_cache + i;
+
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
+		/*
+		 * '__sync' builtins are deprecated, but '__atomic' ones
+		 * are sub-optimized in older GCC versions.
+		 */
+		__sync_fetch_and_or(log_base + elem->offset, elem->val);
+#else
+		__atomic_fetch_or(log_base + elem->offset, elem->val,
+				__ATOMIC_RELAXED);
+#endif
+	}
+
+	rte_smp_wmb();
+
+	vq->log_cache_nb_elem = 0;
+}
+
+static __rte_always_inline void
+vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t page)
+{
+	uint32_t bit_nr = page % (sizeof(unsigned long) << 3);
+	uint32_t offset = page / (sizeof(unsigned long) << 3);
+	int i;
+
+	for (i = 0; i < vq->log_cache_nb_elem; i++) {
+		struct log_cache_entry *elem = vq->log_cache + i;
+
+		if (elem->offset == offset) {
+			elem->val |= (1UL << bit_nr);
+			return;
+		}
+	}
+
+	if (unlikely(i >= VHOST_LOG_CACHE_NR)) {
+		/*
+		 * No more room for a new log cache entry,
+		 * so write the dirty log map directly.
+		 */
+		rte_smp_wmb();
+		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
+
+		return;
+	}
+
+	vq->log_cache[i].offset = offset;
+	vq->log_cache[i].val = (1UL << bit_nr);
+
+	vq->log_cache_nb_elem++;
+}
+
+static __rte_always_inline void
+vhost_log_cache_write(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t addr, uint64_t len)
+{
+	uint64_t page;
+
+	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
+				!dev->log_base || !len))
+		return;
+
+	if (unlikely(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8)))
+		return;
+
+	page = addr / VHOST_LOG_PAGE;
+	while (page * VHOST_LOG_PAGE < addr + len) {
+		vhost_log_cache_page(dev, vq, page);
+		page += 1;
+	}
+}
+
+static __rte_always_inline void
+vhost_log_cache_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t offset, uint64_t len)
+{
+	vhost_log_cache_write(dev, vq, vq->log_guest_addr + offset, len);
+}
+
 static __rte_always_inline void
 vhost_log_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			uint64_t offset, uint64_t len)
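The virtio_net.c changes below wire the cache into the data paths:
each vhost_log_write()/vhost_log_used_vring() call issued while
descriptors are being filled becomes its cached counterpart, and
vhost_log_cache_sync() is called after the existing rte_smp_wmb() and
before the used index is updated, so every page dirtied during the
burst reaches the shared log before the guest or the migration thread
can observe the new index. The index field itself is still logged
through the uncached vhost_log_used_vring().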
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5fdd4172b..ac42cdbd0 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -78,7 +78,7 @@ do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	rte_memcpy(&vq->used->ring[to],
 			&vq->shadow_used_ring[from],
 			size * sizeof(struct vring_used_elem));
-	vhost_log_used_vring(dev, vq,
+	vhost_log_cache_used_vring(dev, vq,
 			offsetof(struct vring_used, ring[to]),
 			size * sizeof(struct vring_used_elem));
 }
@@ -106,6 +106,8 @@ flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq)
 
 	rte_smp_wmb();
 
+	vhost_log_cache_sync(dev, vq);
+
 	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 		sizeof(vq->used->idx));
@@ -130,7 +132,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
 
 	for (i = 0; i < count; i++) {
 		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
-		vhost_log_write(dev, elem[i].log_addr, elem[i].len);
+		vhost_log_cache_write(dev, vq, elem[i].log_addr, elem[i].len);
 		PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0);
 	}
 }
@@ -251,7 +253,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		virtio_enqueue_offload(m,
 				(struct virtio_net_hdr *)(uintptr_t)desc_addr);
 		PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
-		vhost_log_write(dev, desc_gaddr, dev->vhost_hlen);
+		vhost_log_cache_write(dev, vq, desc_gaddr, dev->vhost_hlen);
 	} else {
 		struct virtio_net_hdr vnet_hdr;
 		uint64_t remain = dev->vhost_hlen;
@@ -274,7 +276,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 					(void *)(uintptr_t)src, len);
 
 			PRINT_PACKET(dev, (uintptr_t)dst, (uint32_t)len, 0);
-			vhost_log_write(dev, guest_addr, len);
+			vhost_log_cache_write(dev, vq, guest_addr, len);
 			remain -= len;
 			guest_addr += len;
 			dst += len;
@@ -355,7 +357,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						desc_offset)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
-			vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len);
+			vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+					cpy_len);
 			PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
 				     cpy_len, 0);
 		} else {
@@ -444,7 +447,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 		vq->used->ring[used_idx].id = desc_indexes[i];
 		vq->used->ring[used_idx].len = pkts[i]->pkt_len +
 					       dev->vhost_hlen;
-		vhost_log_used_vring(dev, vq,
+		vhost_log_cache_used_vring(dev, vq,
 			offsetof(struct vring_used, ring[used_idx]),
 			sizeof(vq->used->ring[used_idx]));
 	}
@@ -504,6 +507,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	rte_smp_wmb();
 
+	vhost_log_cache_sync(dev, vq);
+
 	*(volatile uint16_t *)&vq->used->idx += count;
 	vq->last_used_idx += count;
 	vhost_log_used_vring(dev, vq,
@@ -767,7 +772,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 			PRINT_PACKET(dev, (uintptr_t)dst,
 					(uint32_t)len, 0);
-			vhost_log_write(dev, guest_addr, len);
+			vhost_log_cache_write(dev, vq,
+					guest_addr, len);
 
 			remain -= len;
 			guest_addr += len;
@@ -776,7 +782,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	} else {
 		PRINT_PACKET(dev, (uintptr_t)hdr_addr,
 				dev->vhost_hlen, 0);
-		vhost_log_write(dev, hdr_phys_addr,
+		vhost_log_cache_write(dev, vq, hdr_phys_addr,
 				dev->vhost_hlen);
 	}
 
@@ -790,7 +796,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 						desc_offset)),
 			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 			cpy_len);
-		vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len);
+		vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+				cpy_len);
 		PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
 			cpy_len, 0);
 	} else {
@@ -1320,7 +1327,7 @@ update_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 {
 	vq->used->ring[used_idx].id = desc_idx;
 	vq->used->ring[used_idx].len = 0;
-	vhost_log_used_vring(dev, vq,
+	vhost_log_cache_used_vring(dev, vq,
 			offsetof(struct vring_used, ring[used_idx]),
 			sizeof(vq->used->ring[used_idx]));
 }
@@ -1335,6 +1342,8 @@ update_used_idx(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	rte_smp_wmb();
 	rte_smp_rmb();
 
+	vhost_log_cache_sync(dev, vq);
+
 	vq->used->idx += count;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 			sizeof(vq->used->idx));
-- 
2.14.2