From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: stable@dpdk.org, bluca@debian.org, ktraynor@redhat.com, victork@redhat.com
Cc: Maxime Coquelin
Date: Fri, 2 Mar 2018 18:10:42 +0100
Message-Id: <20180302171042.26094-1-maxime.coquelin@redhat.com>
Subject: [dpdk-stable] [PATCH v16.11 LTS] vhost: protect active rings from async ring changes
List-Id: patches for DPDK stable branches

From: Victor Kaplansky

[ backported from upstream commit a3688046995f88c518fa27c45b39ae389260b18d ]

When performing live migration or memory hot-plugging, the changes to
the device and vrings made by the message handler are done
independently of the vring usage by the PMD threads. This causes, for
example, segfaults during live migration with MQ enabled, but in
general virtually any request sent by QEMU that changes the state of
the device can cause problems.

This patch fixes all of the above issues by adding a spinlock to every
vring and requiring the message handler to start an operation only
after ensuring that all PMD threads related to the device are out of
the critical section accessing the vring data. Each vring has its own
lock in order to avoid contention between PMD threads of different
vrings and to prevent performance degradation when scaling the number
of queue pairs.

See https://bugzilla.redhat.com/show_bug.cgi?id=1450680

Cc: stable@dpdk.org

Signed-off-by: Victor Kaplansky
Reviewed-by: Maxime Coquelin
Acked-by: Yuanhan Liu

Backport conflicts:
	lib/librte_vhost/vhost.c
	lib/librte_vhost/vhost.h
	lib/librte_vhost/vhost_user.c
	lib/librte_vhost/virtio_net.c

Signed-off-by: Maxime Coquelin
---
Hi Luca, All,

This is the v16.11 backport of Victor's patch, already available in
master and in the v17.11 LTS branch. It needed some rework to apply to
v16.11.

Please find below the steps for testing the issue it fixes.
Without the patch it would segfault; with it, live migration with
multiple queues works:

• Start src's testpmd:

  ./install/bin/testpmd -m 512 --file-prefix=src -l 0,2 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vu-src,queues=2' -- \
    --portmask=1 --disable-hw-vlan -i --rxq=2 --txq=2 \
    --nb-cores=1 --eth-peer=0,52:54:00:11:22:12

• Start dst's testpmd:

  ./install/bin/testpmd -m 512 --file-prefix=dst -l 0,2 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vu-dst,queues=2' -- \
    --portmask=1 --disable-hw-vlan -i --rxq=2 --txq=2 \
    --nb-cores=1 --eth-peer=0,52:54:00:11:22:12

• Start src's QEMU:

  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu Haswell -m 3G -smp 2 \
    -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -chardev socket,id=char0,path=/tmp/vu-src \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \
    -device virtio-net-pci,netdev=mynet1,mq=on,vectors=6 \
    /home/virt/guest2.qcow2 -net none -vnc :0 -monitor stdio -k fr

• Start dst's QEMU:

  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu Haswell -m 3G -smp 2 \
    -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -chardev socket,id=char0,path=/tmp/vu-dst \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \
    -device virtio-net-pci,netdev=mynet1,mq=on,vectors=6 \
    /home/virt/guest2.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888

• Open VNC sessions for both src and dst:

  vncviewer localhost:0
  vncviewer localhost:1

• In the guest, build DPDK and run it:

  echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
  modprobe vfio enable_unsafe_noiommu_mode
  cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
  modprobe vfio-pci
  cd src
  git clone git://dpdk.org/dpdk
  cd dpdk
  make install T=x86_64-native-linuxapp-gcc DESTDIR=install
  ./install/sbin/dpdk-devbind -b vfio-pci 0000:00:03.0
  ./install/bin/testpmd --socket-mem=512 -- --txq=2 --rxq=2 -i --auto-start

• In both host testpmd prompts, start IO loopback:

  set fwd io
  start tx_first 64

• At that point, you should see traffic on src:

  show port info 0
  show port info 0

• In src's QEMU monitor, start the migration:

  migrate -d tcp:0:8888

• Monitor the migration progress:

  info migrate

• Once done, the dst VNC session should be up, and traffic should go
  through the dst host's testpmd.

---
 lib/librte_vhost/vhost.c      |  2 ++
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 69 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 36 +++++++++++++++++-----
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 3c3f6a427..36fdfb555 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -202,6 +202,8 @@ alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
 	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
 
 	init_vring_queue_pair(dev, qp_idx);
+	rte_spinlock_init(&dev->virtqueue[virt_rx_q_idx]->access_lock);
+	rte_spinlock_init(&dev->virtqueue[virt_tx_q_idx]->access_lock);
 
 	dev->virt_qp_nb += 1;
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d97df1d83..9f60ff81a 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -91,6 +91,8 @@ struct vhost_virtqueue {
 	/* Backend value to determine if device should started/stopped */
 	int			backend;
 
+	rte_spinlock_t		access_lock;
+
 	/* Used to notify the guest (trigger interrupt) */
 	int			callfd;
 	/* Currently unused as polling mode is enabled */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5a67eae08..364540de8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -952,12 +952,47 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 	return ret;
 }
 
+static void
+vhost_user_lock_all_queue_pairs(struct virtio_net *dev)
+{
+	unsigned int i = 0;
+	unsigned int vq_num = 0;
+
+	while (vq_num < dev->virt_qp_nb * 2) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (vq) {
+			rte_spinlock_lock(&vq->access_lock);
+			vq_num++;
+		}
+		i++;
+	}
+}
+
+static void
+vhost_user_unlock_all_queue_pairs(struct virtio_net *dev)
+{
+	unsigned int i = 0;
+	unsigned int vq_num = 0;
+
+	while (vq_num < dev->virt_qp_nb * 2) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (vq) {
+			rte_spinlock_unlock(&vq->access_lock);
+			vq_num++;
+		}
+		i++;
+	}
+}
+
 int
 vhost_user_msg_handler(int vid, int fd)
 {
 	struct virtio_net *dev;
 	struct VhostUserMsg msg;
 	int ret;
+	int unlock_required = 0;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -980,6 +1015,37 @@ vhost_user_msg_handler(int vid, int fd)
 		RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
 			vhost_message_str[msg.request]);
 
+	/*
+	 * Note: we don't lock all queues on VHOST_USER_GET_VRING_BASE,
+	 * since it is sent when virtio stops and device is destroyed.
+	 * destroy_device waits for queues to be inactive, so it is safe.
+	 * Otherwise taking the access_lock would cause a dead lock.
+	 */
+	switch (msg.request) {
+	case VHOST_USER_SET_FEATURES:
+	case VHOST_USER_SET_PROTOCOL_FEATURES:
+	case VHOST_USER_SET_OWNER:
+	case VHOST_USER_RESET_OWNER:
+	case VHOST_USER_SET_MEM_TABLE:
+	case VHOST_USER_SET_LOG_BASE:
+	case VHOST_USER_SET_LOG_FD:
+	case VHOST_USER_SET_VRING_NUM:
+	case VHOST_USER_SET_VRING_ADDR:
+	case VHOST_USER_SET_VRING_BASE:
+	case VHOST_USER_SET_VRING_KICK:
+	case VHOST_USER_SET_VRING_CALL:
+	case VHOST_USER_SET_VRING_ERR:
+	case VHOST_USER_SET_VRING_ENABLE:
+	case VHOST_USER_SEND_RARP:
+		vhost_user_lock_all_queue_pairs(dev);
+		unlock_required = 1;
+		break;
+	default:
+		break;
+
+	}
+
 	switch (msg.request) {
 	case VHOST_USER_GET_FEATURES:
 		msg.payload.u64 = vhost_user_get_features();
@@ -1069,5 +1135,8 @@ vhost_user_msg_handler(int vid, int fd)
 	}
 
+	if (unlock_required)
+		vhost_user_unlock_all_queue_pairs(dev);
+
 	return 0;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index db194507e..0024f729e 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -44,6 +44,7 @@
 #include <rte_udp.h>
 #include <rte_sctp.h>
 #include <rte_arp.h>
+#include <rte_spinlock.h>
 
 #include "vhost.h"
 
@@ -313,8 +314,11 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+
+	rte_spinlock_lock(&vq->access_lock);
+
 	if (unlikely(vq->enabled == 0))
-		return 0;
+		goto out_access_unlock;
 
 	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
 	start_idx = vq->last_used_idx;
@@ -322,7 +326,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	count = RTE_MIN(count, free_entries);
 	count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
 	if (count == 0)
-		return 0;
+		goto out_access_unlock;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n",
 		dev->vid, start_idx, start_idx + count);
@@ -388,6 +392,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
 			&& (vq->callfd >= 0))
 		eventfd_write(vq->callfd, (eventfd_t)1);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
 	return count;
 }
 
@@ -582,12 +590,15 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+
+	rte_spinlock_lock(&vq->access_lock);
+
 	if (unlikely(vq->enabled == 0))
-		return 0;
+		goto out_access_unlock;
 
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 	if (count == 0)
-		return 0;
+		goto out_access_unlock;
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -631,6 +642,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 		eventfd_write(vq->callfd, (eventfd_t)1);
 	}
 
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
 	return pkt_idx;
 }
 
@@ -1068,9 +1082,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
-	if (unlikely(vq->enabled == 0))
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
 		return 0;
 
+	if (unlikely(vq->enabled == 0))
+		goto out_access_unlock;
+
 	if (unlikely(dev->dequeue_zero_copy)) {
 		struct zcopy_mbuf *zmbuf, *next;
 		int nr_updated = 0;
@@ -1120,7 +1138,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		if (rarp_mbuf == NULL) {
 			RTE_LOG(ERR, VHOST_DATA,
 				"Failed to allocate memory for mbuf.\n");
-			return 0;
+			goto out_access_unlock;
 		}
 
 		if (make_rarp_packet(rarp_mbuf, &dev->mac)) {
@@ -1134,7 +1152,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
 			vq->last_avail_idx;
 	if (free_entries == 0)
-		goto out;
+		goto out_access_unlock;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
 
@@ -1227,7 +1245,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		update_used_idx(dev, vq, i);
 	}
 
-out:
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
 	if (unlikely(rarp_mbuf != NULL)) {
 		/*
 		 * Inject it to the head of "pkts" array, so that switch's mac
-- 
2.14.3