From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: stable@dpdk.org, bluca@debian.org, ktraynor@redhat.com, victork@redhat.com
Cc: Maxime Coquelin
Date: Fri, 2 Mar 2018 18:10:42 +0100
Message-Id: <20180302171042.26094-1-maxime.coquelin@redhat.com>
Subject: [dpdk-stable] [PATCH v16.11 LTS] vhost: protect active rings from async ring changes
List-Id: patches for DPDK stable branches

From: Victor Kaplansky

[ backported from upstream commit a3688046995f88c518fa27c45b39ae389260b18d ]

When performing live migration or memory hot-plugging, the changes to
the device and vrings made by the message handler are done
independently of the vring usage by the PMD threads. This causes, for
example, segfaults during live migration with MQ enabled, but in
general virtually any request sent by QEMU that changes the state of
the device can cause problems.

This patch fixes all of the above issues by adding a spinlock to every
vring and requiring the message handler to start an operation only
after ensuring that all PMD threads related to the device are out of
the critical section accessing the vring data. Each vring has its own
lock in order to avoid contention between PMD threads of different
vrings and to prevent performance degradation when scaling the number
of queue pairs.

See https://bugzilla.redhat.com/show_bug.cgi?id=1450680

Cc: stable@dpdk.org

Signed-off-by: Victor Kaplansky
Reviewed-by: Maxime Coquelin
Acked-by: Yuanhan Liu

Backport conflicts:
	lib/librte_vhost/vhost.c
	lib/librte_vhost/vhost.h
	lib/librte_vhost/vhost_user.c
	lib/librte_vhost/virtio_net.c

Signed-off-by: Maxime Coquelin
---
Hi Luca, All,

This is the v16.11 backport of Victor's patch, already available in
master and in the v17.11 LTS branch. It needed some rework to apply to
v16.11.

Please find below the steps for testing the issue it fixes.
Without the patch it would segfault; with it, live migration with
multiple queues works:

• Start src's testpmd:

  ./install/bin/testpmd -m 512 --file-prefix=src -l 0,2 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vu-src,queues=2' -- \
    --portmask=1 --disable-hw-vlan -i --rxq=2 --txq=2 \
    --nb-cores=1 --eth-peer=0,52:54:00:11:22:12

• Start dst's testpmd:

  ./install/bin/testpmd -m 512 --file-prefix=dst -l 0,2 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vu-dst,queues=2' -- \
    --portmask=1 --disable-hw-vlan -i --rxq=2 --txq=2 \
    --nb-cores=1 --eth-peer=0,52:54:00:11:22:12

• Start src's QEMU:

  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu Haswell -m 3G -smp 2 \
    -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -chardev socket,id=char0,path=/tmp/vu-src \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \
    -device virtio-net-pci,netdev=mynet1,mq=on,vectors=6 \
    /home/virt/guest2.qcow2 -net none -vnc :0 -monitor stdio -k fr

• Start dst's QEMU:

  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu Haswell -m 3G -smp 2 \
    -object memory-backend-file,id=mem,size=3G,mem-path=/dev/shm,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -chardev socket,id=char0,path=/tmp/vu-dst \
    -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce,queues=2 \
    -device virtio-net-pci,netdev=mynet1,mq=on,vectors=6 \
    /home/virt/guest2.qcow2 -net none -vnc :1 -monitor stdio -incoming tcp::8888

• Open VNC sessions for both src and dst:

  vncviewer localhost:0
  vncviewer localhost:1

• In the guest, build DPDK and run it:

  echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
  modprobe vfio enable_unsafe_noiommu_mode
  cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
  modprobe vfio-pci
  cd src
  git clone git://dpdk.org/dpdk
  cd dpdk
  make install T=x86_64-native-linuxapp-gcc DESTDIR=install
  ./install/sbin/dpdk-devbind -b vfio-pci 0000:00:03.0
  ./install/bin/testpmd --socket-mem=512 -- --txq=2 --rxq=2 -i --auto-start

• In both host testpmd prompts, start IO loopback:

  set fwd io
  start tx_first 64

• At that point, you should see traffic on src:

  show port info 0
  show port info 0

• In src's QEMU monitor, start the migration:

  migrate -d tcp:0:8888

• Monitor the migration progress:

  info migrate

• Once done, the dst VNC session should be up, and traffic should go
  through the dst host's testpmd.

---
 lib/librte_vhost/vhost.c      |  2 ++
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 69 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 36 +++++++++++++++++-----
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 3c3f6a427..36fdfb555 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -202,6 +202,8 @@ alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
 	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
 
 	init_vring_queue_pair(dev, qp_idx);
+	rte_spinlock_init(&dev->virtqueue[virt_rx_q_idx]->access_lock);
+	rte_spinlock_init(&dev->virtqueue[virt_tx_q_idx]->access_lock);
 
 	dev->virt_qp_nb += 1;
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d97df1d83..9f60ff81a 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -91,6 +91,8 @@ struct vhost_virtqueue {
 	/* Backend value to determine if device should started/stopped */
 	int			backend;
 
+	rte_spinlock_t		access_lock;
+
 	/* Used to notify the guest (trigger interrupt) */
 	int			callfd;
 	/* Currently unused as polling mode is enabled */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5a67eae08..364540de8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -952,12 +952,47 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 	return ret;
 }
 
+static void
+vhost_user_lock_all_queue_pairs(struct virtio_net *dev)
+{
+	unsigned int i = 0;
+	unsigned int vq_num = 0;
+
+	while (vq_num < dev->virt_qp_nb * 2) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (vq) {
+			rte_spinlock_lock(&vq->access_lock);
+			vq_num++;
+		}
+		i++;
+	}
+}
+
+static void
+vhost_user_unlock_all_queue_pairs(struct virtio_net *dev)
+{
+	unsigned int i = 0;
+	unsigned int vq_num = 0;
+
+	while (vq_num < dev->virt_qp_nb * 2) {
+		struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+		if (vq) {
+			rte_spinlock_unlock(&vq->access_lock);
+			vq_num++;
+		}
+		i++;
+	}
+}
+
 int
 vhost_user_msg_handler(int vid, int fd)
 {
 	struct virtio_net *dev;
 	struct VhostUserMsg msg;
 	int ret;
+	int unlock_required = 0;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -980,6 +1015,37 @@ vhost_user_msg_handler(int vid, int fd)
 		RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
 			vhost_message_str[msg.request]);
 
+	/*
+	 * Note: we don't lock all queues on VHOST_USER_GET_VRING_BASE,
+	 * since it is sent when virtio stops and device is destroyed.
+	 * destroy_device waits for queues to be inactive, so it is safe.
+	 * Otherwise taking the access_lock would cause a dead lock.
+	 */
+	switch (msg.request) {
+	case VHOST_USER_SET_FEATURES:
+	case VHOST_USER_SET_PROTOCOL_FEATURES:
+	case VHOST_USER_SET_OWNER:
+	case VHOST_USER_RESET_OWNER:
+	case VHOST_USER_SET_MEM_TABLE:
+	case VHOST_USER_SET_LOG_BASE:
+	case VHOST_USER_SET_LOG_FD:
+	case VHOST_USER_SET_VRING_NUM:
+	case VHOST_USER_SET_VRING_ADDR:
+	case VHOST_USER_SET_VRING_BASE:
+	case VHOST_USER_SET_VRING_KICK:
+	case VHOST_USER_SET_VRING_CALL:
+	case VHOST_USER_SET_VRING_ERR:
+	case VHOST_USER_SET_VRING_ENABLE:
+	case VHOST_USER_SEND_RARP:
+		vhost_user_lock_all_queue_pairs(dev);
+		unlock_required = 1;
+		break;
+	default:
+		break;
+
+	}
+
 	switch (msg.request) {
 	case VHOST_USER_GET_FEATURES:
 		msg.payload.u64 = vhost_user_get_features();
@@ -1069,5 +1135,8 @@ vhost_user_msg_handler(int vid, int fd)
 	}
 
+	if (unlock_required)
+		vhost_user_unlock_all_queue_pairs(dev);
+
 	return 0;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index db194507e..0024f729e 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -44,6 +44,7 @@
 #include <rte_udp.h>
 #include <rte_sctp.h>
 #include <rte_arp.h>
+#include <rte_spinlock.h>
 
 #include "vhost.h"
 
@@ -313,8 +314,11 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+
+	rte_spinlock_lock(&vq->access_lock);
+
 	if (unlikely(vq->enabled == 0))
-		return 0;
+		goto out_access_unlock;
 
 	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
 	start_idx = vq->last_used_idx;
@@ -322,7 +326,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	count = RTE_MIN(count, free_entries);
 	count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
 	if (count == 0)
-		return 0;
+		goto out_access_unlock;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n",
 		dev->vid, start_idx, start_idx + count);
@@ -388,6 +392,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
 			&& (vq->callfd >= 0))
 		eventfd_write(vq->callfd, (eventfd_t)1);
+
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
 	return count;
 }
 
@@ -582,12 +590,15 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
+
+	rte_spinlock_lock(&vq->access_lock);
+
 	if (unlikely(vq->enabled == 0))
-		return 0;
+		goto out_access_unlock;
 
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 	if (count == 0)
-		return 0;
+		goto out_access_unlock;
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -631,6 +642,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 		eventfd_write(vq->callfd, (eventfd_t)1);
 	}
 
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
 	return pkt_idx;
 }
 
@@ -1068,9 +1082,13 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	}
 
 	vq = dev->virtqueue[queue_id];
-	if (unlikely(vq->enabled == 0))
+
+	if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
 		return 0;
 
+	if (unlikely(vq->enabled == 0))
+		goto out_access_unlock;
+
 	if (unlikely(dev->dequeue_zero_copy)) {
 		struct zcopy_mbuf *zmbuf, *next;
 		int nr_updated = 0;
@@ -1120,7 +1138,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		if (rarp_mbuf == NULL) {
 			RTE_LOG(ERR, VHOST_DATA,
 				"Failed to allocate memory for mbuf.\n");
-			return 0;
+			goto out_access_unlock;
 		}
 
 		if (make_rarp_packet(rarp_mbuf, &dev->mac)) {
@@ -1134,7 +1152,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
 			vq->last_avail_idx;
 	if (free_entries == 0)
-		goto out;
+		goto out_access_unlock;
 
 	LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
 
@@ -1227,7 +1245,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		update_used_idx(dev, vq, i);
 	}
 
-out:
+out_access_unlock:
+	rte_spinlock_unlock(&vq->access_lock);
+
 	if (unlikely(rarp_mbuf != NULL)) {
 		/*
 		 * Inject it to the head of "pkts" array, so that switch's mac
-- 
2.14.3