From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id D4A231C00; Fri, 8 Dec 2017 11:11:45 +0100 (CET) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0CEEBC04575A; Fri, 8 Dec 2017 10:11:45 +0000 (UTC) Received: from [10.36.112.38] (ovpn-112-38.ams2.redhat.com [10.36.112.38]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 9695F183B2; Fri, 8 Dec 2017 10:11:42 +0000 (UTC) To: "Tan, Jianfeng" , Victor Kaplansky , "dev@dpdk.org" , "yliu@fridaylinux.org" , "Bie, Tiwei" Cc: "stable@dpdk.org" , "jfreiman@redhat.com" References: <20171206135329.652-1-vkaplans@redhat.com> <00a522dc-2287-b3ed-7ca4-e666f6217c91@redhat.com> <7352a09d-f90d-5b3a-51f9-bfc82faf744e@redhat.com> From: Maxime Coquelin Message-ID: Date: Fri, 8 Dec 2017 11:11:40 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Fri, 08 Dec 2017 10:11:45 +0000 (UTC) Subject: Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Dec 2017 10:11:46 -0000 Hi Jianfeng, On 12/08/2017 09:51 AM, Tan, Jianfeng wrote: > > >> -----Original Message----- >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >> Sent: Friday, December 8, 2017 4:36 PM >> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, >> Tiwei >> Cc: stable@dpdk.org; jfreiman@redhat.com >> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >> changes >> >> >> >> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote: >>> >>> >>>> -----Original Message----- >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >>>> Sent: Thursday, December 7, 2017 6:02 PM >>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; >> Bie, >>>> Tiwei >>>> Cc: stable@dpdk.org; jfreiman@redhat.com >>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >>>> changes >>>> >>>> >>>> >>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com] >>>>>> Sent: Wednesday, December 6, 2017 9:56 PM >>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; >>>>>> vkaplans@redhat.com >>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin >>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring >> changes >>>>>> >>>>>> When performing live migration or memory hot-plugging, >>>>>> the changes to the device and vrings made by message handler >>>>>> done independently from vring usage by PMD threads. >>>>>> >>>>>> This causes for example segfauls during live-migration >>>>> >>>>> segfauls ->segfaults? >>>>> >>>>>> with MQ enable, but in general virtually any request >>>>>> sent by qemu changing the state of device can cause >>>>>> problems. >>>>>> >>>>>> These patches fixes all above issues by adding a spinlock >>>>>> to every vring and requiring message handler to start operation >>>>>> only after ensuring that all PMD threads related to the divece >>>>> >>>>> Another typo: divece. >>>>> >>>>>> are out of critical section accessing the vring data. >>>>>> >>>>>> Each vring has its own lock in order to not create contention >>>>>> between PMD threads of different vrings and to prevent >>>>>> performance degradation by scaling queue pair number. >>>>> >>>>> Also wonder how much overhead it brings. >>>>> >>>>> Instead of locking each vring, can we just, waiting a while (10us for >> example) >>>> after call destroy_device() callback so that every PMD thread has enough >>>> time to skip out the criterial area? >>>> >>>> No, because we are not destroying the device when it is needed. >>>> Actually, once destroy_device() is called, it is likely that the >>>> application has taken care the ring aren't being processed anymore >>>> before returning from the callback (This is at least the case with Vhost >>>> PMD). >>> >>> OK, I did not put it right way as there are multiple cases above: migration >> and memory hot plug. Let me try again: >>> >>> Whenever a vhost thread handles a message affecting PMD threads, (like >> SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - >> VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out of >> those criterial area. After message handling, reset the flag - >> VIRTIO_DEV_RUNNING. >> >> I think you mean clearing vq's enabled flag, because PMD threads never >> check the VIRTIO_DEV_RUNNING flag. > > Ah, yes. > >> >>> I suppose it can work, basing on an assumption that PMD threads work in >> polling mode and can skip criterial area quickly and inevitably. >> >> That sounds very fragile, because if the CPU aren't perfectly isolated, >> your PMD thread can be preempted for interrupt handling for example. >> >> Or what if for some reason the PMD thread CPU stalls for a short while? >> >> The later is unlikely, but if it happens, it will be hard to debug. >> >> Let's see first the performance impact of using the spinlock. It might >> not be that important because 99.9999% of the times, it will not even >> spin. > > Fair enough. I did some benchmarks on my Broadwell test bench (see patch below), and it seems that depending on the benchmark, perfs are on par, or better with the spinlock! I guess it explains because with the spinlock there is a better batching and less concurrent accesses on the rings, but I'm not sure. Please find my results below (CPU E5-2667 v4 @ 3.20GHz): Bench v17.11 v17.11 + spinlock ---------------- ----------- ------------------- PVP Bidir run1 19.29Mpps 19.26Mpps PVP Bidir run2 19.26Mpps 19.28Mpps TxOnly 18.47Mpps 18.83Mpps RxOnly 13.83Mpps 13.83Mpps IO Loopback 7.94Mpps 7.97Mpps Patch: --------------------------------------------------------- From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001 From: Maxime Coquelin Date: Fri, 8 Dec 2017 04:21:25 -0500 Subject: [PATCH] vhost: spinlock benchmarking Signed-off-by: Maxime Coquelin --- lib/librte_vhost/vhost.c | 2 ++ lib/librte_vhost/vhost.h | 2 ++ lib/librte_vhost/virtio_net.c | 12 ++++++++++++ 3 files changed, 16 insertions(+) diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 4f8b73a..d9752cf 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -219,6 +219,8 @@ struct virtio_net * vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD; vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD; + rte_spinlock_init(&vq->access_lock); + vhost_user_iotlb_init(dev, vring_idx); /* Backends are set to -1 indicating an inactive device. */ vq->backend = -1; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 1cc81c1..56242a8 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -131,6 +131,8 @@ struct vhost_virtqueue { struct batch_copy_elem *batch_copy_elems; uint16_t batch_copy_nb_elems; + rte_spinlock_t access_lock; + rte_rwlock_t iotlb_lock; rte_rwlock_t iotlb_pending_lock; struct rte_mempool *iotlb_pool; diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 6fee16e..50cc3de 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -329,6 +329,8 @@ if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->access_lock); + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_lock(vq); @@ -419,6 +421,8 @@ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->access_lock); + return count; } @@ -654,6 +658,8 @@ if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->access_lock); + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_lock(vq); @@ -715,6 +721,8 @@ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->access_lock); + return pkt_idx; } @@ -1183,6 +1191,8 @@ if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->access_lock); + vq->batch_copy_nb_elems = 0; if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) @@ -1356,6 +1366,8 @@ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->access_lock); + if (unlikely(rarp_mbuf != NULL)) { /* * Inject it to the head of "pkts" array, so that switch's mac -- 1.8.3.1