From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>,
Victor Kaplansky <vkaplans@redhat.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"yliu@fridaylinux.org" <yliu@fridaylinux.org>,
"Bie, Tiwei" <tiwei.bie@intel.com>
Cc: "stable@dpdk.org" <stable@dpdk.org>,
"jfreiman@redhat.com" <jfreiman@redhat.com>
Subject: Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes
Date: Fri, 8 Dec 2017 11:11:40 +0100 [thread overview]
Message-ID: <ea080aa6-8f0b-4788-bc71-3fdf5b2d636e@redhat.com> (raw)
In-Reply-To: <ED26CBA2FAD1BF48A8719AEF02201E365137ACAC@SHSMSX103.ccr.corp.intel.com>
Hi Jianfeng,
On 12/08/2017 09:51 AM, Tan, Jianfeng wrote:
>
>
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Friday, December 8, 2017 4:36 PM
>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie,
>> Tiwei
>> Cc: stable@dpdk.org; jfreiman@redhat.com
>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring
>> changes
>>
>>
>>
>> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Thursday, December 7, 2017 6:02 PM
>>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org;
>> Bie,
>>>> Tiwei
>>>> Cc: stable@dpdk.org; jfreiman@redhat.com
>>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring
>>>> changes
>>>>
>>>>
>>>>
>>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com]
>>>>>> Sent: Wednesday, December 6, 2017 9:56 PM
>>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng;
>>>>>> vkaplans@redhat.com
>>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin
>>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring
>> changes
>>>>>>
>>>>>> When performing live migration or memory hot-plugging,
>>>>>> the changes to the device and vrings made by message handler
>>>>>> done independently from vring usage by PMD threads.
>>>>>>
>>>>>> This causes for example segfauls during live-migration
>>>>>
>>>>> segfauls ->segfaults?
>>>>>
>>>>>> with MQ enable, but in general virtually any request
>>>>>> sent by qemu changing the state of device can cause
>>>>>> problems.
>>>>>>
>>>>>> These patches fixes all above issues by adding a spinlock
>>>>>> to every vring and requiring message handler to start operation
>>>>>> only after ensuring that all PMD threads related to the divece
>>>>>
>>>>> Another typo: divece.
>>>>>
>>>>>> are out of critical section accessing the vring data.
>>>>>>
>>>>>> Each vring has its own lock in order to not create contention
>>>>>> between PMD threads of different vrings and to prevent
>>>>>> performance degradation by scaling queue pair number.
>>>>>
>>>>> Also wonder how much overhead it brings.
>>>>>
>>>>> Instead of locking each vring, can we just wait a while (10us for
>>>>> example) after calling the destroy_device() callback, so that every
>>>>> PMD thread has enough time to get out of the critical section?
>>>>
>>>> No, because the device is not destroyed in all the cases where
>>>> protection is needed.
>>>> Actually, once destroy_device() is called, it is likely that the
>>>> application has taken care that the rings aren't being processed
>>>> anymore before returning from the callback (this is at least the case
>>>> with the Vhost PMD).
>>>
>>> OK, I did not put it the right way, as there are multiple cases above:
>>> migration and memory hot-plug. Let me try again:
>>>
>>> Whenever a vhost thread handles a message affecting PMD threads (like
>>> SET_MEM_TABLE, GET_VRING_BASE, etc.), we can clear the dev flag
>>> VIRTIO_DEV_RUNNING and wait for a while, so that PMD threads get out of
>>> those critical sections. After message handling, set the flag
>>> VIRTIO_DEV_RUNNING again.
>>
>> I think you mean clearing vq's enabled flag, because PMD threads never
>> check the VIRTIO_DEV_RUNNING flag.
>
> Ah, yes.
>
>>
>>> I suppose it can work, based on the assumption that PMD threads work in
>>> polling mode and can get out of the critical section quickly and
>>> reliably.
>>
>> That sounds very fragile, because if the CPUs aren't perfectly isolated,
>> your PMD thread can be preempted, for interrupt handling for example.
>>
>> Or what if for some reason the PMD thread CPU stalls for a short while?
>>
>> The latter is unlikely, but if it happens, it will be hard to debug.
>>
>> Let's first see the performance impact of using the spinlock. It might
>> not be that significant, because 99.9999% of the time it will not even
>> spin.
>
> Fair enough.
I did some benchmarks on my Broadwell test bench (see patch below), and
it seems that, depending on the benchmark, performance is on par or even
better with the spinlock! My guess is that the spinlock results in
better batching and fewer concurrent accesses on the rings, but I'm not
sure.
Please find my results below (CPU E5-2667 v4 @ 3.20GHz):
Bench              v17.11      v17.11 + spinlock
----------------   ---------   -----------------
PVP Bidir run1     19.29Mpps   19.26Mpps
PVP Bidir run2     19.26Mpps   19.28Mpps
TxOnly             18.47Mpps   18.83Mpps
RxOnly             13.83Mpps   13.83Mpps
IO Loopback        7.94Mpps    7.97Mpps
Patch:
---------------------------------------------------------
From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Date: Fri, 8 Dec 2017 04:21:25 -0500
Subject: [PATCH] vhost: spinlock benchmarking
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/librte_vhost/vhost.c | 2 ++
lib/librte_vhost/vhost.h | 2 ++
lib/librte_vhost/virtio_net.c | 12 ++++++++++++
3 files changed, 16 insertions(+)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a..d9752cf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -219,6 +219,8 @@ struct virtio_net *
vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ rte_spinlock_init(&vq->access_lock);
+
vhost_user_iotlb_init(dev, vring_idx);
/* Backends are set to -1 indicating an inactive device. */
vq->backend = -1;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c1..56242a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -131,6 +131,8 @@ struct vhost_virtqueue {
struct batch_copy_elem *batch_copy_elems;
uint16_t batch_copy_nb_elems;
+ rte_spinlock_t access_lock;
+
rte_rwlock_t iotlb_lock;
rte_rwlock_t iotlb_pending_lock;
struct rte_mempool *iotlb_pool;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..50cc3de 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -329,6 +329,8 @@
if (unlikely(vq->enabled == 0))
return 0;
+ rte_spinlock_lock(&vq->access_lock);
+
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
@@ -419,6 +421,8 @@
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
+ rte_spinlock_unlock(&vq->access_lock);
+
return count;
}
@@ -654,6 +658,8 @@
if (unlikely(vq->enabled == 0))
return 0;
+ rte_spinlock_lock(&vq->access_lock);
+
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
@@ -715,6 +721,8 @@
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
+ rte_spinlock_unlock(&vq->access_lock);
+
return pkt_idx;
}
@@ -1183,6 +1191,8 @@
if (unlikely(vq->enabled == 0))
return 0;
+ rte_spinlock_lock(&vq->access_lock);
+
vq->batch_copy_nb_elems = 0;
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
@@ -1356,6 +1366,8 @@
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
+ rte_spinlock_unlock(&vq->access_lock);
+
if (unlikely(rarp_mbuf != NULL)) {
/*
* Inject it to the head of "pkts" array, so that switch's mac
--
1.8.3.1
Thread overview: 10+ messages
2017-12-06 13:55 Victor Kaplansky
2017-12-06 14:11 ` Yuanhan Liu
2017-12-07 9:33 ` Tan, Jianfeng
2017-12-07 10:02 ` Maxime Coquelin
2017-12-08 2:14 ` Tan, Jianfeng
2017-12-08 8:35 ` Maxime Coquelin
2017-12-08 8:51 ` Tan, Jianfeng
2017-12-08 10:11 ` Maxime Coquelin [this message]
2017-12-12 5:25 ` Tan, Jianfeng
2017-12-12 8:41 ` Maxime Coquelin