DPDK patches and discussions
 help / color / mirror / Atom feed
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>,
	Victor Kaplansky <vkaplans@redhat.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"yliu@fridaylinux.org" <yliu@fridaylinux.org>,
	"Bie, Tiwei" <tiwei.bie@intel.com>
Cc: "stable@dpdk.org" <stable@dpdk.org>,
	"jfreiman@redhat.com" <jfreiman@redhat.com>
Subject: Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes
Date: Fri, 8 Dec 2017 11:11:40 +0100	[thread overview]
Message-ID: <ea080aa6-8f0b-4788-bc71-3fdf5b2d636e@redhat.com> (raw)
In-Reply-To: <ED26CBA2FAD1BF48A8719AEF02201E365137ACAC@SHSMSX103.ccr.corp.intel.com>

Hi Jianfeng,

On 12/08/2017 09:51 AM, Tan, Jianfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Friday, December 8, 2017 4:36 PM
>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie,
>> Tiwei
>> Cc: stable@dpdk.org; jfreiman@redhat.com
>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring
>> changes
>>
>>
>>
>> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Thursday, December 7, 2017 6:02 PM
>>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org;
>> Bie,
>>>> Tiwei
>>>> Cc: stable@dpdk.org; jfreiman@redhat.com
>>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring
>>>> changes
>>>>
>>>>
>>>>
>>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com]
>>>>>> Sent: Wednesday, December 6, 2017 9:56 PM
>>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng;
>>>>>> vkaplans@redhat.com
>>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin
>>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring
>> changes
>>>>>>
>>>>>> When performing live migration or memory hot-plugging,
>>>>>> the changes to the device and vrings made by message handler
>>>>>> done independently from vring usage by PMD threads.
>>>>>>
>>>>>> This causes for example segfauls during live-migration
>>>>>
>>>>> segfauls ->segfaults?
>>>>>
>>>>>> with MQ enable, but in general virtually any request
>>>>>> sent by qemu changing the state of device can cause
>>>>>> problems.
>>>>>>
>>>>>> These patches fixes all above issues by adding a spinlock
>>>>>> to every vring and requiring message handler to start operation
>>>>>> only after ensuring that all PMD threads related to the divece
>>>>>
>>>>> Another typo: divece.
>>>>>
>>>>>> are out of critical section accessing the vring data.
>>>>>>
>>>>>> Each vring has its own lock in order to not create contention
>>>>>> between PMD threads of different vrings and to prevent
>>>>>> performance degradation by scaling queue pair number.
>>>>>
>>>>> Also wonder how much overhead it brings.
>>>>>
>>>>> Instead of locking each vring, can we just, waiting a while (10us for
>> example)
>>>> after call destroy_device() callback so that every PMD thread has enough
>>>> time to skip out the criterial area?
>>>>
>>>> No, because we are not destroying the device when it is needed.
>>>> Actually, once destroy_device() is called, it is likely that the
>>>> application has taken care the ring aren't being processed anymore
>>>> before returning from the callback (This is at least the case with Vhost
>>>> PMD).
>>>
>>> OK, I did not put it right way as there are multiple cases above: migration
>> and memory hot plug. Let me try again:
>>>
>>> Whenever a vhost thread handles a message affecting PMD threads, (like
>> SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag -
>> VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out of
>> those criterial area. After message handling, reset the flag -
>> VIRTIO_DEV_RUNNING.
>>
>> I think you mean clearing vq's enabled flag, because PMD threads never
>> check the VIRTIO_DEV_RUNNING flag.
> 
> Ah, yes.
> 
>>
>>> I suppose it can work, basing on an assumption that PMD threads work in
>> polling mode and can skip criterial area quickly and inevitably.
>>
>> That sounds very fragile, because if the CPU aren't perfectly isolated,
>> your PMD thread can be preempted for interrupt handling for example.
>>
>> Or what if for some reason the PMD thread CPU stalls for a short while?
>>
>> The later is unlikely, but if it happens, it will be hard to debug.
>>
>> Let's see first the performance impact of using the spinlock. It might
>> not be that important because 99.9999% of the times, it will not even
>> spin.
> 
> Fair enough.

I did some benchmarks on my Broadwell test bench (see patch below), and
it seems that depending on the benchmark, perfs are on par, or better
with the spinlock! I guess it explains because with the spinlock there
is a better batching and less concurrent accesses on the rings, but I'm
not sure.

Please find my results below (CPU E5-2667 v4 @ 3.20GHz):

       Bench         v17.11     v17.11 + spinlock
  ---------------- ----------- -------------------
   PVP Bidir run1   19.29Mpps   19.26Mpps
   PVP Bidir run2   19.26Mpps   19.28Mpps
   TxOnly           18.47Mpps   18.83Mpps
   RxOnly           13.83Mpps   13.83Mpps
   IO Loopback      7.94Mpps    7.97Mpps



Patch:
---------------------------------------------------------

 From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001
From: Maxime Coquelin <maxime.coquelin@redhat.com>
Date: Fri, 8 Dec 2017 04:21:25 -0500
Subject: [PATCH] vhost: spinlock benchmarking

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
  lib/librte_vhost/vhost.c      |  2 ++
  lib/librte_vhost/vhost.h      |  2 ++
  lib/librte_vhost/virtio_net.c | 12 ++++++++++++
  3 files changed, 16 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a..d9752cf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -219,6 +219,8 @@ struct virtio_net *
  	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
  	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;

+	rte_spinlock_init(&vq->access_lock);
+
  	vhost_user_iotlb_init(dev, vring_idx);
  	/* Backends are set to -1 indicating an inactive device. */
  	vq->backend = -1;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c1..56242a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -131,6 +131,8 @@ struct vhost_virtqueue {
  	struct batch_copy_elem	*batch_copy_elems;
  	uint16_t		batch_copy_nb_elems;

+	rte_spinlock_t access_lock;
+
  	rte_rwlock_t	iotlb_lock;
  	rte_rwlock_t	iotlb_pending_lock;
  	struct rte_mempool *iotlb_pool;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..50cc3de 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -329,6 +329,8 @@
  	if (unlikely(vq->enabled == 0))
  		return 0;

+	rte_spinlock_lock(&vq->access_lock);
+
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_lock(vq);

@@ -419,6 +421,8 @@
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_unlock(vq);

+	rte_spinlock_unlock(&vq->access_lock);
+
  	return count;
  }

@@ -654,6 +658,8 @@
  	if (unlikely(vq->enabled == 0))
  		return 0;

+	rte_spinlock_lock(&vq->access_lock);
+
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_lock(vq);

@@ -715,6 +721,8 @@
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_unlock(vq);

+	rte_spinlock_unlock(&vq->access_lock);
+
  	return pkt_idx;
  }

@@ -1183,6 +1191,8 @@
  	if (unlikely(vq->enabled == 0))
  		return 0;

+	rte_spinlock_lock(&vq->access_lock);
+
  	vq->batch_copy_nb_elems = 0;

  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
@@ -1356,6 +1366,8 @@
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_unlock(vq);

+	rte_spinlock_unlock(&vq->access_lock);
+
  	if (unlikely(rarp_mbuf != NULL)) {
  		/*
  		 * Inject it to the head of "pkts" array, so that switch's mac
-- 
1.8.3.1

  reply	other threads:[~2017-12-08 10:11 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-06 13:55 Victor Kaplansky
2017-12-06 14:11 ` Yuanhan Liu
2017-12-07  9:33 ` Tan, Jianfeng
2017-12-07 10:02   ` Maxime Coquelin
2017-12-08  2:14     ` Tan, Jianfeng
2017-12-08  8:35       ` Maxime Coquelin
2017-12-08  8:51         ` Tan, Jianfeng
2017-12-08 10:11           ` Maxime Coquelin [this message]
2017-12-12  5:25             ` Tan, Jianfeng
2017-12-12  8:41               ` Maxime Coquelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea080aa6-8f0b-4788-bc71-3fdf5b2d636e@redhat.com \
    --to=maxime.coquelin@redhat.com \
    --cc=dev@dpdk.org \
    --cc=jfreiman@redhat.com \
    --cc=jianfeng.tan@intel.com \
    --cc=stable@dpdk.org \
    --cc=tiwei.bie@intel.com \
    --cc=vkaplans@redhat.com \
    --cc=yliu@fridaylinux.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).