* [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes @ 2017-12-06 13:55 Victor Kaplansky 2017-12-06 14:11 ` Yuanhan Liu 2017-12-07 9:33 ` Tan, Jianfeng 0 siblings, 2 replies; 10+ messages in thread From: Victor Kaplansky @ 2017-12-06 13:55 UTC (permalink / raw) To: dev, yliu, tiwei.bie, jianfeng.tan, vkaplans Cc: stable, jfreiman, Maxime Coquelin When performing live migration or memory hot-plugging, the changes to the device and vrings made by message handler done independently from vring usage by PMD threads. This causes for example segfauls during live-migration with MQ enable, but in general virtually any request sent by qemu changing the state of device can cause problems. These patches fixes all above issues by adding a spinlock to every vring and requiring message handler to start operation only after ensuring that all PMD threads related to the divece are out of critical section accessing the vring data. Each vring has its own lock in order to not create contention between PMD threads of different vrings and to prevent performance degradation by scaling queue pair number. --- lib/librte_vhost/vhost.h | 1 + lib/librte_vhost/vhost_user.c | 44 +++++++++++++++++++++++++++++++++++++++++++ lib/librte_vhost/virtio_net.c | 8 ++++++++ 3 files changed, 53 insertions(+) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 1cc81c17..812aeccd 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -137,6 +137,7 @@ struct vhost_virtqueue { TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list; int iotlb_cache_nr; TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list; + rte_spinlock_t active_lock; } __rte_cache_aligned; /* Old kernels have no such macros defined */ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index f4c7ce46..02a3a7d3 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -1190,6 +1190,46 @@ vhost_user_check_and_alloc_queue_pair(struct virtio_net *dev, VhostUserMsg *msg) return alloc_vring_queue(dev, vring_idx); } +static void +vhost_user_lock_all_queue_pairs(struct virtio_net *dev) +{ + unsigned int i = 0; + unsigned int vq_num = 0; + + while (vq_num < dev->nr_vring) { + struct vhost_virtqueue *vq = dev->virtqueue[i]; + + if (!vq) { + i++; + continue; + } + + rte_spinlock_lock(&vq->active_lock); + vq_num++; + i++; + } +} + +static void +vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) +{ + unsigned int i = 0; + unsigned int vq_num = 0; + + while (vq_num < dev->nr_vring) { + struct vhost_virtqueue *vq = dev->virtqueue[i]; + + if (!vq) { + i++; + continue; + } + + rte_spinlock_unlock(&vq->active_lock); + vq_num++; + i++; + } +} + int vhost_user_msg_handler(int vid, int fd) { @@ -1241,6 +1281,8 @@ vhost_user_msg_handler(int vid, int fd) return -1; } + vhost_user_lock_all_queue_pairs(dev); + switch (msg.request.master) { case VHOST_USER_GET_FEATURES: msg.payload.u64 = vhost_user_get_features(dev); @@ -1342,6 +1384,8 @@ vhost_user_msg_handler(int vid, int fd) } + vhost_user_unlock_all_queue_pairs(dev); + if (msg.flags & VHOST_USER_NEED_REPLY) { msg.payload.u64 = !!ret; msg.size = sizeof(msg.payload.u64); diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 6fee16e5..de7e38bb 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -44,6 +44,7 @@ #include <rte_udp.h> #include <rte_sctp.h> #include <rte_arp.h> +#include <rte_spinlock.h> #include "iotlb.h" #include "vhost.h" @@ -1180,9 +1181,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, } vq = dev->virtqueue[queue_id]; + if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->active_lock); + vq->batch_copy_nb_elems = 0; if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) @@ -1240,6 +1244,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, if (rarp_mbuf == NULL) { RTE_LOG(ERR, VHOST_DATA, "Failed to allocate memory for mbuf.\n"); + rte_spinlock_unlock(&vq->active_lock); return 0; } @@ -1356,6 +1361,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->active_lock); + if (unlikely(rarp_mbuf != NULL)) { /* * Inject it to the head of "pkts" array, so that switch's mac @@ -1366,5 +1373,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, i += 1; } + return i; } -- Victor ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-06 13:55 [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes Victor Kaplansky @ 2017-12-06 14:11 ` Yuanhan Liu 2017-12-07 9:33 ` Tan, Jianfeng 1 sibling, 0 replies; 10+ messages in thread From: Yuanhan Liu @ 2017-12-06 14:11 UTC (permalink / raw) To: Victor Kaplansky Cc: dev, tiwei.bie, jianfeng.tan, stable, jfreiman, Maxime Coquelin On Wed, Dec 06, 2017 at 03:55:49PM +0200, Victor Kaplansky wrote: > When performing live migration or memory hot-plugging, > the changes to the device and vrings made by message handler > done independently from vring usage by PMD threads. > > This causes for example segfauls during live-migration > with MQ enable, but in general virtually any request > sent by qemu changing the state of device can cause > problems. > > These patches fixes all above issues by adding a spinlock > to every vring and requiring message handler to start operation > only after ensuring that all PMD threads related to the divece > are out of critical section accessing the vring data. > > Each vring has its own lock in order to not create contention > between PMD threads of different vrings and to prevent > performance degradation by scaling queue pair number. Hi, Thanks for the patch! Firstly, I didn't see your SoB. I'm also more interested to know do you see any performance penalty? > --- > lib/librte_vhost/vhost.h | 1 + > lib/librte_vhost/vhost_user.c | 44 +++++++++++++++++++++++++++++++++++++++++++ > lib/librte_vhost/virtio_net.c | 8 ++++++++ > 3 files changed, 53 insertions(+) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c17..812aeccd 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -137,6 +137,7 @@ struct vhost_virtqueue { > TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list; > int iotlb_cache_nr; > TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list; > + rte_spinlock_t active_lock; The indentation is broken. > } __rte_cache_aligned; > @@ -1356,6 +1361,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_unlock(vq); > > + rte_spinlock_unlock(&vq->active_lock); Ditto. --yliu ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-06 13:55 [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes Victor Kaplansky 2017-12-06 14:11 ` Yuanhan Liu @ 2017-12-07 9:33 ` Tan, Jianfeng 2017-12-07 10:02 ` Maxime Coquelin 1 sibling, 1 reply; 10+ messages in thread From: Tan, Jianfeng @ 2017-12-07 9:33 UTC (permalink / raw) To: Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman, Maxime Coquelin > -----Original Message----- > From: Victor Kaplansky [mailto:vkaplans@redhat.com] > Sent: Wednesday, December 6, 2017 9:56 PM > To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; > vkaplans@redhat.com > Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin > Subject: [PATCH] vhost_user: protect active rings from async ring changes > > When performing live migration or memory hot-plugging, > the changes to the device and vrings made by message handler > done independently from vring usage by PMD threads. > > This causes for example segfauls during live-migration segfauls ->segfaults? > with MQ enable, but in general virtually any request > sent by qemu changing the state of device can cause > problems. > > These patches fixes all above issues by adding a spinlock > to every vring and requiring message handler to start operation > only after ensuring that all PMD threads related to the divece Another typo: divece. > are out of critical section accessing the vring data. > > Each vring has its own lock in order to not create contention > between PMD threads of different vrings and to prevent > performance degradation by scaling queue pair number. Also wonder how much overhead it brings. Instead of locking each vring, can we just, waiting a while (10us for example) after call destroy_device() callback so that every PMD thread has enough time to skip out the criterial area? > --- > lib/librte_vhost/vhost.h | 1 + > lib/librte_vhost/vhost_user.c | 44 > +++++++++++++++++++++++++++++++++++++++++++ > lib/librte_vhost/virtio_net.c | 8 ++++++++ > 3 files changed, 53 insertions(+) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c17..812aeccd 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -137,6 +137,7 @@ struct vhost_virtqueue { > TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list; > int iotlb_cache_nr; > TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list; > + rte_spinlock_t active_lock; > } __rte_cache_aligned; > > /* Old kernels have no such macros defined */ > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index f4c7ce46..02a3a7d3 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -1190,6 +1190,46 @@ vhost_user_check_and_alloc_queue_pair(struct > virtio_net *dev, VhostUserMsg *msg) > return alloc_vring_queue(dev, vring_idx); > } > > +static void > +vhost_user_lock_all_queue_pairs(struct virtio_net *dev) > +{ > + unsigned int i = 0; > + unsigned int vq_num = 0; > + > + while (vq_num < dev->nr_vring) { > + struct vhost_virtqueue *vq = dev->virtqueue[i]; > + > + if (!vq) { > + i++; > + continue; > + } > + > + rte_spinlock_lock(&vq->active_lock); > + vq_num++; > + i++; > + } > +} > + > +static void > +vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) > +{ > + unsigned int i = 0; > + unsigned int vq_num = 0; > + > + while (vq_num < dev->nr_vring) { > + struct vhost_virtqueue *vq = dev->virtqueue[i]; > + > + if (!vq) { > + i++; > + continue; > + } > + > + rte_spinlock_unlock(&vq->active_lock); > + vq_num++; > + i++; > + } > +} > + > int > vhost_user_msg_handler(int vid, int fd) > { > @@ -1241,6 +1281,8 @@ vhost_user_msg_handler(int vid, int fd) > return -1; > } > > + vhost_user_lock_all_queue_pairs(dev); > + > switch (msg.request.master) { > case VHOST_USER_GET_FEATURES: > msg.payload.u64 = vhost_user_get_features(dev); > @@ -1342,6 +1384,8 @@ vhost_user_msg_handler(int vid, int fd) > > } > > + vhost_user_unlock_all_queue_pairs(dev); > + > if (msg.flags & VHOST_USER_NEED_REPLY) { > msg.payload.u64 = !!ret; > msg.size = sizeof(msg.payload.u64); > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c > index 6fee16e5..de7e38bb 100644 > --- a/lib/librte_vhost/virtio_net.c > +++ b/lib/librte_vhost/virtio_net.c > @@ -44,6 +44,7 @@ > #include <rte_udp.h> > #include <rte_sctp.h> > #include <rte_arp.h> > +#include <rte_spinlock.h> > > #include "iotlb.h" > #include "vhost.h" > @@ -1180,9 +1181,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > } > > vq = dev->virtqueue[queue_id]; > + Remove above blank line. > if (unlikely(vq->enabled == 0)) > return 0; > > + rte_spinlock_lock(&vq->active_lock); > + > vq->batch_copy_nb_elems = 0; > > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > @@ -1240,6 +1244,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > if (rarp_mbuf == NULL) { > RTE_LOG(ERR, VHOST_DATA, > "Failed to allocate memory for mbuf.\n"); > + rte_spinlock_unlock(&vq->active_lock); > return 0; > } > > @@ -1356,6 +1361,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_unlock(vq); > > + rte_spinlock_unlock(&vq->active_lock); > + > if (unlikely(rarp_mbuf != NULL)) { > /* > * Inject it to the head of "pkts" array, so that switch's mac > @@ -1366,5 +1373,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > i += 1; > } > > + Remove above blank line. > return i; > } > -- > Victor ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-07 9:33 ` Tan, Jianfeng @ 2017-12-07 10:02 ` Maxime Coquelin 2017-12-08 2:14 ` Tan, Jianfeng 0 siblings, 1 reply; 10+ messages in thread From: Maxime Coquelin @ 2017-12-07 10:02 UTC (permalink / raw) To: Tan, Jianfeng, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: > > >> -----Original Message----- >> From: Victor Kaplansky [mailto:vkaplans@redhat.com] >> Sent: Wednesday, December 6, 2017 9:56 PM >> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; >> vkaplans@redhat.com >> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin >> Subject: [PATCH] vhost_user: protect active rings from async ring changes >> >> When performing live migration or memory hot-plugging, >> the changes to the device and vrings made by message handler >> done independently from vring usage by PMD threads. >> >> This causes for example segfauls during live-migration > > segfauls ->segfaults? > >> with MQ enable, but in general virtually any request >> sent by qemu changing the state of device can cause >> problems. >> >> These patches fixes all above issues by adding a spinlock >> to every vring and requiring message handler to start operation >> only after ensuring that all PMD threads related to the divece > > Another typo: divece. > >> are out of critical section accessing the vring data. >> >> Each vring has its own lock in order to not create contention >> between PMD threads of different vrings and to prevent >> performance degradation by scaling queue pair number. > > Also wonder how much overhead it brings. > > Instead of locking each vring, can we just, waiting a while (10us for example) after call destroy_device() callback so that every PMD thread has enough time to skip out the criterial area? No, because we are not destroying the device when it is needed. Actually, once destroy_device() is called, it is likely that the application has taken care the ring aren't being processed anymore before returning from the callback (This is at least the case with Vhost PMD). The reason we need the lock is to protect PMD threads from the handling of some vhost-user protocol requests. For example SET_MEM_TABLE in the case of memory hotplug, or SET_LOG_BASE in case of multiqueue, which is sent for every queue pair and results in unmapping/remapping the logging area. Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-07 10:02 ` Maxime Coquelin @ 2017-12-08 2:14 ` Tan, Jianfeng 2017-12-08 8:35 ` Maxime Coquelin 0 siblings, 1 reply; 10+ messages in thread From: Tan, Jianfeng @ 2017-12-08 2:14 UTC (permalink / raw) To: Maxime Coquelin, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman > -----Original Message----- > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > Sent: Thursday, December 7, 2017 6:02 PM > To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, > Tiwei > Cc: stable@dpdk.org; jfreiman@redhat.com > Subject: Re: [PATCH] vhost_user: protect active rings from async ring > changes > > > > On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: > > > > > >> -----Original Message----- > >> From: Victor Kaplansky [mailto:vkaplans@redhat.com] > >> Sent: Wednesday, December 6, 2017 9:56 PM > >> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; > >> vkaplans@redhat.com > >> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin > >> Subject: [PATCH] vhost_user: protect active rings from async ring changes > >> > >> When performing live migration or memory hot-plugging, > >> the changes to the device and vrings made by message handler > >> done independently from vring usage by PMD threads. > >> > >> This causes for example segfauls during live-migration > > > > segfauls ->segfaults? > > > >> with MQ enable, but in general virtually any request > >> sent by qemu changing the state of device can cause > >> problems. > >> > >> These patches fixes all above issues by adding a spinlock > >> to every vring and requiring message handler to start operation > >> only after ensuring that all PMD threads related to the divece > > > > Another typo: divece. > > > >> are out of critical section accessing the vring data. > >> > >> Each vring has its own lock in order to not create contention > >> between PMD threads of different vrings and to prevent > >> performance degradation by scaling queue pair number. > > > > Also wonder how much overhead it brings. > > > > Instead of locking each vring, can we just, waiting a while (10us for example) > after call destroy_device() callback so that every PMD thread has enough > time to skip out the criterial area? > > No, because we are not destroying the device when it is needed. > Actually, once destroy_device() is called, it is likely that the > application has taken care the ring aren't being processed anymore > before returning from the callback (This is at least the case with Vhost > PMD). OK, I did not put it right way as there are multiple cases above: migration and memory hot plug. Let me try again: Whenever a vhost thread handles a message affecting PMD threads, (like SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out of those criterial area. After message handling, reset the flag - VIRTIO_DEV_RUNNING. I suppose it can work, basing on an assumption that PMD threads work in polling mode and can skip criterial area quickly and inevitably. > > The reason we need the lock is to protect PMD threads from the handling > of some vhost-user protocol requests. > For example SET_MEM_TABLE in the case of memory hotplug, or > SET_LOG_BASE > in case of multiqueue, which is sent for every queue pair and results in > unmapping/remapping the logging area. Yes, I understand how it fails. Thanks, Jianfeng > > Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-08 2:14 ` Tan, Jianfeng @ 2017-12-08 8:35 ` Maxime Coquelin 2017-12-08 8:51 ` Tan, Jianfeng 0 siblings, 1 reply; 10+ messages in thread From: Maxime Coquelin @ 2017-12-08 8:35 UTC (permalink / raw) To: Tan, Jianfeng, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman On 12/08/2017 03:14 AM, Tan, Jianfeng wrote: > > >> -----Original Message----- >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >> Sent: Thursday, December 7, 2017 6:02 PM >> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, >> Tiwei >> Cc: stable@dpdk.org; jfreiman@redhat.com >> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >> changes >> >> >> >> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: >>> >>> >>>> -----Original Message----- >>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com] >>>> Sent: Wednesday, December 6, 2017 9:56 PM >>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; >>>> vkaplans@redhat.com >>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin >>>> Subject: [PATCH] vhost_user: protect active rings from async ring changes >>>> >>>> When performing live migration or memory hot-plugging, >>>> the changes to the device and vrings made by message handler >>>> done independently from vring usage by PMD threads. >>>> >>>> This causes for example segfauls during live-migration >>> >>> segfauls ->segfaults? >>> >>>> with MQ enable, but in general virtually any request >>>> sent by qemu changing the state of device can cause >>>> problems. >>>> >>>> These patches fixes all above issues by adding a spinlock >>>> to every vring and requiring message handler to start operation >>>> only after ensuring that all PMD threads related to the divece >>> >>> Another typo: divece. >>> >>>> are out of critical section accessing the vring data. >>>> >>>> Each vring has its own lock in order to not create contention >>>> between PMD threads of different vrings and to prevent >>>> performance degradation by scaling queue pair number. >>> >>> Also wonder how much overhead it brings. >>> >>> Instead of locking each vring, can we just, waiting a while (10us for example) >> after call destroy_device() callback so that every PMD thread has enough >> time to skip out the criterial area? >> >> No, because we are not destroying the device when it is needed. >> Actually, once destroy_device() is called, it is likely that the >> application has taken care the ring aren't being processed anymore >> before returning from the callback (This is at least the case with Vhost >> PMD). > > OK, I did not put it right way as there are multiple cases above: migration and memory hot plug. Let me try again: > > Whenever a vhost thread handles a message affecting PMD threads, (like SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out of those criterial area. After message handling, reset the flag - VIRTIO_DEV_RUNNING. I think you mean clearing vq's enabled flag, because PMD threads never check the VIRTIO_DEV_RUNNING flag. > I suppose it can work, basing on an assumption that PMD threads work in polling mode and can skip criterial area quickly and inevitably. That sounds very fragile, because if the CPU aren't perfectly isolated, your PMD thread can be preempted for interrupt handling for example. Or what if for some reason the PMD thread CPU stalls for a short while? The later is unlikely, but if it happens, it will be hard to debug. Let's see first the performance impact of using the spinlock. It might not be that important because 99.9999% of the times, it will not even spin. Thanks, Maxime >> >> The reason we need the lock is to protect PMD threads from the handling >> of some vhost-user protocol requests. >> For example SET_MEM_TABLE in the case of memory hotplug, or >> SET_LOG_BASE >> in case of multiqueue, which is sent for every queue pair and results in >> unmapping/remapping the logging area. > > Yes, I understand how it fails. > > Thanks, > Jianfeng > >> >> Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-08 8:35 ` Maxime Coquelin @ 2017-12-08 8:51 ` Tan, Jianfeng 2017-12-08 10:11 ` Maxime Coquelin 0 siblings, 1 reply; 10+ messages in thread From: Tan, Jianfeng @ 2017-12-08 8:51 UTC (permalink / raw) To: Maxime Coquelin, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman > -----Original Message----- > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > Sent: Friday, December 8, 2017 4:36 PM > To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, > Tiwei > Cc: stable@dpdk.org; jfreiman@redhat.com > Subject: Re: [PATCH] vhost_user: protect active rings from async ring > changes > > > > On 12/08/2017 03:14 AM, Tan, Jianfeng wrote: > > > > > >> -----Original Message----- > >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > >> Sent: Thursday, December 7, 2017 6:02 PM > >> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; > Bie, > >> Tiwei > >> Cc: stable@dpdk.org; jfreiman@redhat.com > >> Subject: Re: [PATCH] vhost_user: protect active rings from async ring > >> changes > >> > >> > >> > >> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com] > >>>> Sent: Wednesday, December 6, 2017 9:56 PM > >>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; > >>>> vkaplans@redhat.com > >>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin > >>>> Subject: [PATCH] vhost_user: protect active rings from async ring > changes > >>>> > >>>> When performing live migration or memory hot-plugging, > >>>> the changes to the device and vrings made by message handler > >>>> done independently from vring usage by PMD threads. > >>>> > >>>> This causes for example segfauls during live-migration > >>> > >>> segfauls ->segfaults? > >>> > >>>> with MQ enable, but in general virtually any request > >>>> sent by qemu changing the state of device can cause > >>>> problems. > >>>> > >>>> These patches fixes all above issues by adding a spinlock > >>>> to every vring and requiring message handler to start operation > >>>> only after ensuring that all PMD threads related to the divece > >>> > >>> Another typo: divece. > >>> > >>>> are out of critical section accessing the vring data. > >>>> > >>>> Each vring has its own lock in order to not create contention > >>>> between PMD threads of different vrings and to prevent > >>>> performance degradation by scaling queue pair number. > >>> > >>> Also wonder how much overhead it brings. > >>> > >>> Instead of locking each vring, can we just, waiting a while (10us for > example) > >> after call destroy_device() callback so that every PMD thread has enough > >> time to skip out the criterial area? > >> > >> No, because we are not destroying the device when it is needed. > >> Actually, once destroy_device() is called, it is likely that the > >> application has taken care the ring aren't being processed anymore > >> before returning from the callback (This is at least the case with Vhost > >> PMD). > > > > OK, I did not put it right way as there are multiple cases above: migration > and memory hot plug. Let me try again: > > > > Whenever a vhost thread handles a message affecting PMD threads, (like > SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - > VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out of > those criterial area. After message handling, reset the flag - > VIRTIO_DEV_RUNNING. > > I think you mean clearing vq's enabled flag, because PMD threads never > check the VIRTIO_DEV_RUNNING flag. Ah, yes. > > > I suppose it can work, basing on an assumption that PMD threads work in > polling mode and can skip criterial area quickly and inevitably. > > That sounds very fragile, because if the CPU aren't perfectly isolated, > your PMD thread can be preempted for interrupt handling for example. > > Or what if for some reason the PMD thread CPU stalls for a short while? > > The later is unlikely, but if it happens, it will be hard to debug. > > Let's see first the performance impact of using the spinlock. It might > not be that important because 99.9999% of the times, it will not even > spin. Fair enough. Thanks, Jianfeng > > Thanks, > Maxime > > >> > >> The reason we need the lock is to protect PMD threads from the handling > >> of some vhost-user protocol requests. > >> For example SET_MEM_TABLE in the case of memory hotplug, or > >> SET_LOG_BASE > >> in case of multiqueue, which is sent for every queue pair and results in > >> unmapping/remapping the logging area. > > > > Yes, I understand how it fails. > > > > Thanks, > > Jianfeng > > > >> > >> Maxime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-08 8:51 ` Tan, Jianfeng @ 2017-12-08 10:11 ` Maxime Coquelin 2017-12-12 5:25 ` Tan, Jianfeng 0 siblings, 1 reply; 10+ messages in thread From: Maxime Coquelin @ 2017-12-08 10:11 UTC (permalink / raw) To: Tan, Jianfeng, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman Hi Jianfeng, On 12/08/2017 09:51 AM, Tan, Jianfeng wrote: > > >> -----Original Message----- >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >> Sent: Friday, December 8, 2017 4:36 PM >> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, >> Tiwei >> Cc: stable@dpdk.org; jfreiman@redhat.com >> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >> changes >> >> >> >> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote: >>> >>> >>>> -----Original Message----- >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >>>> Sent: Thursday, December 7, 2017 6:02 PM >>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; >> Bie, >>>> Tiwei >>>> Cc: stable@dpdk.org; jfreiman@redhat.com >>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >>>> changes >>>> >>>> >>>> >>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com] >>>>>> Sent: Wednesday, December 6, 2017 9:56 PM >>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; >>>>>> vkaplans@redhat.com >>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin >>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring >> changes >>>>>> >>>>>> When performing live migration or memory hot-plugging, >>>>>> the changes to the device and vrings made by message handler >>>>>> done independently from vring usage by PMD threads. >>>>>> >>>>>> This causes for example segfauls during live-migration >>>>> >>>>> segfauls ->segfaults? >>>>> >>>>>> with MQ enable, but in general virtually any request >>>>>> sent by qemu changing the state of device can cause >>>>>> problems. >>>>>> >>>>>> These patches fixes all above issues by adding a spinlock >>>>>> to every vring and requiring message handler to start operation >>>>>> only after ensuring that all PMD threads related to the divece >>>>> >>>>> Another typo: divece. >>>>> >>>>>> are out of critical section accessing the vring data. >>>>>> >>>>>> Each vring has its own lock in order to not create contention >>>>>> between PMD threads of different vrings and to prevent >>>>>> performance degradation by scaling queue pair number. >>>>> >>>>> Also wonder how much overhead it brings. >>>>> >>>>> Instead of locking each vring, can we just, waiting a while (10us for >> example) >>>> after call destroy_device() callback so that every PMD thread has enough >>>> time to skip out the criterial area? >>>> >>>> No, because we are not destroying the device when it is needed. >>>> Actually, once destroy_device() is called, it is likely that the >>>> application has taken care the ring aren't being processed anymore >>>> before returning from the callback (This is at least the case with Vhost >>>> PMD). >>> >>> OK, I did not put it right way as there are multiple cases above: migration >> and memory hot plug. Let me try again: >>> >>> Whenever a vhost thread handles a message affecting PMD threads, (like >> SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - >> VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out of >> those criterial area. After message handling, reset the flag - >> VIRTIO_DEV_RUNNING. >> >> I think you mean clearing vq's enabled flag, because PMD threads never >> check the VIRTIO_DEV_RUNNING flag. > > Ah, yes. > >> >>> I suppose it can work, basing on an assumption that PMD threads work in >> polling mode and can skip criterial area quickly and inevitably. >> >> That sounds very fragile, because if the CPU aren't perfectly isolated, >> your PMD thread can be preempted for interrupt handling for example. >> >> Or what if for some reason the PMD thread CPU stalls for a short while? >> >> The later is unlikely, but if it happens, it will be hard to debug. >> >> Let's see first the performance impact of using the spinlock. It might >> not be that important because 99.9999% of the times, it will not even >> spin. > > Fair enough. I did some benchmarks on my Broadwell test bench (see patch below), and it seems that depending on the benchmark, perfs are on par, or better with the spinlock! I guess it explains because with the spinlock there is a better batching and less concurrent accesses on the rings, but I'm not sure. Please find my results below (CPU E5-2667 v4 @ 3.20GHz): Bench v17.11 v17.11 + spinlock ---------------- ----------- ------------------- PVP Bidir run1 19.29Mpps 19.26Mpps PVP Bidir run2 19.26Mpps 19.28Mpps TxOnly 18.47Mpps 18.83Mpps RxOnly 13.83Mpps 13.83Mpps IO Loopback 7.94Mpps 7.97Mpps Patch: --------------------------------------------------------- From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001 From: Maxime Coquelin <maxime.coquelin@redhat.com> Date: Fri, 8 Dec 2017 04:21:25 -0500 Subject: [PATCH] vhost: spinlock benchmarking Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> --- lib/librte_vhost/vhost.c | 2 ++ lib/librte_vhost/vhost.h | 2 ++ lib/librte_vhost/virtio_net.c | 12 ++++++++++++ 3 files changed, 16 insertions(+) diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 4f8b73a..d9752cf 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -219,6 +219,8 @@ struct virtio_net * vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD; vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD; + rte_spinlock_init(&vq->access_lock); + vhost_user_iotlb_init(dev, vring_idx); /* Backends are set to -1 indicating an inactive device. */ vq->backend = -1; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 1cc81c1..56242a8 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -131,6 +131,8 @@ struct vhost_virtqueue { struct batch_copy_elem *batch_copy_elems; uint16_t batch_copy_nb_elems; + rte_spinlock_t access_lock; + rte_rwlock_t iotlb_lock; rte_rwlock_t iotlb_pending_lock; struct rte_mempool *iotlb_pool; diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 6fee16e..50cc3de 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -329,6 +329,8 @@ if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->access_lock); + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_lock(vq); @@ -419,6 +421,8 @@ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->access_lock); + return count; } @@ -654,6 +658,8 @@ if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->access_lock); + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_lock(vq); @@ -715,6 +721,8 @@ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->access_lock); + return pkt_idx; } @@ -1183,6 +1191,8 @@ if (unlikely(vq->enabled == 0)) return 0; + rte_spinlock_lock(&vq->access_lock); + vq->batch_copy_nb_elems = 0; if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) @@ -1356,6 +1366,8 @@ if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) vhost_user_iotlb_rd_unlock(vq); + rte_spinlock_unlock(&vq->access_lock); + if (unlikely(rarp_mbuf != NULL)) { /* * Inject it to the head of "pkts" array, so that switch's mac -- 1.8.3.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-08 10:11 ` Maxime Coquelin @ 2017-12-12 5:25 ` Tan, Jianfeng 2017-12-12 8:41 ` Maxime Coquelin 0 siblings, 1 reply; 10+ messages in thread From: Tan, Jianfeng @ 2017-12-12 5:25 UTC (permalink / raw) To: Maxime Coquelin, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman > -----Original Message----- > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > Sent: Friday, December 8, 2017 6:12 PM > To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, > Tiwei > Cc: stable@dpdk.org; jfreiman@redhat.com > Subject: Re: [PATCH] vhost_user: protect active rings from async ring > changes > > Hi Jianfeng, > > On 12/08/2017 09:51 AM, Tan, Jianfeng wrote: > > > > > >> -----Original Message----- > >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > >> Sent: Friday, December 8, 2017 4:36 PM > >> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; > Bie, > >> Tiwei > >> Cc: stable@dpdk.org; jfreiman@redhat.com > >> Subject: Re: [PATCH] vhost_user: protect active rings from async ring > >> changes > >> > >> > >> > >> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] > >>>> Sent: Thursday, December 7, 2017 6:02 PM > >>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; > yliu@fridaylinux.org; > >> Bie, > >>>> Tiwei > >>>> Cc: stable@dpdk.org; jfreiman@redhat.com > >>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring > >>>> changes > >>>> > >>>> > >>>> > >>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com] > >>>>>> Sent: Wednesday, December 6, 2017 9:56 PM > >>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; > >>>>>> vkaplans@redhat.com > >>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin > >>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring > >> changes > >>>>>> > >>>>>> When performing live migration or memory hot-plugging, > >>>>>> the changes to the device and vrings made by message handler > >>>>>> done independently from vring usage by PMD threads. > >>>>>> > >>>>>> This causes for example segfauls during live-migration > >>>>> > >>>>> segfauls ->segfaults? > >>>>> > >>>>>> with MQ enable, but in general virtually any request > >>>>>> sent by qemu changing the state of device can cause > >>>>>> problems. > >>>>>> > >>>>>> These patches fixes all above issues by adding a spinlock > >>>>>> to every vring and requiring message handler to start operation > >>>>>> only after ensuring that all PMD threads related to the divece > >>>>> > >>>>> Another typo: divece. > >>>>> > >>>>>> are out of critical section accessing the vring data. > >>>>>> > >>>>>> Each vring has its own lock in order to not create contention > >>>>>> between PMD threads of different vrings and to prevent > >>>>>> performance degradation by scaling queue pair number. > >>>>> > >>>>> Also wonder how much overhead it brings. > >>>>> > >>>>> Instead of locking each vring, can we just, waiting a while (10us for > >> example) > >>>> after call destroy_device() callback so that every PMD thread has > enough > >>>> time to skip out the criterial area? > >>>> > >>>> No, because we are not destroying the device when it is needed. > >>>> Actually, once destroy_device() is called, it is likely that the > >>>> application has taken care the ring aren't being processed anymore > >>>> before returning from the callback (This is at least the case with Vhost > >>>> PMD). > >>> > >>> OK, I did not put it right way as there are multiple cases above: migration > >> and memory hot plug. Let me try again: > >>> > >>> Whenever a vhost thread handles a message affecting PMD threads, > (like > >> SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - > >> VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out > of > >> those criterial area. After message handling, reset the flag - > >> VIRTIO_DEV_RUNNING. > >> > >> I think you mean clearing vq's enabled flag, because PMD threads never > >> check the VIRTIO_DEV_RUNNING flag. > > > > Ah, yes. > > > >> > >>> I suppose it can work, basing on an assumption that PMD threads work > in > >> polling mode and can skip criterial area quickly and inevitably. > >> > >> That sounds very fragile, because if the CPU aren't perfectly isolated, > >> your PMD thread can be preempted for interrupt handling for example. > >> > >> Or what if for some reason the PMD thread CPU stalls for a short while? > >> > >> The later is unlikely, but if it happens, it will be hard to debug. > >> > >> Let's see first the performance impact of using the spinlock. It might > >> not be that important because 99.9999% of the times, it will not even > >> spin. > > > > Fair enough. > > I did some benchmarks on my Broadwell test bench (see patch below), and > it seems that depending on the benchmark, perfs are on par, or better > with the spinlock! I guess it explains because with the spinlock there > is a better batching and less concurrent accesses on the rings, but I'm > not sure. > > Please find my results below (CPU E5-2667 v4 @ 3.20GHz): > > Bench v17.11 v17.11 + spinlock > ---------------- ----------- ------------------- > PVP Bidir run1 19.29Mpps 19.26Mpps > PVP Bidir run2 19.26Mpps 19.28Mpps > TxOnly 18.47Mpps 18.83Mpps > RxOnly 13.83Mpps 13.83Mpps > IO Loopback 7.94Mpps 7.97Mpps > This number seems really good for throughput. FYI, we are recently doing a test to measure how long it takes for a noop (means there is no packets on the virtqueue) vhost dequeue operation. It's like: start_tsc = rte_rdtsc(); for (j = 0; j < 10000000; ++j) nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst, nb_pkt_per_burst); end_tsc = rte_rdtsc(); printf("%"PRIu64"\n", (end_tsc - start_tsc) / 10000000); Turns out that this patch will make it from 50 cycles -> 80 cycles for each noop rte_eth_rx_burst() operation on vhost port. My server is Haswell 2.3GHz. Note that I'm not against this patch. Just that if such operation takes so many cycles, we might consider to introduce interrupt mode for vhost ports. Thanks, Jianfeng > > > Patch: > --------------------------------------------------------- > > From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 > 2001 > From: Maxime Coquelin <maxime.coquelin@redhat.com> > Date: Fri, 8 Dec 2017 04:21:25 -0500 > Subject: [PATCH] vhost: spinlock benchmarking > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> > --- > lib/librte_vhost/vhost.c | 2 ++ > lib/librte_vhost/vhost.h | 2 ++ > lib/librte_vhost/virtio_net.c | 12 ++++++++++++ > 3 files changed, 16 insertions(+) > > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c > index 4f8b73a..d9752cf 100644 > --- a/lib/librte_vhost/vhost.c > +++ b/lib/librte_vhost/vhost.c > @@ -219,6 +219,8 @@ struct virtio_net * > vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD; > vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD; > > + rte_spinlock_init(&vq->access_lock); > + > vhost_user_iotlb_init(dev, vring_idx); > /* Backends are set to -1 indicating an inactive device. */ > vq->backend = -1; > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c1..56242a8 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -131,6 +131,8 @@ struct vhost_virtqueue { > struct batch_copy_elem *batch_copy_elems; > uint16_t batch_copy_nb_elems; > > + rte_spinlock_t access_lock; > + > rte_rwlock_t iotlb_lock; > rte_rwlock_t iotlb_pending_lock; > struct rte_mempool *iotlb_pool; > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c > index 6fee16e..50cc3de 100644 > --- a/lib/librte_vhost/virtio_net.c > +++ b/lib/librte_vhost/virtio_net.c > @@ -329,6 +329,8 @@ > if (unlikely(vq->enabled == 0)) > return 0; > > + rte_spinlock_lock(&vq->access_lock); > + > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_lock(vq); > > @@ -419,6 +421,8 @@ > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_unlock(vq); > > + rte_spinlock_unlock(&vq->access_lock); > + > return count; > } > > @@ -654,6 +658,8 @@ > if (unlikely(vq->enabled == 0)) > return 0; > > + rte_spinlock_lock(&vq->access_lock); > + > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_lock(vq); > > @@ -715,6 +721,8 @@ > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_unlock(vq); > > + rte_spinlock_unlock(&vq->access_lock); > + > return pkt_idx; > } > > @@ -1183,6 +1191,8 @@ > if (unlikely(vq->enabled == 0)) > return 0; > > + rte_spinlock_lock(&vq->access_lock); > + > vq->batch_copy_nb_elems = 0; > > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > @@ -1356,6 +1366,8 @@ > if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > vhost_user_iotlb_rd_unlock(vq); > > + rte_spinlock_unlock(&vq->access_lock); > + > if (unlikely(rarp_mbuf != NULL)) { > /* > * Inject it to the head of "pkts" array, so that switch's mac > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes 2017-12-12 5:25 ` Tan, Jianfeng @ 2017-12-12 8:41 ` Maxime Coquelin 0 siblings, 0 replies; 10+ messages in thread From: Maxime Coquelin @ 2017-12-12 8:41 UTC (permalink / raw) To: Tan, Jianfeng, Victor Kaplansky, dev, yliu, Bie, Tiwei; +Cc: stable, jfreiman On 12/12/2017 06:25 AM, Tan, Jianfeng wrote: > > >> -----Original Message----- >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >> Sent: Friday, December 8, 2017 6:12 PM >> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; Bie, >> Tiwei >> Cc: stable@dpdk.org; jfreiman@redhat.com >> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >> changes >> >> Hi Jianfeng, >> >> On 12/08/2017 09:51 AM, Tan, Jianfeng wrote: >>> >>> >>>> -----Original Message----- >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >>>> Sent: Friday, December 8, 2017 4:36 PM >>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; yliu@fridaylinux.org; >> Bie, >>>> Tiwei >>>> Cc: stable@dpdk.org; jfreiman@redhat.com >>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >>>> changes >>>> >>>> >>>> >>>> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com] >>>>>> Sent: Thursday, December 7, 2017 6:02 PM >>>>>> To: Tan, Jianfeng; Victor Kaplansky; dev@dpdk.org; >> yliu@fridaylinux.org; >>>> Bie, >>>>>> Tiwei >>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com >>>>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring >>>>>> changes >>>>>> >>>>>> >>>>>> >>>>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote: >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Victor Kaplansky [mailto:vkaplans@redhat.com] >>>>>>>> Sent: Wednesday, December 6, 2017 9:56 PM >>>>>>>> To: dev@dpdk.org; yliu@fridaylinux.org; Bie, Tiwei; Tan, Jianfeng; >>>>>>>> vkaplans@redhat.com >>>>>>>> Cc: stable@dpdk.org; jfreiman@redhat.com; Maxime Coquelin >>>>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring >>>> changes >>>>>>>> >>>>>>>> When performing live migration or memory hot-plugging, >>>>>>>> the changes to the device and vrings made by message handler >>>>>>>> done independently from vring usage by PMD threads. >>>>>>>> >>>>>>>> This causes for example segfauls during live-migration >>>>>>> >>>>>>> segfauls ->segfaults? >>>>>>> >>>>>>>> with MQ enable, but in general virtually any request >>>>>>>> sent by qemu changing the state of device can cause >>>>>>>> problems. >>>>>>>> >>>>>>>> These patches fixes all above issues by adding a spinlock >>>>>>>> to every vring and requiring message handler to start operation >>>>>>>> only after ensuring that all PMD threads related to the divece >>>>>>> >>>>>>> Another typo: divece. >>>>>>> >>>>>>>> are out of critical section accessing the vring data. >>>>>>>> >>>>>>>> Each vring has its own lock in order to not create contention >>>>>>>> between PMD threads of different vrings and to prevent >>>>>>>> performance degradation by scaling queue pair number. >>>>>>> >>>>>>> Also wonder how much overhead it brings. >>>>>>> >>>>>>> Instead of locking each vring, can we just, waiting a while (10us for >>>> example) >>>>>> after call destroy_device() callback so that every PMD thread has >> enough >>>>>> time to skip out the criterial area? >>>>>> >>>>>> No, because we are not destroying the device when it is needed. >>>>>> Actually, once destroy_device() is called, it is likely that the >>>>>> application has taken care the ring aren't being processed anymore >>>>>> before returning from the callback (This is at least the case with Vhost >>>>>> PMD). >>>>> >>>>> OK, I did not put it right way as there are multiple cases above: migration >>>> and memory hot plug. Let me try again: >>>>> >>>>> Whenever a vhost thread handles a message affecting PMD threads, >> (like >>>> SET_MEM_TABLE, GET_VRING_BASE, etc) we can remove the dev flag - >>>> VIRTIO_DEV_RUNNING, and wait for a while so that PMD threads skip out >> of >>>> those criterial area. After message handling, reset the flag - >>>> VIRTIO_DEV_RUNNING. >>>> >>>> I think you mean clearing vq's enabled flag, because PMD threads never >>>> check the VIRTIO_DEV_RUNNING flag. >>> >>> Ah, yes. >>> >>>> >>>>> I suppose it can work, basing on an assumption that PMD threads work >> in >>>> polling mode and can skip criterial area quickly and inevitably. >>>> >>>> That sounds very fragile, because if the CPU aren't perfectly isolated, >>>> your PMD thread can be preempted for interrupt handling for example. >>>> >>>> Or what if for some reason the PMD thread CPU stalls for a short while? >>>> >>>> The later is unlikely, but if it happens, it will be hard to debug. >>>> >>>> Let's see first the performance impact of using the spinlock. It might >>>> not be that important because 99.9999% of the times, it will not even >>>> spin. >>> >>> Fair enough. >> >> I did some benchmarks on my Broadwell test bench (see patch below), and >> it seems that depending on the benchmark, perfs are on par, or better >> with the spinlock! I guess it explains because with the spinlock there >> is a better batching and less concurrent accesses on the rings, but I'm >> not sure. >> >> Please find my results below (CPU E5-2667 v4 @ 3.20GHz): >> >> Bench v17.11 v17.11 + spinlock >> ---------------- ----------- ------------------- >> PVP Bidir run1 19.29Mpps 19.26Mpps >> PVP Bidir run2 19.26Mpps 19.28Mpps >> TxOnly 18.47Mpps 18.83Mpps >> RxOnly 13.83Mpps 13.83Mpps >> IO Loopback 7.94Mpps 7.97Mpps >> > > This number seems really good for throughput. > > FYI, we are recently doing a test to measure how long it takes for a noop (means there is no packets on the virtqueue) vhost dequeue operation. It's like: > > start_tsc = rte_rdtsc(); > for (j = 0; j < 10000000; ++j) > nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst, nb_pkt_per_burst); > end_tsc = rte_rdtsc(); > printf("%"PRIu64"\n", (end_tsc - start_tsc) / 10000000); Thanks for sharing, it's an interesting benchmark. Any chance it could land somewhere in DPDK upstream repo as a test application or something else > Turns out that this patch will make it from 50 cycles -> 80 cycles for each noop rte_eth_rx_burst() operation on vhost port. My server is Haswell 2.3GHz. I declared the spinlock at the end of the struct, so it may not be in the same cache line as enabled flag. Maybe putting them on the same cache line could have a positive impact in the vring empty case? > Note that I'm not against this patch. Just that if such operation takes so many cycles, we might consider to introduce interrupt mode for vhost ports. Indeed, that could make sense in case of lot of vhost ports processed on a single CPU. Thanks, Maxime > Thanks, > Jianfeng > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-12-12 8:41 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-12-06 13:55 [dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes Victor Kaplansky 2017-12-06 14:11 ` Yuanhan Liu 2017-12-07 9:33 ` Tan, Jianfeng 2017-12-07 10:02 ` Maxime Coquelin 2017-12-08 2:14 ` Tan, Jianfeng 2017-12-08 8:35 ` Maxime Coquelin 2017-12-08 8:51 ` Tan, Jianfeng 2017-12-08 10:11 ` Maxime Coquelin 2017-12-12 5:25 ` Tan, Jianfeng 2017-12-12 8:41 ` Maxime Coquelin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).