From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 20 Dec 2017 15:06:30 -0500 (EST)
From: Victor Kaplansky
To: Stephen Hemminger
Cc: dev@dpdk.org, stable@dpdk.org, Jens Freimann, Maxime Coquelin, Yuanhan Liu, Tiwei Bie, Jianfeng Tan
Message-ID: <634157847.2119460.1513800390896.JavaMail.zimbra@redhat.com>
In-Reply-To: <20171220110616.21301e11@xeon-e3>
References: <20171220163752-mutt-send-email-victork@redhat.com> <20171220110616.21301e11@xeon-e3>
Subject: Re: [dpdk-dev] [PATCH v4] vhost_user: protect active rings from async ring changes
List-Id: DPDK patches and discussions
----- Original Message -----
> From: "Stephen Hemminger"
> To: "Victor Kaplansky"
> Cc: dev@dpdk.org, stable@dpdk.org, "Jens Freimann", "Maxime Coquelin", "Yuanhan Liu", "Tiwei Bie", "Jianfeng Tan"
> Sent: Wednesday, December 20, 2017 9:06:16 PM
> Subject: Re: [dpdk-dev] [PATCH v4] vhost_user: protect active rings from async ring changes
>
> On Wed, 20 Dec 2017 16:37:52 +0200
> Victor Kaplansky wrote:
>
> > When performing live migration or memory hot-plugging,
> > the changes made to the device and vrings by the message handler
> > are done independently of vring usage by the PMD threads.
> >
> > This causes, for example, segfaults during live migration
> > with MQ enabled, but in general virtually any request
> > sent by qemu that changes the state of the device can cause
> > problems.
> >
> > This patch fixes all the above issues by adding a spinlock
> > to every vring and requiring the message handler to start an
> > operation only after ensuring that all PMD threads related to
> > the device are out of the critical section accessing the vring data.
> >
> > Each vring has its own lock in order to avoid contention
> > between PMD threads of different vrings and to prevent
> > performance degradation when scaling the queue pair number.
> >
> > See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
> >
> > Signed-off-by: Victor Kaplansky
> > ---
> >
> > v4:
> >
> > o Moved access_unlock to before accessing the enable flag, and
> >   access_unlock to after iommu_unlock, consistently.
> > o Cosmetics: removed a blank line.
> > o The access_lock variable moved to be in the same
> >   cache line with the enable and access_ok flags.
> > o The dequeue path is now guarded with trylock, returning
> >   zero if unsuccessful.
> > o The GET_VRING_BASE operation is not guarded by the access lock,
> >   to avoid a deadlock with device_destroy. See the comment
> >   in the code.
> > o Fixed error path exits from enqueue and dequeue, carefully
> >   unlocking the access and iommu locks as appropriate.
> >
> > v3:
> > o Added locking to the enqueue flow.
> > o Enqueue path guarded as well as dequeue path.
> > o Changed the name of active_lock.
> > o Added initialization of the guarding spinlock.
> > o Reworked the functions skimming over all virt-queues.
> > o Performance measurements done by Maxime Coquelin show
> >   no degradation in bandwidth and throughput.
> > o Spelling.
> > o Taking the lock only on set operations.
> > o IOMMU messages are not guarded by the access lock.
> >
> > v2:
> > o Fixed checkpatch complaints.
> > o Added Signed-off-by.
> > o Refined placement of the guard to exclude IOMMU messages.
> > o TODO: performance degradation measurement.
> >
> >
> >  lib/librte_vhost/vhost.h      | 25 +++++++++++++--
> >  lib/librte_vhost/vhost.c      |  1 +
> >  lib/librte_vhost/vhost_user.c | 71 +++++++++++++++++++++++++++++++++++++++++++
> >  lib/librte_vhost/virtio_net.c | 28 ++++++++++++++---
> >  4 files changed, 119 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 1cc81c17..f3e43e95 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -108,12 +108,14 @@ struct vhost_virtqueue {
> >
> >  	/* Backend value to determine if device should started/stopped */
> >  	int backend;
> > +	int enabled;
> > +	int access_ok;
> > +	rte_spinlock_t access_lock;
>
> Maybe these ints should be bool?

In this patch I only moved the location of "enabled" and "access_ok",
without changing their type. Changing them to bool could be done in
another patch. Anyway, from the C/C++ standard's point of view, the
bool type is only promised to be large enough to store the values 0
and 1, so if we want better control over the memory layout, it would
be better to leave them as int (or change them to char).
> > +
> >  	/* Used to notify the guest (trigger interrupt) */
> >  	int callfd;
> >  	/* Currently unused as polling mode is enabled */
> >  	int kickfd;
> > -	int enabled;
> > -	int access_ok;
> >
> >  	/* Physical address of used ring, for logging */
> >  	uint64_t log_guest_addr;
> > @@ -302,6 +304,25 @@ vhost_log_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >  	vhost_log_write(dev, vq->log_guest_addr + offset, len);
> >  }
> >
> > +static __rte_always_inline int
> > +vhost_user_access_trylock(struct vhost_virtqueue *vq)
> > +{
> > +	return rte_spinlock_trylock(&vq->access_lock);
> > +}
> > +
> > +static __rte_always_inline void
> > +vhost_user_access_lock(struct vhost_virtqueue *vq)
> > +{
> > +	rte_spinlock_lock(&vq->access_lock);
> > +}
> > +
> > +static __rte_always_inline void
> > +vhost_user_access_unlock(struct vhost_virtqueue *vq)
> > +{
> > +	rte_spinlock_unlock(&vq->access_lock);
> > +}
> > +
> > +
>
> Wrapping locking in inlines adds nothing and makes life harder
> for static analysis tools.

Yep. On the other hand, it hides the details of how the locking is
implemented (e.g. the name of the lock). It also facilitates replacing
the locking mechanism with another implementation. See below.

> The bigger problem is that doing locking on all enqueue/dequeue
> can have a visible performance impact. Did you measure that?
>
> Could you invent an RCUish mechanism using compiler barriers?

I've played a bit with measuring the performance impact. A successful
lock adds on average about 30 cycles on my Haswell CPU (and it
succeeds 99.999...% of the time). I can investigate more, but my
initial feeling is that adding a memory barrier (a real one, not a
compiler barrier) would add about the same overhead.

By the way, the way update_queuing_status() in
drivers/net/vhost/rte_eth_vhost.c tries to avoid contention with the
active queue by playing with "allow_queuing" and "while_queuing" seems
to be broken, since memory barriers are missing.

-- 
Victor