From: "Michael S. Tsirkin"
To: Yuanhan Liu
Cc: Maxime Coquelin, dev@dpdk.org, Stephen Hemminger, qemu-devel@nongnu.org, libvir-list@redhat.com, vpp-dev@lists.fd.io, Marc-André Lureau
Date: Thu, 17 Nov 2016 19:37:09 +0200
Subject: Re: [dpdk-dev] dpdk/vpp and cross-version migration for vhost
Message-ID: <20161117192445-mutt-send-email-mst@kernel.org>
In-Reply-To: <20161117094936.GN5048@yliu-dev.sh.intel.com>
References: <20161011173526-mutt-send-email-mst@kernel.org> <20161117082902.GM5048@yliu-dev.sh.intel.com> <20161117094936.GN5048@yliu-dev.sh.intel.com>

On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote:
> On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote:
> >
> >
> > On 11/17/2016 09:29 AM, Yuanhan Liu wrote:
> > >As usual,
> > >sorry for late response :/
> > >
> > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> > >>Hi!
> > >>So it looks like we face a problem with cross-version
> > >>migration when using vhost. It's not new but became more
> > >>acute with the advent of vhost user.
> > >>
> > >>For users to be able to migrate between different versions
> > >>of the hypervisor the interface exposed to guests
> > >>by hypervisor must stay unchanged.
> > >>
> > >>The problem is that a qemu device is connected
> > >>to a backend in another process, so the interface
> > >>exposed to guests depends on the capabilities of that
> > >>process.
> > >>
> > >>Specifically, for vhost user interface based on virtio, this includes
> > >>the "host features" bitmap that defines the interface, as well as more
> > >>host values such as the max ring size. Adding new features/changing
> > >>values to this interface is required to make progress, but on the other
> > >>hand we need ability to get the old host features to be compatible.
> > >
> > >It looks like the same issue as vhost-user reconnect to me. For example,
> > >
> > >- start dpdk 16.07 & qemu 2.5
> > >- kill dpdk
> > >- start dpdk 16.11
> > >
> > >Though DPDK 16.11 has more features compared to dpdk 16.07 (say, indirect),
> > >the above should work, because qemu saves the negotiated features before
> > >the disconnect and restores them after the reconnection.
> > >
> > >    commit a463215b087c41d7ca94e51aa347cde523831873
> > >    Author: Marc-André Lureau
> > >    Date:   Mon Jun 6 18:45:05 2016 +0200
> > >
> > >        vhost-net: save & restore vhost-user acked features
> > >
> > >        The initial vhost-user connection sets the features to be negotiated
> > >        with the driver. Renegotiation isn't possible without device reset.
> > >
> > >        To handle reconnection of vhost-user backend, ensure the same set of
> > >        features are provided, and reuse already acked features.
> > >
> > >        Signed-off-by: Marc-André Lureau
> > >
> > >
> > >So we could do similar to vhost-user? I mean, save the acked features
> > >before migration and restore them after it. This should be able to
> > >keep the compatibility. If user downgrades DPDK version, it also could
> > >be easily detected, and then exit with an error to user: migration
> > >failed due to incompatible vhost features.
> > >
> > >Just some rough thoughts. Makes tiny sense?
> >
> > My understanding is that the management tool has to know whether
> > versions are compatible before initiating the migration:
>
> Makes sense. How about getting and restoring the acked features through
> qemu command lines then, say, through the monitor interface?
>
> With that, it would be something like:
>
> - start vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host
>
> - read the acked features (through monitor interface)
>
> - start vhost-user backend in the dst host
>
> - start qemu in the dst host with the just queried acked features
>
>   QEMU then is expected to use this feature set for the later vhost-user
>   feature negotiation. Exit if features compatibility is broken.
>
> Thoughts?
>
> 	--yliu

You keep assuming that you have the VM started first and figure out
things afterwards, but this does not work.

Think about a cluster of machines. You want to start a VM in
a way that will ensure compatibility with all hosts
in a cluster.

If you don't, guest visible interface will change
and you won't be able to migrate.

It does not make sense to discuss feature bits specifically
since that is not the only part of interface.
For example, max ring size supported might change.

Let me describe how it works in qemu/libvirt.
When you install a VM, you can specify compatibility
level (aka "machine type"), and you can query the supported compatibility levels.
Management uses that to find the supported compatibility
and stores the compatibility in XML that is migrated with the VM.
There's also a way to find the latest level which is the
default unless overridden by user; again, this level
is recorded and then
- management can make sure migration destination is compatible
- management can avoid migration to hosts without that support

We absolutely can let QEMU be in control here, but what is missing is
the ability to query compatibility as above.

> > 1. The downtime could be unpredictable if a VM has to move from hosts
> >    to hosts multiple times, which is problematic, especially for NFV.
> > 2. If migration is not possible, maybe the management tool would
> >    prefer not to interrupt the VM on current host.
> >
> > I have little experience with migration though, so I could be mistaken.
> >
> > Thanks,
> > Maxime
> >
> > >
> > >	--yliu
> > >>
> > >>To solve this problem within qemu, qemu has a versioning system based on
> > >>a machine type concept which fundamentally is a version string, by
> > >>specifying that string one can get hardware compatible with a previous
> > >>qemu version. QEMU also reports the latest version and list of versions
> > >>supported so libvirt records the version at VM creation and then is
> > >>careful to use this machine version whenever it migrates a VM.
> > >>
> > >>One might wonder how is this solved with a kernel vhost backend. The
> > >>answer is that it mostly isn't - instead an assumption is made, that
> > >>qemu versions are deployed together with the kernel - this is generally
> > >>true for downstreams. Thus whenever qemu gains a new feature, it is
> > >>already supported by the kernel as well. However, if one attempts
> > >>migration with a new qemu from a system with a new kernel to one with
> > >>an old kernel, one would get a failure.
> > >>
> > >>In the world where we have multiple userspace backends, with some of
> > >>these supplied by ISVs, this seems non-realistic.
> > >>
> > >>IMO we need to support vhost backend versioning, ideally
> > >>in a way that will also work for vhost kernel backends.
> > >>
> > >>So I'd like to get some input from both backend and management
> > >>developers on what a good solution would look like.
> > >>
> > >>If we want to emulate the qemu solution, this involves adding the
> > >>concept of interface versions to dpdk. For example, dpdk could supply a
> > >>file (or utility printing?) with list of versions: latest and versions
> > >>supported. libvirt could read that and
> > >>- store latest version at vm creation
> > >>- pass it around with the vm
> > >>- pass it to qemu
> > >>
> > >>From here, qemu could pass this over the vhost-user channel,
> > >>thus making sure it's initialized with the correct
> > >>compatible interface.
> > >>
> > >>As version here is an opaque string for libvirt and qemu,
> > >>anything can be used - but I suggest either a list
> > >>of values defining the interface, e.g.
> > >>any_layout=on,max_ring=256
> > >>or a version including the name and vendor of the backend,
> > >>e.g. "org.dpdk.v4.5.6".
> > >>
> > >>Note that typically the list of supported versions can only be
> > >>extended, not shrunk. Also, if the host/guest interface
> > >>does not change, don't change the current version as
> > >>this just creates work for everyone.
> > >>
> > >>Thoughts? Would this work well for management? dpdk? vpp?
> > >>
> > >>Thanks!
> > >>
> > >>--
> > >>MST
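
To make the management-side check in the quoted proposal concrete, here is a
minimal sketch. It assumes each backend publishes its latest and supported
interface versions as opaque strings. The names BackendVersions,
version_for_new_vm, can_migrate and migration_targets, and the version string
"org.dpdk.v4.5.7", are hypothetical and not part of DPDK, QEMU or libvirt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendVersions:
    """Hypothetical per-host report: latest version plus all supported ones."""
    latest: str
    supported: frozenset


def version_for_new_vm(host: BackendVersions) -> str:
    """Record the host's latest version at VM creation (the default level)."""
    return host.latest


def can_migrate(recorded: str, destination: BackendVersions) -> bool:
    """Versions are opaque strings, so compatibility is plain membership."""
    return recorded in destination.supported


def migration_targets(recorded: str, cluster: dict) -> list:
    """Names of hosts whose backend can expose the recorded interface."""
    return sorted(name for name, host in cluster.items()
                  if can_migrate(recorded, host))


# Illustrative data only; "org.dpdk.v4.5.7" is made up for this example.
old_host = BackendVersions("org.dpdk.v4.5.6", frozenset({"org.dpdk.v4.5.6"}))
new_host = BackendVersions("org.dpdk.v4.5.7",
                           frozenset({"org.dpdk.v4.5.6", "org.dpdk.v4.5.7"}))
cluster = {"old": old_host, "new": new_host}

recorded = version_for_new_vm(old_host)
print(migration_targets(recorded, cluster))                      # ['new', 'old']
print(migration_targets(version_for_new_vm(new_host), cluster))  # ['new']
```

A VM created with the old host's version can move to either host, while one
created with the new host's latest version cannot move back. This is why the
version has to be queried and chosen before the VM is started, not afterwards.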