From: "Michael S. Tsirkin"
To: Yuanhan Liu
Cc: Maxime Coquelin, dev@dpdk.org, Stephen Hemminger, qemu-devel@nongnu.org, libvir-list@redhat.com, vpp-dev@lists.fd.io, Marc-André Lureau
Date: Tue, 22 Nov 2016 16:53:05 +0200
Subject: Re: [dpdk-dev] dpdk/vpp and cross-version migration for vhost
Message-ID: <20161122164143-mutt-send-email-mst@kernel.org>
In-Reply-To: <20161122130223.GW5048@yliu-dev.sh.intel.com>

On Tue, Nov 22, 2016 at 09:02:23PM +0800, Yuanhan Liu
wrote:
> On Thu, Nov 17, 2016 at 07:37:09PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote:
> > > On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote:
> > > >
> > > >
> > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote:
> > > > >As usual, sorry for the late response :/
> > > > >
> > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> > > > >>Hi!
> > > > >>So it looks like we face a problem with cross-version
> > > > >>migration when using vhost. It's not new but became more
> > > > >>acute with the advent of vhost user.
> > > > >>
> > > > >>For users to be able to migrate between different versions
> > > > >>of the hypervisor the interface exposed to guests
> > > > >>by hypervisor must stay unchanged.
> > > > >>
> > > > >>The problem is that a qemu device is connected
> > > > >>to a backend in another process, so the interface
> > > > >>exposed to guests depends on the capabilities of that
> > > > >>process.
> > > > >>
> > > > >>Specifically, for vhost user interface based on virtio, this includes
> > > > >>the "host features" bitmap that defines the interface, as well as more
> > > > >>host values such as the max ring size. Adding new features/changing
> > > > >>values to this interface is required to make progress, but on the other
> > > > >>hand we need ability to get the old host features to be compatible.
> > > > >
> > > > >It looks like the same issue as vhost-user reconnect to me. For example,
> > > > >
> > > > >- start dpdk 16.07 & qemu 2.5
> > > > >- kill dpdk
> > > > >- start dpdk 16.11
> > > > >
> > > > >Though DPDK 16.11 has more features compared to dpdk 16.07 (say, indirect),
> > > > >the above should work, because qemu saves the negotiated features before the
> > > > >disconnect and restores them after the reconnection.
> > > > >
> > > > >    commit a463215b087c41d7ca94e51aa347cde523831873
> > > > >    Author: Marc-André Lureau
> > > > >    Date: Mon Jun 6 18:45:05 2016 +0200
> > > > >
> > > > >        vhost-net: save & restore vhost-user acked features
> > > > >
> > > > >        The initial vhost-user connection sets the features to be negotiated
> > > > >        with the driver. Renegotiation isn't possible without device reset.
> > > > >
> > > > >        To handle reconnection of vhost-user backend, ensure the same set of
> > > > >        features are provided, and reuse already acked features.
> > > > >
> > > > >        Signed-off-by: Marc-André Lureau
> > > > >
> > > > >
> > > > >So we could do something similar for migration? I mean, save the acked
> > > > >features before migration and restore them after it. This should be able to
> > > > >keep the compatibility. If user downgrades DPDK version, it also could
> > > > >be easily detected, and then exit with an error to user: migration
> > > > >failed due to incompatible vhost features.
> > > > >
> > > > >Just some rough thoughts. Makes tiny sense?
> > > >
> > > > My understanding is that the management tool has to know whether
> > > > versions are compatible before initiating the migration:
> > >
> > > Makes sense. How about getting and restoring the acked features through
> > > qemu command lines then, say, through the monitor interface?
> > >
> > > With that, it would be something like:
> > >
> > > - start vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host
> > >
> > > - read the acked features (through monitor interface)
> > >
> > > - start vhost-user backend in the dst host
> > >
> > > - start qemu in the dst host with the just queried acked features
> > >
> > > QEMU then is expected to use this feature set for the later vhost-user
> > > feature negotiation. Exit if feature compatibility is broken.
> > >
> > > Thoughts?
> > >
> > >     --yliu
> >
> >
> > You keep assuming that you have the VM started first and
> > figure out things afterwards, but this does not work.
> >
> > Think about a cluster of machines. You want to start a VM in
> > a way that will ensure compatibility with all hosts
> > in a cluster.
>
> I see. I was thinking more about the case when the dst
> host (including the qemu and dpdk combo) is given, and
> then determine whether it will be a successful migration
> or not.
>
> And you are asking that we need to know which host could
> be a good candidate before starting the migration. In such
> case, we indeed need some inputs from both the qemu and
> vhost-user backend.
>
> For DPDK, I think it could be simple, just as you said, it
> could be either a tiny script, or even a macro defined in
> the source code file (we extend it every time we add a
> new feature) to let libvirt read it. Or something
> else.

There's the issue of APIs that tweak features as Maxime
suggested. Maybe the only thing to do is to deprecate them,
but I feel some way for application to pass info into
guest might be beneficial.

> > If you don't, guest visible interface will change
> > and you won't be able to migrate.
> >
> > It does not make sense to discuss feature bits specifically
> > since that is not the only part of interface.
> > For example, max ring size supported might change.
>
> I don't quite understand why we have to consider the max ring
> size here? Isn't it a virtio device attribute, that QEMU could
> provide such compatibility information?
>
> I mean, DPDK is supposed to support varying vring sizes, it's QEMU
> to give a specific value.

If backend supports s/g of any size up to 2^16, there's no issue.

ATM some backends might be assuming up to 1K s/g since
QEMU never supported bigger ones. We might classify this
as a bug, or not and add a feature flag.

But it's just an example. There might be more values at issue
in the future.

> > Let me describe how it works in qemu/libvirt.
> > When you install a VM, you can specify compatibility
> > level (aka "machine type"), and you can query the supported compatibility
> > levels. Management uses that to find the supported compatibility
> > and stores the compatibility in XML that is migrated with the VM.
> > There's also a way to find the latest level which is the
> > default unless overridden by user, again this level
> > is recorded and then
> > - management can make sure migration destination is compatible
> > - management can avoid migration to hosts without that support
>
> Thanks for the info, it helps.
>
> ...
> > > > >>As version here is an opaque string for libvirt and qemu,
> > > > >>anything can be used - but I suggest either a list
> > > > >>of values defining the interface, e.g.
> > > > >>any_layout=on,max_ring=256
> > > > >>or a version including the name and vendor of the backend,
> > > > >>e.g. "org.dpdk.v4.5.6".
>
> The version scheme may not be ideal here. Assume a QEMU is supposed
> to work with a specific DPDK version; however, user may disable some
> newer features through qemu command line, so that it could also work with
> an older DPDK version. Using the version scheme will not allow us to do
> such a migration to an older DPDK version. The MTU is a good example
> here? (when MTU feature is provided by QEMU but is actually disabled
> by user, it could also work with an older DPDK without MTU support).
>
>     --yliu

OK, so does a list of values look better to you then?


> > > > >>
> > > > >>Note that typically the list of supported versions can only be
> > > > >>extended, not shrunk. Also, if the host/guest interface
> > > > >>does not change, don't change the current version as
> > > > >>this just creates work for everyone.
> > > > >>
> > > > >>Thoughts? Would this work well for management? dpdk? vpp?
> > > > >>
> > > > >>Thanks!
> > > > >>
> > > > >>--
> > > > >>MST
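
To make the "list of values" option discussed above a bit more concrete, here is a rough sketch of how a management layer might use a string such as any_layout=on,max_ring=256. Everything in it is an assumption made for illustration: BackendCaps, parse_interface, can_host and common_interface are hypothetical names, not qemu, libvirt or DPDK APIs, and the only keys modelled are on/off feature names plus max_ring. It picks the richest interface every host in a cluster can expose, and checks whether a given host can expose a recorded interface. A feature that is listed as off places no requirement on the backend, which is the MTU case raised in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCaps:
    """What one host's vhost-user backend can offer (hypothetical model)."""
    features: frozenset  # feature names the backend is able to expose
    max_ring: int        # largest ring size the backend can handle


def parse_interface(spec: str) -> dict:
    """Split 'any_layout=on,max_ring=256' into a key -> value dict of strings."""
    result = {}
    for item in spec.split(","):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def can_host(spec: str, caps: BackendCaps) -> bool:
    """Return True if a backend with `caps` can expose the interface in `spec`."""
    for key, value in parse_interface(spec).items():
        if key == "max_ring":
            if int(value) > caps.max_ring:
                return False
        elif value == "on" and key not in caps.features:
            # A feature set to "off" is not needed, so only "on" is checked.
            return False
    return True


def common_interface(hosts: list) -> str:
    """Build the richest interface that every host in the list can expose."""
    shared = frozenset.intersection(*(h.features for h in hosts))
    ring = min(h.max_ring for h in hosts)
    parts = [f"{name}=on" for name in sorted(shared)]
    parts.append(f"max_ring={ring}")
    return ",".join(parts)


if __name__ == "__main__":
    old = BackendCaps(frozenset({"any_layout"}), 256)
    new = BackendCaps(frozenset({"any_layout", "mtu"}), 1024)
    spec = common_interface([old, new])
    print(spec)
    print(can_host(spec, old), can_host(spec, new))
```

In this model the string would be computed before the VM is started and stored alongside it, much like the machine type, so that a destination can be checked without first starting a VM there.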