From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: "Maxime Coquelin" <maxime.coquelin@redhat.com>,
dev@dpdk.org, "Stephen Hemminger" <stephen@networkplumber.org>,
qemu-devel@nongnu.org, libvir-list@redhat.com,
vpp-dev@lists.fd.io,
"Marc-André Lureau" <marcandre.lureau@redhat.com>
Subject: Re: [dpdk-dev] dpdk/vpp and cross-version migration for vhost
Date: Tue, 22 Nov 2016 16:53:05 +0200
Message-ID: <20161122164143-mutt-send-email-mst@kernel.org>
In-Reply-To: <20161122130223.GW5048@yliu-dev.sh.intel.com>
On Tue, Nov 22, 2016 at 09:02:23PM +0800, Yuanhan Liu wrote:
> On Thu, Nov 17, 2016 at 07:37:09PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 17, 2016 at 05:49:36PM +0800, Yuanhan Liu wrote:
> > > On Thu, Nov 17, 2016 at 09:47:09AM +0100, Maxime Coquelin wrote:
> > > >
> > > >
> > > > On 11/17/2016 09:29 AM, Yuanhan Liu wrote:
> > > > > >As usual, sorry for the late response :/
> > > > >
> > > > >On Thu, Oct 13, 2016 at 08:50:52PM +0300, Michael S. Tsirkin wrote:
> > > > >>Hi!
> > > > >>So it looks like we face a problem with cross-version
> > > > >>migration when using vhost. It's not new but became more
> > > > >>acute with the advent of vhost user.
> > > > >>
> > > > >>For users to be able to migrate between different versions
> > > > >>of the hypervisor the interface exposed to guests
> > > > >>by hypervisor must stay unchanged.
> > > > >>
> > > > >>The problem is that a qemu device is connected
> > > > >>to a backend in another process, so the interface
> > > > >>exposed to guests depends on the capabilities of that
> > > > >>process.
> > > > >>
> > > > >>Specifically, for vhost user interface based on virtio, this includes
> > > > >>the "host features" bitmap that defines the interface, as well as more
> > > > >>host values such as the max ring size. Adding new features/changing
> > > > >>values to this interface is required to make progress, but on the other
> > > > >>hand we need ability to get the old host features to be compatible.
> > > > >
> > > > > >It looks like the same issue as vhost-user reconnect to me. For example,
> > > > >
> > > > >- start dpdk 16.07 & qemu 2.5
> > > > >- kill dpdk
> > > > >- start dpdk 16.11
> > > > >
> > > > > >Though DPDK 16.11 has more features than DPDK 16.07 (say, indirect),
> > > > > >the above should work, because qemu saves the negotiated features before
> > > > > >the disconnect and restores them after the reconnection.
> > > > >
> > > > > commit a463215b087c41d7ca94e51aa347cde523831873
> > > > > Author: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > > > Date: Mon Jun 6 18:45:05 2016 +0200
> > > > >
> > > > > vhost-net: save & restore vhost-user acked features
> > > > >
> > > > > The initial vhost-user connection sets the features to be negotiated
> > > > > with the driver. Renegotiation isn't possible without device reset.
> > > > >
> > > > > To handle reconnection of vhost-user backend, ensure the same set of
> > > > > features are provided, and reuse already acked features.
> > > > >
> > > > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > > > >
> > > > >
> > > > > >So could we do something similar to vhost-user reconnect? I mean, save
> > > > > >the acked features before migration and restore them after it. This
> > > > > >should be able to keep the compatibility. If the user downgrades the DPDK
> > > > > >version, that could also be easily detected, and then exit with an error
> > > > > >to the user: migration failed due to incompatible vhost features.
> > > > >
> > > > >Just some rough thoughts. Makes tiny sense?
> > > >
> > > > My understanding is that the management tool has to know whether
> > > > versions are compatible before initiating the migration:
> > >
> > > Makes sense. How about getting and restoring the acked features through
> > > qemu command lines then, say, through the monitor interface?
> > >
> > > With that, it would be something like:
> > >
> > > - start vhost-user backend (DPDK, VPP, or whatever) & qemu in the src host
> > >
> > > - read the acked features (through monitor interface)
> > >
> > > - start vhost-user backend in the dst host
> > >
> > > - start qemu in the dst host with the just queried acked features
> > >
> > > QEMU is then expected to use this feature set for the later vhost-user
> > > feature negotiation, and to exit if feature compatibility is broken.
> > >
> > > Thoughts?
> > >
> > > --yliu
> >
> >
> > You keep assuming that you have the VM started first and
> > figure out things afterwards, but this does not work.
> >
> > Think about a cluster of machines. You want to start a VM in
> > a way that will ensure compatibility with all hosts
> > in a cluster.
>
> I see. I was thinking more about the case where the dst
> host (including the qemu and dpdk combo) is given, and
> we then determine whether the migration will be successful
> or not.
>
> And you are asking that we need to know which host could
> be a good candidate before starting the migration. In such
> case, we indeed need some inputs from both the qemu and
> vhost-user backend.
>
> For DPDK, I think it could be simple, just as you said: it
> could be either a tiny script, or even a macro defined in
> a source code file (which we extend every time we add a
> new feature) for libvirt to read. Or something
> else.
There's the issue of APIs that tweak features, as Maxime
suggested. Maybe the only thing to do is to deprecate them,
but I feel some way for an application to pass info into
the guest might be beneficial.
> > If you don't, guest visible interface will change
> > and you won't be able to migrate.
> >
> > It does not make sense to discuss feature bits specifically
> > since that is not the only part of interface.
> > For example, max ring size supported might change.
>
> I don't quite understand why we have to consider the max ring
> size here. Isn't it a virtio device attribute, for which QEMU
> could provide such compatibility information?
>
> I mean, DPDK is supposed to support varying vring sizes; it's up
> to QEMU to give a specific value.
If a backend supports s/g lists of any size up to 2^16, there's no issue.
At the moment some backends might be assuming s/g of up to 1K, since
QEMU never supported bigger ones. We might classify this
as a bug, or we might not and add a feature flag instead.
But it's just an example. There might be more values at issue
in the future.
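To make the example concrete, here is a rough sketch (Python, not QEMU or
DPDK code) of treating a value such as the max s/g size the same way as
feature bits when picking an interface that every host in a cluster can
provide. HostInterface, common_interface and can_host are names made up
for illustration; the two bit numbers are VIRTIO_F_ANY_LAYOUT and
VIRTIO_RING_F_INDIRECT_DESC from the virtio spec.

```python
from dataclasses import dataclass
from functools import reduce

# Feature bit numbers as defined in the virtio specification.
VIRTIO_F_ANY_LAYOUT = 1 << 27
VIRTIO_RING_F_INDIRECT_DESC = 1 << 28


@dataclass(frozen=True)
class HostInterface:
    """Guest-visible limits one host's backend can offer (hypothetical model)."""
    features: int  # bitmap of host feature bits the backend offers
    max_sg: int    # largest s/g list the backend can handle


def common_interface(hosts):
    """Return the most conservative interface that every host can provide."""
    if not hosts:
        raise ValueError("need at least one host")
    # A feature is usable cluster-wide only if every host offers it.
    features = reduce(lambda a, b: a & b, (h.features for h in hosts))
    # A numeric limit is bounded by the weakest host.
    max_sg = min(h.max_sg for h in hosts)
    return HostInterface(features, max_sg)


def can_host(required, host):
    """True if `host` can expose exactly the `required` interface to a guest."""
    missing = required.features & ~host.features
    return missing == 0 and required.max_sg <= host.max_sg
```

With one host limited to 1K s/g and no indirect descriptors, and another
supporting 2^16 and indirect descriptors, common_interface() yields the
older host's interface, and a VM started with the newer host's full
interface fails can_host() on the older one.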
> > Let me describe how it works in qemu/libvirt.
> > When you install a VM, you can specify compatibility
> > level (aka "machine type"), and you can query the supported compatibility
> > levels. Management uses that to find the supported compatibility
> > and stores the compatibility in XML that is migrated with the VM.
> > There's also a way to find the latest level which is the
> > default unless overridden by user, again this level
> > is recorded and then
> > - management can make sure migration destination is compatible
> > - management can avoid migration to hosts without that support
>
> Thanks for the info, it helps.
>
> ...
> > > > >>As version here is an opaque string for libvirt and qemu,
> > > > >>anything can be used - but I suggest either a list
> > > > >>of values defining the interface, e.g.
> > > > >>any_layout=on,max_ring=256
> > > > >>or a version including the name and vendor of the backend,
> > > > >>e.g. "org.dpdk.v4.5.6".
>
> The version scheme may not be ideal here. Assume a QEMU is supposed
> to work with a specific DPDK version; however, the user may disable some
> newer features through the qemu command line, so that it could also work
> with an older DPDK version. Using the version scheme would not allow us
> to do such a migration to an older DPDK version. MTU is a live example
> here? (When the MTU feature is provided by QEMU but is actually disabled
> by the user, it could also work with an older DPDK without MTU support.)
>
> --yliu
OK, so does a list of values look better to you then?
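As a rough sketch of what a check against such a list could look like
(Python, purely illustrative): the "key=value,key=value" syntax follows
the any_layout=on,max_ring=256 example quoted earlier, but the function
names, the "mtu" key, and the rule that a numeric value means "backend
must support at least this much" are all my assumptions here, not an
existing QEMU or libvirt interface.

```python
def parse_interface(spec):
    """Parse 'key=value,...' into a dict; on/off become bools, digits become ints."""
    result = {}
    for item in spec.split(","):
        key, sep, value = item.strip().partition("=")
        if not key or not sep:
            raise ValueError(f"malformed entry: {item!r}")
        if value in ("on", "off"):
            result[key] = value == "on"
        elif value.isdigit():
            result[key] = int(value)
        else:
            raise ValueError(f"unsupported value in entry: {item!r}")
    return result


def backend_satisfies(required, offered):
    """True if a backend offering `offered` can expose `required` to the guest."""
    for key, want in required.items():
        have = offered.get(key)
        if isinstance(want, bool):
            # A feature that is switched off needs no backend support at all.
            if want and have is not True:
                return False
        else:
            # Assumed rule: the backend must support at least the required value.
            if isinstance(have, bool) or not isinstance(have, int) or have < want:
                return False
    return True
```

The point of the "off" case is the MTU example above: a VM whose list says
mtu=off can still go to a backend that has never heard of MTU, which a
single version string could not express.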
> > > > >>
> > > > >>Note that typically the list of supported versions can only be
> > > > >>extended, not shrunk. Also, if the host/guest interface
> > > > >>does not change, don't change the current version as
> > > > >>this just creates work for everyone.
> > > > >>
> > > > >>Thoughts? Would this work well for management? dpdk? vpp?
> > > > >>
> > > > >>Thanks!
> > > > >>
> > > > >>--
> > > > >>MST