From: Michal Privoznik
To: Maxime Coquelin, Kevin Traynor, "Michael S. Tsirkin",
 "Daniel P. Berrange", Ciara Loftus, mark.b.kavanagh@intel.com,
 Flavio Leitner, Yuanhan Liu, Daniele Di Proietto
Cc: dev@openvswitch.org, dev@dpdk.org, libvir-list@redhat.com
Date: Wed, 1 Feb 2017 10:14:54 +0100
Subject: Re: [dpdk-dev] [libvirt] [RFC] Vhost-user backends cross-version migration support

On 02/01/2017 09:35 AM, Maxime Coquelin wrote:
> Hi,
>
> Few months ago, Michael reported a problem about migrating VMs relying
> on vhost-user between hosts supporting
> different backend versions:
> - Message-Id: <20161011173526-mutt-send-email-mst@kernel.org>
> - https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg03026.html
>
> The goal of this thread is to draft a proposal based on the outcomes
> of discussions with contributors from the different parties (DPDK/OVS
> /libvirt/...).
>
> Problem statement:
> ==================
>
> When migrating a VM from one host to another, the interfaces exposed by
> QEMU must stay unchanged in order to guarantee a successful migration.
> In the case of a vhost-user interface, parameters like the supported
> Virtio feature set, the max number of queues, the max vring sizes,...
> must remain compatible. Indeed, since the frontend is not
> re-initialized, no renegotiation happens at migration time.
>
> For example, we have a VM that runs on host A, whose vhost-user
> backend advertises the VIRTIO_F_RING_INDIRECT_DESC feature. Since the
> guest also supports this feature, it is successfully negotiated, and
> the guest transmits packets using indirect descriptor tables, which
> the backend knows how to handle.
> At some point, the VM is migrated to host B, which runs an older
> version of the backend that does not support this
> VIRTIO_F_RING_INDIRECT_DESC feature. The migration would break,
> because the guest still has the VIRTIO_F_RING_INDIRECT_DESC bit set,
> and the virtqueue contains some descriptors pointing to indirect
> tables, which backend B doesn't know how to handle.
> This is just an example of Virtio feature compatibility, but other
> backend implementation details could cause other failures.

Exactly. Libvirt can't possibly know which virtio features the guest
has negotiated. Therefore I don't think this falls into libvirt's scope.

>
> What we need is to be able to query the destination host's backend to
> ensure migration is possible. Also, we would need to query this
> statically, even before the VM is started, to be sure it could be
> migrated elsewhere for any reason.
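The incompatibility Maxime describes boils down to a bitmask check: migration is only safe if every feature the guest negotiated on the source is also offered by the destination backend. A minimal sketch of that check, assuming bit 28 for the indirect-descriptor feature (its value in the Linux virtio headers); the helper name is illustrative, not an actual DPDK or OVS API:

```python
# Bit 28 in the virtio feature space (VIRTIO_RING_F_INDIRECT_DESC in
# the Linux uapi headers; the mail calls it VIRTIO_F_RING_INDIRECT_DESC).
VIRTIO_RING_F_INDIRECT_DESC = 1 << 28

def migration_compatible(negotiated: int, dst_backend_features: int) -> bool:
    """The frontend is not re-initialized on migration, so no
    renegotiation happens: every bit the guest negotiated on the
    source must also be offered by the destination backend."""
    return (negotiated & ~dst_backend_features) == 0

# Host A negotiated indirect descriptors; host B's older backend does
# not offer the feature, so migrating there would break the guest.
negotiated = VIRTIO_RING_F_INDIRECT_DESC
host_b_features = 0                            # older backend
host_c_features = VIRTIO_RING_F_INDIRECT_DESC  # same-version backend

print(migration_compatible(negotiated, host_b_features))  # False
print(migration_compatible(negotiated, host_c_features))  # True
```

The check is asymmetric on purpose: the destination may offer extra features the guest never negotiated, since those bits are simply left unused.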
Again, if you have more than two hosts, say A-Z, I don't see how
libvirt could know which hosts to ask (where you will migrate your
guest), and which combination of virtio features is okay and which is
a deal breaker.

>
> Solution 1: Libvirt queries DPDK vhost lib: *KO*
> ================================================
>
> The initial idea was to have the management tool (libvirt) query the
> DPDK vhost lib, get key/value pairs, and check whether migration is
> possible. This solution doesn't work for several reasons:
> 1. The vhost lib API provides a way for the application to disable
> features at runtime (i.e., not at build time). So coming back to the
> previous example, DPDK v16.11 supports the indirect descriptors
> feature, but it could be disabled by OVS. We had a look at whether
> this API was really needed, and it turns out it is, as the TSO
> feature is supported in DPDK but not in OVS. So we cannot rely on
> DPDK only.
> 2. Some parameters may not be DPDK-specific only, such as the maximum
> number of queues for example.
>
> Solution 2: Libvirt queries OVS for vhost backend key/value pairs: *KO*
> =======================================================================
>
> The second idea was for OVS to expose its vhost backend implementation
> parameters as key/value pairs, for example in the DB or via a
> dedicated tool. For example, you could have this kind of information:
> - virtio-features: 0x12045694
> - max-rx-queues: 1024
> - max-rx-ring-size: 512
> Doing this, libvirt has the information to decide whether migration
> is possible or not.
> The problem is that libvirt doesn't know (and doesn't want to know)
> how to interpret these values (should it be equal/lower/greater/...?),
> and each time a new key is introduced in OVS, libvirt would have to
> be updated to handle it, adding an unwanted synchronization constraint
> between the projects.
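The objection to solution 2 can be made concrete: each key needs its own comparison rule (superset for feature bits, greater-or-equal for limits), and those rules would have to be hardcoded in libvirt and extended for every key OVS adds. A hypothetical sketch of what libvirt would be forced to carry, with key names modeled on the examples above, not a real OVS schema:

```python
# Hypothetical per-key comparison rules libvirt would have to hardcode
# in order to interpret OVS's key/value pairs (not a real OVS schema).
RULES = {
    # destination must offer a superset of the negotiated feature bits
    "virtio-features":  lambda src, dst: (src & ~dst) == 0,
    # destination limits must be at least as large as the source's
    "max-rx-queues":    lambda src, dst: dst >= src,
    "max-rx-ring-size": lambda src, dst: dst >= src,
}

def can_migrate(src: dict, dst: dict) -> bool:
    for key, src_val in src.items():
        rule = RULES.get(key)
        if rule is None:
            # A key introduced by a newer OVS that this libvirt does
            # not know how to compare -- the synchronization problem.
            return False
        if not rule(src_val, dst.get(key, 0)):
            return False
    return True

src     = {"virtio-features": 0x12045694, "max-rx-queues": 1024}
dst_ok  = {"virtio-features": 0x12045694, "max-rx-queues": 1024}
dst_old = {"virtio-features": 0x02045694, "max-rx-queues": 1024}

print(can_migrate(src, dst_ok))   # True
print(can_migrate(src, dst_old))  # False
```

The `rule is None` branch is exactly the failure mode described: an older libvirt meeting a key it has no semantics for can only refuse, which couples libvirt releases to OVS releases.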
>
> Solution 3: Libvirt queries OVS for vhost backend version string: *OK*
> ======================================================================
>
> The idea is to have a table of supported versions, associated with
> key/value pairs. Libvirt could query the list of supported version
> strings for each host, and select the first common one among all
> hosts.

How does libvirt know which hosts to ask? Libvirt aims at managing a
single host. It has no knowledge of other hosts on the network. That's
a task for upper layers like RHEV, OpenStack, etc.

>
> Then, libvirt would ask OVS to probe the vhost-user interfaces in the
> selected version (compatibility mode). For example, host A runs
> OVS-2.7, and host B OVS-2.6. Host A's OVS-2.7 has an OVS-2.6
> compatibility mode (e.g. with indirect descriptors disabled), which
> should be selected at vhost-user interface probe time.
>
> The advantage of doing so is that libvirt does not need any update if
> new keys are introduced (i.e. it does not need to know how the new
> keys have to be handled); all these checks remain in OVS's vhost-user
> implementation.

And that's where they should stay. Duplicating code between projects
will inevitably lead to divergence.

>
> Ideally, we would support a per-vhost-user-interface compatibility
> mode, which may have an impact on the DPDK API as well, as the Virtio
> feature update API is global, and not per port.

In general, I don't think we want any kind of this logic in libvirt.
Either:

a) the fallback logic should be implemented in qemu (e.g. upon
migration it should detect that the migrated guest uses a certain
version and thus set the backend to use that version, or error out and
cancel the migration), or

b) libvirt would grow another element/attribute to specify the version
of the vhost-user backend in use and do nothing more than pass it to
qemu. At the same time, we can provide an API (or extend an existing
one, e.g. virsh domcapabilities) to list all available versions on a
given host.
The upper layer, which knows which hosts are suitable for
virtualization, can then use this API to ask all the hosts, construct
the matrix, select the preferred version, and put it into libvirt's
domain XML.

But frankly, I don't like b) that much. Let's put the fact that this
is OVS aside for a moment. Just pretend this is a generic device in
qemu. Would we do the same magic with it? No! Or let's talk about
machine types. You spawn a -M type$((X+1)) guest and then decide to
migrate it to a host with an older qemu which supports just typeX.
Well, you get an error. Do we care? Not at all! It's your
responsibility (as user/admin) to upgrade qemu so that it supports the
new machine type. I think the same applies to OVS. Sorry.

Michal
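For reference, the selection step of solution 3 (the part argued above to belong in the upper layer, not in libvirt) is just an ordered intersection over the per-host version lists. A hypothetical sketch, with made-up version strings and no claim about the actual query API:

```python
# Hypothetical sketch of solution 3's selection step: each host
# advertises the vhost-user backend versions it can emulate, newest
# first, and the management layer picks the first version common to
# every host involved (version strings are made up).
def select_common_version(hosts: dict):
    host_sets = [set(versions) for versions in hosts.values()]
    # Walk one host's list in preference order, intersecting with the rest.
    first = next(iter(hosts.values()))
    for version in first:
        if all(version in s for s in host_sets):
            return version
    return None  # no common version: migration among these hosts is unsafe

hosts = {
    "hostA": ["ovs-2.7", "ovs-2.6"],  # OVS 2.7 with an OVS-2.6 compat mode
    "hostB": ["ovs-2.6"],
}
print(select_common_version(hosts))  # ovs-2.6
```

Nothing here requires knowledge of individual keys, which is the point of the version-string approach; the open question in the thread is only which layer runs this loop.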