From: Maxime Coquelin
To: Kevin Traynor, "Michael S. Tsirkin", "Daniel P. Berrange", Ciara Loftus,
    mark.b.kavanagh@intel.com, Flavio Leitner, Yuanhan Liu, Daniele Di Proietto
Cc: dev@openvswitch.org, dev@dpdk.org, libvir-list@redhat.com
Date: Wed, 1 Feb 2017 09:35:22 +0100
Subject: [dpdk-dev] [RFC] Vhost-user backends cross-version migration support

Hi,

A few months ago, Michael reported a problem about migrating VMs relying
on vhost-user between hosts supporting different backend versions:
 - Message-Id: <20161011173526-mutt-send-email-mst@kernel.org>
 - https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg03026.html

The goal of this thread is to draft a proposal based on the outcomes of
discussions with contributors from the different parties
(DPDK/OVS/libvirt/...).

Problem statement:
==================
When migrating a VM from one host to another, the interfaces exposed by
QEMU must stay unchanged in order to guarantee a successful migration.
In the case of a vhost-user interface, parameters like the supported
Virtio feature set, the maximum number of queues and the maximum vring
sizes must remain compatible. Indeed, since the frontend is not
re-initialized, no renegotiation happens at migration time.

For example, say we have a VM running on host A, whose vhost-user
backend advertises the VIRTIO_F_RING_INDIRECT_DESC feature. Since the
guest also supports this feature, it is successfully negotiated, and the
guest transmits packets using indirect descriptor tables, which the
backend knows how to handle.

At some point, the VM is migrated to host B, which runs an older version
of the backend that does not support the VIRTIO_F_RING_INDIRECT_DESC
feature. The migration breaks, because the guest still has the
VIRTIO_F_RING_INDIRECT_DESC bit set and its virtqueues contain
descriptors pointing to indirect tables, which backend B does not know
how to handle.

This is just one example of a Virtio feature incompatibility; other
backend implementation details could cause other failures. What we need
is to be able to query the destination host's backend to ensure that
migration is possible. Also, we would need to be able to query this
statically, even before the VM is started, to be sure it could later be
migrated elsewhere if need be.
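To make the expected check concrete, here is a minimal sketch (purely
illustrative, the helper below does not exist anywhere today): migration
is only safe if every feature the guest negotiated on the source host is
also offered by the destination backend, i.e. the negotiated set must be
a subset of the destination's supported set:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * 'negotiated_src' is the feature set the guest negotiated on host A,
 * 'supported_dst' is the feature set host B's backend can offer. */
static bool
vhost_features_compatible(uint64_t negotiated_src, uint64_t supported_dst)
{
	/* Every bit negotiated on the source must also be offered by the
	 * destination; e.g. VIRTIO_F_RING_INDIRECT_DESC set on host A but
	 * missing on host B makes the check fail. */
	return (negotiated_src & ~supported_dst) == 0;
}

The rest of this mail is about how to expose enough information to
perform this kind of check before the migration, and ideally before the
VM is even started.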
Solution 1: Libvirt queries DPDK vhost lib: *KO*
================================================
The initial idea was to have the management tool (libvirt) query the
DPDK vhost library for key/value pairs and check whether migration is
possible. This solution doesn't work, for several reasons:

1. The vhost lib API provides a way for the application to disable
   features at runtime (that is, not at build time). Coming back to the
   previous example, DPDK v16.11 supports the indirect descriptors
   feature, but OVS could have disabled it. We had a look at whether
   this API was really needed, and it turns out it is, as the TSO
   feature is supported in DPDK but not in OVS. So we cannot rely on
   DPDK alone.

2. Some parameters may not be DPDK-specific at all, such as the maximum
   number of queues.

Solution 2: Libvirt queries OVS for vhost backend key/value pairs: *KO*
=======================================================================
The second idea was for OVS to expose its vhost backend implementation
parameters as key/value pairs, for example in the DB or via a dedicated
tool. For example, you could have this kind of information:
 - virtio-features: 0x12045694
 - max-rx-queues: 1024
 - max-rx-ring-size: 512

With this, libvirt has the information needed to decide whether
migration is possible. The problem is that libvirt doesn't know (and
doesn't want to know) how to interpret these values (should they be
equal, lower, greater, ...?), and each time a new key is introduced in
OVS, libvirt would have to be updated to handle it, adding an unwanted
synchronization constraint between the projects.

Solution 3: Libvirt queries OVS for vhost backend version string: *OK*
======================================================================
The idea is to have a table of supported versions, each associated with
a set of key/value pairs. Libvirt could query the list of supported
version strings on each host, and select the first version common to all
hosts. Then, libvirt would ask OVS to probe the vhost-user interfaces in
the selected version (compatibility mode).

For example, host A runs OVS-2.7 and host B runs OVS-2.6. Host A's
OVS-2.7 has an OVS-2.6 compatibility mode (e.g. with indirect
descriptors disabled), which would be selected at vhost-user interface
probe time.

The advantage of doing so is that libvirt does not need any update when
new keys are introduced (i.e. it does not need to know how the new keys
have to be handled); all these checks remain in OVS's vhost-user
implementation.

Ideally, we would support a per-vhost-user-interface compatibility mode,
which may also have an impact on the DPDK API, as the Virtio feature
update API is currently global, and not per port.
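To illustrate that last point (this is only a sketch of the situation as
I understand it today, not a proposed patch): with the current DPDK API,
an application like OVS filters features with
rte_vhost_feature_disable(), which takes effect for every vhost-user
port it registers, not just one interface:

#include <linux/virtio_ring.h>
#include <rte_virtio_net.h>

/* Sketch only: disabling indirect descriptors through the existing
 * global knob. VIRTIO_RING_F_INDIRECT_DESC is the Linux header name of
 * the feature referred to as VIRTIO_F_RING_INDIRECT_DESC above. This
 * affects all vhost-user ports of the application at once. */
static void
disable_indirect_desc_for_all_ports(void)
{
	rte_vhost_feature_disable(1ULL << VIRTIO_RING_F_INDIRECT_DESC);
}

A per-interface compatibility mode would therefore need some kind of
per-port variant of this API on the DPDK side; the exact shape of that
extension is open for discussion.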
- Implementation:
-----------------
The goal here is just to illustrate this proposal; I'm sure you will
have good suggestions to improve it.

In the OVS vhost-user library, we would introduce a new structure, for
example (neither compiled nor tested):

struct vhostuser_compat {
	char *version;
	uint64_t virtio_features;
	uint32_t max_rx_queue_sz;
	uint32_t max_nr_queues;
};

The *version* field is the compatibility version string. It could be
something like:
	"upstream.ovs-dpdk.v2.6"
If, for example, Fedora adds some more patches to its package that would
break migration to the upstream version, it could have a dedicated
compatibility string:
	"fc26.ovs-dpdk.v2.6"
If OVS-v2.7 does not break compatibility with the previous OVS-v2.6
version, there is no need to create a new compatibility entry; the v2.6
one is just kept.

The *virtio_features* field is the Virtio feature set for a given
compatibility version. When an OVS tag is created, it would be
associated with a DPDK version, and the Virtio features for that version
would be stored in this field. This would allow upgrading the DPDK
package, for example from v16.07 to v16.11, without breaking migration.
If the distribution wants to benefit from the latest Virtio features, it
would have to create a new entry to ensure migration won't be broken.

The *max_rx_queue_sz* and *max_nr_queues* fields are just here as
examples; I don't think they are needed today. I just want to illustrate
that we have to anticipate parameters other than the Virtio feature set,
even if they are not necessary at the moment.

We would create a table with the different compatibility versions in the
OVS vhost-user lib:

static struct vhostuser_compat vu_compat[] = {
	{
		.version = "upstream.ovs-dpdk.v2.7",
		.virtio_features = 0x12045694,
		.max_rx_queue_sz = 512,
	},
	{
		.version = "upstream.ovs-dpdk.v2.6",
		.virtio_features = 0x10045694,
		.max_rx_queue_sz = 1024,
	},
};

At some point during installation or system init, this table would be
parsed and the compatibility version strings stored in the OVS database,
or a new tool would be created to list these strings.

Before launching the VM, libvirt would query the version strings of each
host, for example using the JSON RPC API of OVS (maybe not the best
solution; looking forward to your comments on this). Libvirt would then
select the first commonly supported version, and insert this string into
the vhost-user interface parameters in the OVS DB of each host.

When the vhost-user connection is initiated, OVS would know in which
compatibility mode to initialize the interface, for example by
restricting the supported Virtio features of the interface.

Do you think this is reasonable? Or maybe you have alternative ideas
that would be a better fit to ensure successful migration?

Cheers,
Maxime