From: Maxime Coquelin
To: "Michael S. Tsirkin", Yuanhan Liu
Cc: dev@dpdk.org, Stephen Hemminger, qemu-devel@nongnu.org, libvir-list@redhat.com, vpp-dev@lists.fd.io
Date: Wed, 16 Nov 2016 21:43:42 +0100
Subject: Re: [dpdk-dev] dpdk/vpp and cross-version migration for vhost
Message-ID: <8fe89be5-faa0-b779-ab02-b734bf7a2daf@redhat.com>
In-Reply-To: <20161011173526-mutt-send-email-mst@kernel.org>

Hi Michael,

On 10/13/2016 07:50 PM, Michael S. Tsirkin wrote:
> Hi!
> So it looks like we face a problem with cross-version
> migration when using vhost. It's not new but became more
> acute with the advent of vhost user.
>
> For users to be able to migrate between different versions
> of the hypervisor the interface exposed to guests
> by hypervisor must stay unchanged.
>
> The problem is that a qemu device is connected
> to a backend in another process, so the interface
> exposed to guests depends on the capabilities of that
> process.
>
> Specifically, for vhost user interface based on virtio, this includes
> the "host features" bitmap that defines the interface, as well as more
> host values such as the max ring size. Adding new features/changing
> values to this interface is required to make progress, but on the other
> hand we need ability to get the old host features to be compatible.
>
> To solve this problem within qemu, qemu has a versioning system based on
> a machine type concept which fundamentally is a version string, by
> specifying that string one can get hardware compatible with a previous
> qemu version. QEMU also reports the latest version and list of versions
> supported so libvirt records the version at VM creation and then is
> careful to use this machine version whenever it migrates a VM.
>
> One might wonder how is this solved with a kernel vhost backend. The
> answer is that it mostly isn't - instead an assumption is made, that
> qemu versions are deployed together with the kernel - this is generally
> true for downstreams. Thus whenever qemu gains a new feature, it is
> already supported by the kernel as well. However, if one attempts
> migration with a new qemu from a system with a new to old kernel, one
> would get a failure.
>
> In the world where we have multiple userspace backends, with some of
> these supplied by ISVs, this seems non-realistic.
>
> IMO we need to support vhost backend versioning, ideally
> in a way that will also work for vhost kernel backends.
>
> So I'd like to get some input from both backend and management
> developers on what a good solution would look like.
>
> If we want to emulate the qemu solution, this involves adding the
> concept of interface versions to dpdk. For example, dpdk could supply a
> file (or utility printing?) with list of versions: latest and versions
> supported. libvirt could read that and

So if I understand correctly, it would be generated at build time?

One problem I see is that DPDK's vhost-user lib API provides a way to
disable features:
"
rte_vhost_feature_disable/rte_vhost_feature_enable(feature_mask)

This function disables/enables some features. For example, it can be
used to disable mergeable buffers and TSO features, which both are
enabled by default.
"

I think we should not have this capability on the host side: it should
be the guest's decision whether to use some features, and if it has to
be done on the host, QEMU already provides a way to disable features
(moreover per-device, which is not the case with
rte_vhost_feature_disable).
IMHO, we should consider deprecating this API in v17.02.

That said, the API exists, and it would break migration if the version
file advertised features that the vSwitch has disabled at runtime.

> - store latest version at vm creation
> - pass it around with the vm
> - pass it to qemu
> From here, qemu could pass this over the vhost-user channel,
> thus making sure it's initialized with the correct
> compatible interface.

Using vhost-user protocol features, I guess?

> As version here is an opaque string for libvirt and qemu,
> anything can be used - but I suggest either a list
> of values defining the interface, e.g.
> any_layout=on,max_ring=256
> or a version including the name and vendor of the backend,
> e.g. "org.dpdk.v4.5.6".

I think the first option provides more flexibility.
For example, we could imagine migrating from a process using DPDK's
vhost-user lib to another process using its own implementation (VPP
currently has its own implementation, if I'm not mistaken). Maybe this
scenario does not make sense, but if it does, exposing the values
directly would avoid the need for synchronization between vhost-user
implementations.

>
> Note that typically the list of supported versions can only be
> extended, not shrunk. Also, if the host/guest interface
> does not change, don't change the current version as
> this just creates work for everyone.
>
> Thoughts? Would this work well for management? dpdk? vpp?

One thing I'm not clear on is how this will work for the MTU feature,
if the process the VM is migrated to exposes a larger MTU than the
guest supports (for example, if the guest has sized its receive
buffers to the pre-migration MTU).

Thanks,
Maxime