From: Kevin Traynor
Organization: Red Hat
To: Yuanhan Liu, "Michael S. Tsirkin"
Cc: Maxime Coquelin, dev@dpdk.org, Stephen Hemminger, qemu-devel@nongnu.org, libvir-list@redhat.com, vpp-dev@lists.fd.io, Marc-André Lureau
Date: Thu, 24 Nov 2016 09:30:49 +0000
Subject: Re: [dpdk-dev] dpdk/vpp and cross-version migration for vhost
Message-ID: <4d6e8cf0-fe19-43a9-ff73-c2a9cdeb681e@redhat.com>
In-Reply-To: <20161124063129.GE5048@yliu-dev.sh.intel.com>

On 11/24/2016 06:31 AM, Yuanhan Liu wrote:
> On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:
>>>> You keep assuming that you have the VM started first and
>>>> figure out things afterwards, but this does not work.
>>>>
>>>> Think about a cluster of machines. You want to start a VM in
>>>> a way that will ensure compatibility with all hosts
>>>> in a cluster.
>>>
>>> I see. I was mostly considering the case where the dst
>>> host (including the qemu and dpdk combo) is given, and
>>> we then determine whether the migration will be successful
>>> or not.
>>>
>>> And you are saying that we need to know which host could
>>> be a good candidate before starting the migration. In that
>>> case, we indeed need some input from both qemu and the
>>> vhost-user backend.
>>>
>>> For DPDK, I think it could be simple, just as you said: it
>>> could be either a tiny script, or even a macro defined in
>>> a source file (which we extend every time we add a
>>> new feature) for libvirt to read. Or something
>>> else.
>>
>> There's the issue of APIs that tweak features as Maxime
>> suggested.
>
> Yes, it's a good point.
>
>> Maybe the only thing to do is to deprecate it,
>
> Looks like so.
>
>> but I feel some way for an application to pass info into
>> the guest might be beneficial.
>
> The two APIs are just for tweaking the feature bits DPDK supports before
> any device is connected. It's another way to disable some features
> (the other obvious way is through the QEMU command line).
>
> IMO, it's only handy in a case like this: we have a bunch of VMs. Instead
> of disabling something through qemu one by one, we could disable it
> once in DPDK.
>
> But I doubt its usefulness. It's only used in DPDK's vhost example
> after all. It is used neither in the vhost PMD nor in OVS.

rte_vhost_feature_disable() is currently used in OVS, in lib/netdev-dpdk.c:

netdev_dpdk_vhost_class_init(void)
{
    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;

    /* This function can be called for different classes.  The initialization
     * needs to be done only once */
    if (ovsthread_once_start(&once)) {
        rte_vhost_driver_callback_register(&virtio_net_device_ops);
        rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4
                                  | 1ULL << VIRTIO_NET_F_HOST_TSO6
                                  | 1ULL << VIRTIO_NET_F_CSUM);

>
>>>> If you don't, the guest-visible interface will change
>>>> and you won't be able to migrate.
>>>>
>>>> It does not make sense to discuss feature bits specifically
>>>> since that is not the only part of the interface.
>>>> For example, the max ring size supported might change.
>>>
>>> I don't quite understand why we have to consider the max ring
>>> size here. Isn't it a virtio device attribute, for which QEMU could
>>> provide such compatibility information?
>>>
>>> I mean, DPDK is supposed to support varying vring sizes; it's up to
>>> QEMU to give a specific value.
>>
>> If backend supports s/g of any size up to 2^16, there's no issue.
>
> I don't know about others, but I see no issues in DPDK.
>
>> ATM some backends might be assuming up to 1K s/g since
>> QEMU never supported bigger ones. We might classify this
>> as a bug, or not and add a feature flag.
>>
>> But it's just an example. There might be more values at issue
>> in the future.
>
> Yeah, maybe. But we could analyze them one by one.
>
>>>> Let me describe how it works in qemu/libvirt.
>>>> When you install a VM, you can specify compatibility
>>>> level (aka "machine type"), and you can query the supported compatibility
>>>> levels. Management uses that to find the supported compatibility
>>>> and stores the compatibility in XML that is migrated with the VM.
>>>> There's also a way to find the latest level which is the
>>>> default unless overridden by user, again this level
>>>> is recorded and then
>>>> - management can make sure migration destination is compatible
>>>> - management can avoid migration to hosts without that support
>>>
>>> Thanks for the info, it helps.
>>>
>>> ...
>>>>>>>> As version here is an opaque string for libvirt and qemu,
>>>>>>>> anything can be used - but I suggest either a list
>>>>>>>> of values defining the interface, e.g.
>>>>>>>> any_layout=on,max_ring=256
>>>>>>>> or a version including the name and vendor of the backend,
>>>>>>>> e.g. "org.dpdk.v4.5.6".
>>>
>>> The version scheme may not be ideal here. Assume a QEMU is supposed
>>> to work with a specific DPDK version. The user may nevertheless disable
>>> some newer features through the qemu command line, so that it could also
>>> work with an older DPDK version. Using the version scheme would not allow
>>> such a migration to an older DPDK version. MTU is a live example
>>> here: when the MTU feature is provided by QEMU but is actually disabled
>>> by the user, the VM could also work with an older DPDK without MTU support.
>>>
>>> --yliu
>>
>> OK, so does a list of values look better to you then?
>
> Yes, if there is no better way.
>
> And I think it may be better not to list all those features literally.
> Instead, using a number would be better, say, features=0xdeadbeef.
>
> Listing the feature names means all the components involved here
> (QEMU, libvirt, DPDK, VPP, and maybe more backends) have to agree
> on the exact same feature names. Though it
> may not be a big deal, it lacks some flexibility.
>
> A feature bitmask will not have this issue.
>
> --yliu
>
>>
>>
>>>>>>>>
>>>>>>>> Note that typically the list of supported versions can only be
>>>>>>>> extended, not shrunk.
>>>>>>>> Also, if the host/guest interface
>>>>>>>> does not change, don't change the current version as
>>>>>>>> this just creates work for everyone.
>>>>>>>>
>>>>>>>> Thoughts? Would this work well for management? dpdk? vpp?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>> MST
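
For illustration only, the features=<bitmask> idea from the thread can be sketched in C. This is not code from DPDK, QEMU, or libvirt: features_disable() and features_compatible() are hypothetical helper names, and only the bit positions are taken from the virtio-net specification. The sketch assumes a VM can migrate when every feature bit it was started with is also offered by the destination backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit positions as defined by the virtio-net specification. */
#define VIRTIO_NET_F_CSUM       0
#define VIRTIO_NET_F_MTU        3
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12

/*
 * Hypothetical helper: clear the bits in 'mask' from the set of features
 * a backend advertises. This only models the effect the thread attributes
 * to disabling features up front; it is not the DPDK implementation.
 */
static inline uint64_t
features_disable(uint64_t supported, uint64_t mask)
{
    return supported & ~mask;
}

/*
 * Hypothetical helper: return true when every feature the VM was started
 * with is also offered by the destination backend, i.e. no bit is set in
 * vm_features that is missing from dst_supported.
 */
static inline bool
features_compatible(uint64_t vm_features, uint64_t dst_supported)
{
    return (vm_features & ~dst_supported) == 0;
}
```

With MTU (bit 3) cleared from the VM's mask, a VM started on a newer backend would pass this check against an older backend that lacks MTU support, which is the case a plain version string cannot express.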