From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
	by dpdk.org (Postfix) with ESMTP id 76BF07E23
	for ; Thu, 17 Dec 2015 17:44:01 +0100 (CET)
Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24])
	by mx1.redhat.com (Postfix) with ESMTPS id 98E62C0A805D;
	Thu, 17 Dec 2015 16:44:00 +0000 (UTC)
Received: from t450s.home ([10.3.113.8])
	by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tBHGi0Z6022209;
	Thu, 17 Dec 2015 11:44:00 -0500
Message-ID: <1450370639.2674.93.camel@redhat.com>
From: Alex Williamson
To: "Burakov, Anatoly" , "Yigit, Ferruh"
Date: Thu, 17 Dec 2015 09:43:59 -0700
In-Reply-To:
References: <60420822.AbcfvjLZCk@xps13> <566B4A50.9090607@6wind.com>
 <1449874953.20509.6.camel@redhat.com>
 <26FA93C7ED1EAA44AB77D62FBE1D27BA6747CE55@IRSMSX108.ger.corp.intel.com>
 <1450198398.6042.32.camel@redhat.com>
 <20151216040408.GA18363@sivlogin002.ir.intel.com>
 <1450240711.2674.11.camel@redhat.com>
 <1450285912.2674.22.camel@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] VFIO no-iommu
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 17 Dec 2015 16:44:01 -0000

On Wed, 2015-12-16 at 17:22 +0000, Burakov, Anatoly wrote:
> Hi Alex,
>
> > On Wed, 2015-12-16 at 08:35 +0000, Burakov, Anatoly wrote:
> > > Hi Alex,
> > >
> > > > On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote:
> > > > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson
> > > > > wrote:
> > > > > I tested the DPDK (HEAD of master) with the patch, with help
> > > > > of
> > > > > Anatoly, and DPDK works in no-iommu environment with a little
> > > > > modification.
> > > > >
> > > > > Basically the only modification is adapt new group naming
> > > > > (noiommu-$)
> > > > > and
> > > >
> > > > Sorry, forgot to mention that one.  The intention with the
> > > > modified
> > > > group name is that I want to be very certain that a user
> > > > intending
> > > > to only support properly iommu isolated devices doesn't
> > > > accidentally
> > > > need to deal with these no-iommu mode devices.
> > > >
> > > > > disable dma mapping (VFIO_IOMMU_MAP_DMA)
> > > > >
> > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in
> > > > > vfio
> > > > > module,
> > > > > container->noiommu is not set before doing a
> > > > > vfio_group_set_container()
> > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > >
> > > > Running CHECK_EXTENSION on a container without the group
> > > > attached is
> > > > only going to tell you what extensions vfio is capable of, not
> > > > necessarily what extensions are available to you with that
> > > > group.
> > > > Is this just a general dpdk- vfio ordering bug?
> > >
> > > Yes, that is how VFIO was implemented in DPDK. I was under the
> > > impression that checking extension before assigning devices was
> > > the
> > > correct way to do things, so as to not to try anything we know
> > > would
> > > fail anyway. Does this imply that CHECK_EXTENSION needs to be
> > > called
> > > on both container and groups (or just on groups)?
> >
> > Hmm, in Documentation/vfio.txt we do give the following algorithm:
> >
> >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > VFIO_API_VERSION)
> >                 /* Unknown API version */
> >
> >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > VFIO_TYPE1_IOMMU))
> >                 /* Doesn't support the IOMMU driver we want. */
> >         ...
> >
> > That's just going to query each iommu driver and we can't yet say
> > whether
> > the group the user attaches to the container later will actually
> > support that
> > extension until we try to do it, that would come at VFIO_SET_IOMMU.
> >  So is
> > it perhaps a vfio bug that we're not advertising no-iommu until the
> > group is
> > attached?  After all, we are capable of it with just an empty
> > container, just
> > like we are with type1, but we're going to fail SET_IOMMU for the
> > wrong
> > combination.
> >  This is exactly the sort of thing that makes me glad we reverted
> > it without
> > feedback from a working user driver.  Thanks,
>
> Whether it should be considered a "bug" in VFIO or "by design" is up
> to you, of course, but at least according to the VFIO documentation,
> we are meant to check for type 1 extension and then attach devices,
> so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported
> even without any devices attached to the container (just like we get
> type 1 as supported without any devices attached). Having said that,
> if it was meant to attach devices first and then check the
> extensions, then perhaps the documentation should also point out that
> fact (or perhaps I missed that detail in my readings of the docs, in
> which case my apologies).

Hi Anatoly,

Does the below patch make it behave more like you'd expect?  This
applies to v4.4-rc4; I'd fold this into the base patch if we
reincorporate it into a future kernel.  Thanks,

Alex

commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
Author: Alex Williamson
Date:   Wed Dec 16 19:02:01 2015 -0700

    vfio: Fix no-iommu CHECK_EXTENSION
    
    Previously the no-iommu iommu driver was only visible when the
    container had an attached no-iommu group.  This means that
    CHECK_EXTENSION on an empty container couldn't report the possibility
    of using VFIO_NOIOMMU_IOMMU.
    We report TYPE1 whether or not the user
    can make use of it with the group, so this is inconsistent.  Add the
    no-iommu iommu to the list of iommu drivers when enabled via module
    option, but skip all the others if the container is attached to a
    no-iommu group.  Note that tainting is now done with the "unsafe"
    module callback rather than explicitly within vfio.
    
    Also fixes module option and module description name inconsistency.
    
    Also make vfio_noiommu_ops const.
    
    Signed-off-by: Alex Williamson

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index de632da..d3a9432 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -99,9 +99,6 @@ struct vfio_device {
 
 #ifdef CONFIG_VFIO_NOIOMMU
 static bool noiommu __read_mostly;
-module_param_named(enable_unsafe_noiommu_support,
-		   noiommu, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
 #endif
 
 /*
@@ -138,17 +135,6 @@ struct iommu_group *vfio_iommu_group_get(struct device *dev)
 	iommu_group_put(group);
 	if (ret)
 		return NULL;
-
-	/*
-	 * Where to taint?  At this point we've added an IOMMU group for a
-	 * device that is not backed by iommu_ops, therefore any iommu_
-	 * callback using iommu_ops can legitimately Oops.  So, while we may
-	 * be about to give a DMA capable device to a user without IOMMU
-	 * protection, which is clearly taint-worthy, let's go ahead and do
-	 * it here.
-	 */
-	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
-	dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
 #endif
 
 	return group;
@@ -207,7 +193,7 @@ static void vfio_noiommu_detach_group(void *iommu_data,
 {
 }
 
-static struct vfio_iommu_driver_ops vfio_noiommu_ops = {
+static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 	.name = "vfio-noiommu",
 	.owner = THIS_MODULE,
 	.open = vfio_noiommu_open,
@@ -217,24 +203,34 @@ static struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 	.detach_group = vfio_noiommu_detach_group,
 };
 
-static struct vfio_iommu_driver vfio_noiommu_driver = {
-	.ops = &vfio_noiommu_ops,
+static int noiommu_param_set(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+
+	if (!val)
+		val = "1";
+
+	ret = strtobool(val, kp->arg);
+	if (ret)
+		return ret;
+
+	if (noiommu)
+		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
+	else
+		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
+
+	return ret;
+}
+
+static const struct kernel_param_ops noiommu_param_ops = {
+	.flags = KERNEL_PARAM_OPS_FL_NOARG,
+	.set = noiommu_param_set,
+	.get = param_get_bool,
 };
 
-/*
- * Wrap IOMMU drivers, the noiommu driver is the one and only driver for
- * noiommu groups (and thus containers) and not available for normal groups.
- */
-#define vfio_for_each_iommu_driver(con, pos) \
-	for (pos = con->noiommu ? &vfio_noiommu_driver : \
-	     list_first_entry(&vfio.iommu_drivers_list, \
-			      struct vfio_iommu_driver, vfio_next); \
-	     (con->noiommu ? pos != NULL : \
-		&pos->vfio_next != &vfio.iommu_drivers_list); \
-	      pos = con->noiommu ? NULL : list_next_entry(pos, vfio_next))
-#else
-#define vfio_for_each_iommu_driver(con, pos) \
-	list_for_each_entry(pos, &vfio.iommu_drivers_list, vfio_next)
+module_param_cb_unsafe(enable_unsafe_noiommu_mode, &noiommu_param_ops,
+		       &noiommu, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.
This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
 #endif
 
 
@@ -999,7 +995,12 @@ static long vfio_ioctl_check_extension(struct vfio_container *container,
 		 */
 		if (!driver) {
 			mutex_lock(&vfio.iommu_drivers_lock);
-			vfio_for_each_iommu_driver(container, driver) {
+			list_for_each_entry(driver, &vfio.iommu_drivers_list,
+					    vfio_next) {
+				if (container->noiommu &&
+				    driver->ops != &vfio_noiommu_ops)
+					continue;
+
 				if (!try_module_get(driver->ops->owner))
 					continue;
 
@@ -1068,9 +1069,12 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
 	}
 
 	mutex_lock(&vfio.iommu_drivers_lock);
-	vfio_for_each_iommu_driver(container, driver) {
+	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
 		void *data;
 
+		if (container->noiommu && driver->ops != &vfio_noiommu_ops)
+			continue;
+
 		if (!try_module_get(driver->ops->owner))
 			continue;
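
For anyone following the selection rule in the two patched loops, it can
be modelled in plain userspace C.  This is only a rough sketch and not
kernel code: model_driver, model_drivers[], EXT_TYPE1, EXT_NOIOMMU and
model_check_extension() are made-up names, and the table assumes the
module option is enabled with a type1 driver also registered.  The real
loops additionally take a module reference on each driver, which is left
out here.

```c
/* Userspace model of the patched driver walk; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension ids, not the real VFIO_* constants. */
enum model_ext { EXT_TYPE1, EXT_NOIOMMU };

struct model_driver {
	const char *name;
	bool is_noiommu;          /* stands in for ops == &vfio_noiommu_ops */
	enum model_ext supported; /* the one extension this driver reports */
};

/* Assumed registration state: type1 loaded, no-iommu option enabled. */
const struct model_driver model_drivers[] = {
	{ "model-type1",   false, EXT_TYPE1 },
	{ "model-noiommu", true,  EXT_NOIOMMU },
};

#define MODEL_NDRIVERS (sizeof(model_drivers) / sizeof(model_drivers[0]))

/*
 * Return true if any eligible driver reports the wanted extension.
 * container_noiommu is true once a no-iommu group has been attached.
 */
bool model_check_extension(bool container_noiommu, enum model_ext wanted)
{
	for (size_t i = 0; i < MODEL_NDRIVERS; i++) {
		const struct model_driver *drv = &model_drivers[i];

		assert(drv->name != NULL);

		/* Patched rule: a no-iommu container skips all other drivers. */
		if (container_noiommu && !drv->is_noiommu)
			continue;

		if (drv->supported == wanted)
			return true;
	}
	return false;
}
```

In this model an empty container (container_noiommu == false) reports
both extensions, while a container with a no-iommu group attached
reports only EXT_NOIOMMU, which is the behaviour the commit message
describes.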