From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id B697A8D3A for ; Fri, 18 Dec 2015 11:43:14 +0100 (CET) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP; 18 Dec 2015 02:43:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.20,445,1444719600"; d="scan'208";a="863851831" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga001.fm.intel.com with ESMTP; 18 Dec 2015 02:43:13 -0800 Received: from sivlogin002.ir.intel.com (sivlogin002.ir.intel.com [10.237.217.37]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id tBIAhBQ6008324; Fri, 18 Dec 2015 10:43:11 GMT Received: from sivlogin002.ir.intel.com (localhost [127.0.0.1]) by sivlogin002.ir.intel.com with ESMTP id tBIAhBd6013235; Fri, 18 Dec 2015 10:43:11 GMT Received: (from fyigit@localhost) by sivlogin002.ir.intel.com with œ id tBIAhB4w013231; Fri, 18 Dec 2015 10:43:11 GMT X-Authentication-Warning: sivlogin002.ir.intel.com: fyigit set sender to ferruh.yigit@intel.com using -f Date: Fri, 18 Dec 2015 10:43:10 +0000 From: "Yigit, Ferruh" To: Alex Williamson Message-ID: <20151218104310.GA11371@sivlogin002.ir.intel.com> Mail-Followup-To: Alex Williamson , "Burakov, Anatoly" , "dev@dpdk.org" References: <566B4A50.9090607@6wind.com> <1449874953.20509.6.camel@redhat.com> <26FA93C7ED1EAA44AB77D62FBE1D27BA6747CE55@IRSMSX108.ger.corp.intel.com> <1450198398.6042.32.camel@redhat.com> <20151216040408.GA18363@sivlogin002.ir.intel.com> <1450240711.2674.11.camel@redhat.com> <1450285912.2674.22.camel@redhat.com> <1450370639.2674.93.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1450370639.2674.93.camel@redhat.com> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] VFIO no-iommu X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Dec 2015 10:43:15 -0000 On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote: <...> > > > > > > > > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in > > > > > > vfio > > > > > > module, > > > > > > container->noiommu is not set before doing a > > > > > > vfio_group_set_container() > > > > > > and vfio_for_each_iommu_driver selects wrong driver. > > > > > > > > > > Running CHECK_EXTENSION on a container without the group > > > > > attached is > > > > > only going to tell you what extensions vfio is capable of, not > > > > > necessarily what extensions are available to you with that > > > > > group. > > > > > Is this just a general dpdk- vfio ordering bug? > > > > > > > > Yes, that is how VFIO was implemented in DPDK. I was under the > > > > impression that checking extension before assigning devices was > > > > the > > > > correct way to do things, so as to not to try anything we know > > > > would > > > > fail anyway. Does this imply that CHECK_EXTENSION needs to be > > > > called > > > > on both container and groups (or just on groups)? > > > > > > Hmm, in Documentation/vfio.txt we do give the following algorithm: > > > > > >         if (ioctl(container, VFIO_GET_API_VERSION) != > > > VFIO_API_VERSION) > > >                 /* Unknown API version */ > > > > > >         if (!ioctl(container, VFIO_CHECK_EXTENSION, > > > VFIO_TYPE1_IOMMU)) > > >                 /* Doesn't support the IOMMU driver we want. */ > > >         ... > > > > > > That's just going to query each iommu driver and we can't yet say > > > whether > > > the group the user attaches to the container later will actually > > > support that > > > extension until we try to do it, that would come at VFIO_SET_IOMMU. > > >  So is > > > it perhaps a vfio bug that we're not advertising no-iommu until the > > > group is > > > attached?  After all, we are capable of it with just an empty > > > container, just > > > like we are with type1, but we're going to fail SET_IOMMU for the > > > wrong > > > combination. > > >  This is exactly the sort of thing that makes me glad we reverted > > > it without > > > feedback from a working user driver.  Thanks, > > > > Whether it should be considered a "bug" in VFIO or "by design" is up > > to you, of course, but at least according to the VFIO documentation, > > we are meant to check for type 1 extension and then attach devices, > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported > > even without any devices attached to the container (just like we get > > type 1 as supported without any devices attached). Having said that, > > if it was meant to attach devices first and then check the > > extensions, then perhaps the documentation should also point out that > > fact (or perhaps I missed that detail in my readings of the docs, in > > which case my apologies). > > Hi Anatoly, > > Does the below patch make it behave more like you'd expect.  This > applies to v4.4-rc4, I'd fold this into the base patch if we > reincorporate it to a future kernel.  Thanks, > > Alex > > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94 > Author: Alex Williamson > Date:   Wed Dec 16 19:02:01 2015 -0700 > >     vfio: Fix no-iommu CHECK_EXTENSION >      >     Previously the no-iommu iommu driver was only visible when the >     container had an attached no-iommu group.  This means that >     CHECK_EXTENSION on and empty container couldn't report the possibility >     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not the user >     can make use of it with the group, so this is inconsistent.  Add the >     no-iommu iommu to the list of iommu drivers when enabled via module >     option, but skip all the others if the container is attached to a >     no-iommu groups.  Note that tainting is now done with the "unsafe" >     module callback rather than explictly within vfio. >      >     Also fixes module option and module description name inconsistency. >      >     Also make vfio_noiommu_ops const. >      >     Signed-off-by: Alex Williamson Hi Alex, I got following crash with this update: [ +0.046973] BUG: unable to handle kernel NULL pointer dereference at (null) [ +0.000031] IP: [] __list_add+V0x1f/0xc0 [ +0.000024] PGD 0 [ +0.000010] Oops: 0000 [#1] SMP ... [ +0.000028] Call Trace: [ +0.000014] [] __mutex_lock_slowpath+0x91/0x110 [ +0.000022] [] mutex_lock+0x23/0x40 [ +0.000020] [] vfio_register_iommu_driver+0x40/0xc0 [vfio] [ +0.000025] [] noiommu_param_set+0x51/0x60 [vfio] [ +0.000022] [] parse_args+0x1be/0x4a0 [ +0.000021] [] load_module+0xe30/0x2730 [ +0.000942] [] ? __symbol_put+0x60/0x60 [ +0.000921] [] ? kernel_read+0x50/0x80 [ +0.000917] [] SyS_finit_module+0xb9/0xf0 [ +0.000925] [] entry_SYSCALL_64_fastpath+0x12/0x71 > -static struct vfio_iommu_driver vfio_noiommu_driver = { > - .ops = &vfio_noiommu_ops, > +static int noiommu_param_set(const char *val, const struct kernel_param *kp) > +{ > + int ret; > + > + if (!val) > + val = "1"; > + > + ret = strtobool(val, kp->arg); > + if (ret) > + return ret; > + > + if (noiommu) > + ret = vfio_register_iommu_driver(&vfio_noiommu_ops); Is it possible that kernel_param_ops->set() called before module_init() that initializes the mutex. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/module.c?id=refs/tags/v4.4-rc5#n3517 > + else > + vfio_unregister_iommu_driver(&vfio_noiommu_ops); > + > + return ret; > +} > + > +static const struct kernel_param_ops noiommu_param_ops = { > + .flags = KERNEL_PARAM_OPS_FL_NOARG, > + .set = noiommu_param_set, > + .get = param_get_bool, >  }; >   Thanks, ferruh