From mboxrd@z Thu Jan 1 00:00:00 1970
To: Alejandro Lucero
References: <1484742475-41005-1-git-send-email-alejandro.lucero@netronome.com> <67a9fd3b-b7f5-c641-9f59-590155cbd30b@intel.com>
Cc: dev
From: Ferruh Yigit
Date: Fri, 10 Feb 2017 19:03:17 +0000
Subject: Re: [dpdk-dev] [PATCH] igb_uio: map dummy dma forcing iommu domain attachment

On 2/8/2017 11:54 AM, Alejandro Lucero wrote:
> Hi Ferruh,
>
> On Tue, Feb 7, 2017 at 3:59 PM, Ferruh Yigit wrote:
>
>     Hi Alejandro,
>
>     On 1/18/2017 12:27 PM, Alejandro Lucero wrote:
>     > For using a DPDK app when iommu is enabled, it requires to
>     > add iommu=pt to the kernel command line. But using igb_uio driver
>     > makes DMAR errors because the device has not an IOMMU domain.
>
>     Please help to understand the scope of the problem,
>
>
> After reading your reply, I realize I could have explained it better.
> First of all, this is related to SRIOV, exactly when the VFs are created.
>
>
>     1- How can you re-produce the problem?
>
>
> Using a VF from a Intel card by a DPDK app in the host and a kernel >=
> 3.15. Although usually VFs are assigned to VMs, it could also be an
> option to use VFs by the host.
>
> BTW, I did not try to reproduce the problem with an Intel card. I
> triggered this problem with an NFP, but because the problem behind, I
> bet that is going to happen for an Intel one as well.

I was able to reproduce the problem with ixgbe, by using a VF on the host.

And I verified that your patch fixes it: it causes the device to be
attached to a vfio group.

So I believe it is good to get this patch, but it is already too late for
the 17.02 release. I suggest getting this one in early in 17.05, which
gives more time to test.

>
>
>
>     2- What happens get DMAR errors, is it prevents device work or some
>     annoying error messages?
>
>
> A DMAR error implies the device can not access to the DMA address given
> by the host. I have experienced several situations where it is just that
> device not being able to work at all, but it also has more global
> implications and you need to reboot the system because it is unreliable.
> I think it depends on how these DMAR errors are handled, but in any
> case, this is a bad thing.

In my test, the implication was that the device was not working.

>
>
>
>     3- Can you please share the error messages?
>
>
> With this problem you can expect something like this:
>
> [ 559.163874] DMAR: DRHD: handling fault status reg 2
> [ 559.165427] DMAR: DMAR:[DMA Read] Request device [82:08.0] fault addr
> e7b73b000
> [ 559.165427] DMAR:[fault reason 02] Present bit in context entry is clear
> [ 568.367417] DMAR: DRHD: handling fault status reg 102
> [ 568.369025] DMAR: DMAR:[DMA Read] Request device [82:08.1] fault addr
> ebb73b000
> [ 568.369025] DMAR:[fault reason 02] Present bit in context entry is clear
> [ 571.773944] DMAR: DRHD: handling fault status reg 202
> [ 571.775550] DMAR: DMAR:[DMA Read] Request device [82:08.2] fault addr
> efb73b000
> [ 571.775550] DMAR:[fault reason 02] Present bit in context entry is clear
> [ 575.039654] DMAR: DRHD: handling fault status reg 302
> [ 575.041259] DMAR: DMAR:[DMA Read] Request device [82:08.3] fault addr
> f3b73b000
> [ 575.041259] DMAR:[fault reason 02] Present bit in context entry is clear
>
> There are different DMAR errors, sometimes referring to a specific
> address being wrong. In this case it is related to the device not having
> a context or a IOMMU domain.
>
> Also note we got these errors for different devices/VFs. This was with a
> DPDK app using several VFs.
>
>
>
>
>     >
>     > Since kernel 3.15, iommu=pt requires to use the internal kernel
>     > DMA API for attaching the device to the IOMMU 1:1 mapping, aka
>     > si_domain. Previous versions did attach the device to that
>     > domain when intel iommu notifier was called.
>
>     Again, what is not working since 3.15?
>
>
> This specific case, yes. With older kernels, when VFs are created, IOMMU
> code is executed (notifier chain callback) and if iommu=pt, the VF is
> attached to the si_domain, this is the 1:1 mapping. But this has changed
> with newer kernels, and after VFs are created they have no IOMMU domain
> at all. The kernel expects the driver to implicitly create such a domain
> when the kernel DMA API is used.

Thanks again for the clarification.
What will be the effect of your patch on kernels < 3.15? Should your
update be protected with a kernel version check, or is it safe for all?

>
>
>
>     >
>     > This is not a problem if the driver does later some call to the
>     > DMA API because the mapping can be done then. But DPDK apps do
>     > not use that DMA API at all.
>
>     Is this same/similar with:
>     http://dpdk.org/dev/patchwork/patch/12654/
>
>
>
> That case was another issue regarding IOMMU and iommu=pt. The problem
> there was when you detach a VF from a VM, but the VF was initially
> attached to the si_domain because the kernel did so. The patch helped to
> attach the VF again to that domain when binding to the UIO.
>
> Looking at that patch now (I did comment on it then), it just solved the
> problem if the VF was detach form the UIO, something that could be
> easily forgotten or simply not done because, apparently, it is not needed.

I was also able to reproduce this case. When the driver is switched from
igb_uio -> vfio_pci -> igb_uio, the device stops working, giving similar
DMAR errors.

Your patch also fixes this, at least in my test. When the device is
unbound from vfio_pci, the iommu group is removed, but binding igb_uio
adds it back.

>
>     What about to use VFIO?
>
> With that previous patch, it was not enough. I do not remember the
> details now, and I'm not sure if VFIO created another IOMMU domain if
> the device had one, but it could leave the device without an IOMMU
> domain after the first use.
>
> In this particular case, VFIO would work, because the device gets its
> own IOMMU domain. But there are two main problems if this is not fixed
> when using UIO:
>
> 1) UIO is one of the two options for working with IOMMU. We all agree
> VFIO is the right one for IOMMU, but as long as UIO is still an option,
> that should be fixed.
>
> 2) Some installations need to work with and without IOMMU. Having same
> module for both cases makes things simpler and therefore they use UIO
> instead of VFIO.
>
>
>
>     >
>     > Doing this dma map and unmap is harmless even when iommu is not
>     > enabled at all.
>     >
>     > Signed-off-by: Alejandro Lucero

Tested-by: Ferruh Yigit

>     <...>
>
>     Thanks,
>     ferruh