DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] VFIO no-iommu
@ 2015-12-11 16:28 Thomas Monjalon
  2015-12-11 22:12 ` Vincent JARDIN
  2015-12-11 23:20 ` Jan Viktorin
  0 siblings, 2 replies; 35+ messages in thread
From: Thomas Monjalon @ 2015-12-11 16:28 UTC (permalink / raw)
  To: dev; +Cc: Alex Williamson

Recently there were some discussions to have an upstream replacement
for our igb_uio module.
Several solutions were discussed (new uio driver, uio_pci_generic, vfio):
	https://lkml.org/lkml/2015/10/16/700

Alex Williamson (maintainer of VFIO driver), submitted a solution
and was waiting some feedback. Unfortunately, nobody caught it and
he has reverted his work:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ae5515d

It is an important challenge to remove our out-of-tree modules and
especially igb_uio. It is a long way to have a standard solution integrated
in every distributions.
The current cooking Linux kernel is 4.4 and will have a long term maintenance:
	https://kernel.org/releases.html
So it is a pity to miss this opportunity.

Stephen has fixed a bug to use the IOMMU group zero:
	http://dpdk.org/browse/dpdk/commit/?id=22215f141b1

Is there someone interested to work on VFIO no-iommu and provide
some feedbacks?
We also need to prepare a documentation patch to explain its usage
compared to the standard VFIO mode.

Thanks

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-11 16:28 [dpdk-dev] VFIO no-iommu Thomas Monjalon
@ 2015-12-11 22:12 ` Vincent JARDIN
  2015-12-11 23:02   ` Alex Williamson
  2015-12-11 23:20 ` Jan Viktorin
  1 sibling, 1 reply; 35+ messages in thread
From: Vincent JARDIN @ 2015-12-11 22:12 UTC (permalink / raw)
  To: dev, Alex Williamson

Thanks Thomas for putting back this topic.

Alex,

I'd like to hear more about the impacts of "unsupported":
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=033291eccbdb1b70ffc02641edae19ac825dc75d
   Use of this mode, specifically binding a device without a native
   IOMMU group to a VFIO bus driver will taint the kernel and should
   therefore not be considered supported.

It means that we get ride of uio; so it is a nice code cleanup: but why 
would VFIO/NO IOMMU be better if the bottomline is "unsupported"?

Thank you,
   Vincent

On 11/12/2015 17:28, Thomas Monjalon wrote:
> Recently there were some discussions to have an upstream replacement
> for our igb_uio module.
> Several solutions were discussed (new uio driver, uio_pci_generic, vfio):
> 	https://lkml.org/lkml/2015/10/16/700
>
> Alex Williamson (maintainer of VFIO driver), submitted a solution
> and was waiting some feedback. Unfortunately, nobody caught it and
> he has reverted his work:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ae5515d
>
> It is an important challenge to remove our out-of-tree modules and
> especially igb_uio. It is a long way to have a standard solution integrated
> in every distributions.
> The current cooking Linux kernel is 4.4 and will have a long term maintenance:
> 	https://kernel.org/releases.html
> So it is a pity to miss this opportunity.
>
> Stephen has fixed a bug to use the IOMMU group zero:
> 	http://dpdk.org/browse/dpdk/commit/?id=22215f141b1
>
> Is there someone interested to work on VFIO no-iommu and provide
> some feedbacks?
> We also need to prepare a documentation patch to explain its usage
> compared to the standard VFIO mode.
>
> Thanks
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-11 22:12 ` Vincent JARDIN
@ 2015-12-11 23:02   ` Alex Williamson
  2015-12-15 13:43     ` O'Driscoll, Tim
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-11 23:02 UTC (permalink / raw)
  To: Vincent JARDIN, dev

On Fri, 2015-12-11 at 23:12 +0100, Vincent JARDIN wrote:
> Thanks Thomas for putting back this topic.
> 
> Alex,
> 
> I'd like to hear more about the impacts of "unsupported":
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commi
> t/?id=033291eccbdb1b70ffc02641edae19ac825dc75d
>    Use of this mode, specifically binding a device without a native
>    IOMMU group to a VFIO bus driver will taint the kernel and should
>    therefore not be considered supported.
> 
> It means that we get ride of uio; so it is a nice code cleanup: but
> why 
> would VFIO/NO IOMMU be better if the bottomline is "unsupported"?

How supportable do you think the uio method is?  Fundamentally we have
a userspace driver doing unrestricted DMA; it can access and modify any
memory in the system.  This is the reason uio won't provide a mechanism
to enable MSI and if you ask the uio maintainer, they don't support DMA
at all, it's only intended as a programmed IO interface to the device.
 Unless we can sandbox a user owned device within an IOMMU protected
container, it's not supportable.  The VFIO no-iommu mode can simply
provide you that unsupported mode more easily since it leverages code
from the supported mode, which is IOMMU protected.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-11 16:28 [dpdk-dev] VFIO no-iommu Thomas Monjalon
  2015-12-11 22:12 ` Vincent JARDIN
@ 2015-12-11 23:20 ` Jan Viktorin
  2015-12-15 11:20   ` Alejandro Lucero
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Viktorin @ 2015-12-11 23:20 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Alex Williamson

Hello,

I am not involved in the vfio very much, however, I was watching some
vfio-related code in last few weeks. It looks promising to me and
IMHO it seems to the best way to bring a support of integrated Ethernet
MACs into DPDK (related to many SoCs). Unfortunately, the ARMv7 SoCs (I
know) lacks of an IOMMU... The only protection there is the TrustZone
technology but I have no idea of its support in the kernel. It's also
far from being a replacement of an IOMMU. When using FPGAs, it is
possible to put an IOMMU engine there (I've got such a prototype
somewhere in my VHDL library) but nobody will probably do use because
of saving on-chip resources.

The X-Gene SoC (ARM 64) contains 2x 10 Gbps EMACs on the chip. I have no
idea about IOMMUs there. Thus, this platform can probably benefit of
such driver as well. The question is whether there is some interest to
have this kind of support in DPDK.

Thus, I'd like to have the vfio/no-iommu to support the ARMv7 (otherwise
it would be effectively dead in DPDK). Unfortunately, it's not my
primary job at the moment.

Regards
Jan

Note: as far as I know, it is discouraged to refer to lkml.org as
it is often very slow - my case today :).

On Fri, 11 Dec 2015 17:28:43 +0100
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> Recently there were some discussions to have an upstream replacement
> for our igb_uio module.
> Several solutions were discussed (new uio driver, uio_pci_generic, vfio):
> 	https://lkml.org/lkml/2015/10/16/700
> 
> Alex Williamson (maintainer of VFIO driver), submitted a solution
> and was waiting some feedback. Unfortunately, nobody caught it and
> he has reverted his work:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ae5515d
> 
> It is an important challenge to remove our out-of-tree modules and
> especially igb_uio. It is a long way to have a standard solution integrated
> in every distributions.
> The current cooking Linux kernel is 4.4 and will have a long term maintenance:
> 	https://kernel.org/releases.html
> So it is a pity to miss this opportunity.
> 
> Stephen has fixed a bug to use the IOMMU group zero:
> 	http://dpdk.org/browse/dpdk/commit/?id=22215f141b1
> 
> Is there someone interested to work on VFIO no-iommu and provide
> some feedbacks?
> We also need to prepare a documentation patch to explain its usage
> compared to the standard VFIO mode.
> 
> Thanks

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-11 23:20 ` Jan Viktorin
@ 2015-12-15 11:20   ` Alejandro Lucero
  0 siblings, 0 replies; 35+ messages in thread
From: Alejandro Lucero @ 2015-12-15 11:20 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, Alex Williamson

Hi,

I know a bit about VFIO implementation, have been debugging IOMMU (intel)
problems,  know how QEMU/KVM work about using legacy or vfio attached
devices, and I'm the maintainer of a DPDK PMD recently accepted upstream
which requires our particular UIO driver (not maintained upstream). So I
guess I could help with this effort and testing the code with our card. Of
course, I can not be full time on this but I will be happy to contribute.


On Fri, Dec 11, 2015 at 11:20 PM, Jan Viktorin <viktorin@rehivetech.com>
wrote:

> Hello,
>
> I am not involved in the vfio very much, however, I was watching some
> vfio-related code in last few weeks. It looks promising to me and
> IMHO it seems to the best way to bring a support of integrated Ethernet
> MACs into DPDK (related to many SoCs). Unfortunately, the ARMv7 SoCs (I
> know) lacks of an IOMMU... The only protection there is the TrustZone
> technology but I have no idea of its support in the kernel. It's also
> far from being a replacement of an IOMMU. When using FPGAs, it is
> possible to put an IOMMU engine there (I've got such a prototype
> somewhere in my VHDL library) but nobody will probably do use because
> of saving on-chip resources.
>
> The X-Gene SoC (ARM 64) contains 2x 10 Gbps EMACs on the chip. I have no
> idea about IOMMUs there. Thus, this platform can probably benefit of
> such driver as well. The question is whether there is some interest to
> have this kind of support in DPDK.
>
> Thus, I'd like to have the vfio/no-iommu to support the ARMv7 (otherwise
> it would be effectively dead in DPDK). Unfortunately, it's not my
> primary job at the moment.
>
> Regards
> Jan
>
> Note: as far as I know, it is discouraged to refer to lkml.org as
> it is often very slow - my case today :).
>
> On Fri, 11 Dec 2015 17:28:43 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
> > Recently there were some discussions to have an upstream replacement
> > for our igb_uio module.
> > Several solutions were discussed (new uio driver, uio_pci_generic, vfio):
> >       https://lkml.org/lkml/2015/10/16/700
> >
> > Alex Williamson (maintainer of VFIO driver), submitted a solution
> > and was waiting some feedback. Unfortunately, nobody caught it and
> > he has reverted his work:
> >
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ae5515d
> >
> > It is an important challenge to remove our out-of-tree modules and
> > especially igb_uio. It is a long way to have a standard solution
> integrated
> > in every distributions.
> > The current cooking Linux kernel is 4.4 and will have a long term
> maintenance:
> >       https://kernel.org/releases.html
> > So it is a pity to miss this opportunity.
> >
> > Stephen has fixed a bug to use the IOMMU group zero:
> >       http://dpdk.org/browse/dpdk/commit/?id=22215f141b1
> >
> > Is there someone interested to work on VFIO no-iommu and provide
> > some feedbacks?
> > We also need to prepare a documentation patch to explain its usage
> > compared to the standard VFIO mode.
> >
> > Thanks
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-11 23:02   ` Alex Williamson
@ 2015-12-15 13:43     ` O'Driscoll, Tim
  2015-12-15 16:53       ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: O'Driscoll, Tim @ 2015-12-15 13:43 UTC (permalink / raw)
  To: Alex Williamson, Vincent JARDIN, dev


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alex Williamson
> Sent: Friday, December 11, 2015 11:03 PM
> To: Vincent JARDIN; dev@dpdk.org
> Subject: Re: [dpdk-dev] VFIO no-iommu
> 
> On Fri, 2015-12-11 at 23:12 +0100, Vincent JARDIN wrote:
> > Thanks Thomas for putting back this topic.
> >
> > Alex,
> >
> > I'd like to hear more about the impacts of "unsupported":
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commi
> > t/?id=033291eccbdb1b70ffc02641edae19ac825dc75d
> >    Use of this mode, specifically binding a device without a native
> >    IOMMU group to a VFIO bus driver will taint the kernel and should
> >    therefore not be considered supported.
> >
> > It means that we get ride of uio; so it is a nice code cleanup: but
> > why
> > would VFIO/NO IOMMU be better if the bottomline is "unsupported"?
> 
> How supportable do you think the uio method is?  Fundamentally we have
> a userspace driver doing unrestricted DMA; it can access and modify any
> memory in the system.  This is the reason uio won't provide a mechanism
> to enable MSI and if you ask the uio maintainer, they don't support DMA
> at all, it's only intended as a programmed IO interface to the device.
>  Unless we can sandbox a user owned device within an IOMMU protected
> container, it's not supportable.  The VFIO no-iommu mode can simply
> provide you that unsupported mode more easily since it leverages code
> from the supported mode, which is IOMMU protected.  Thanks,

Thanks for clarifying.

This does seem like it would be useful for DPDK. We're doing some further investigation to see if it works out of the box with DPDK or if we need to make any changes to support it.

Thomas highlighted that your original commit for this had been reverted. What specifically would you need from us in order to re-submit the VFIO No-IOMMU support?


Tim

> 
> Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-15 13:43     ` O'Driscoll, Tim
@ 2015-12-15 16:53       ` Alex Williamson
  2015-12-16  4:04         ` Ferruh Yigit
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-15 16:53 UTC (permalink / raw)
  To: O'Driscoll, Tim, Vincent JARDIN, dev

On Tue, 2015-12-15 at 13:43 +0000, O'Driscoll, Tim wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alex
> > Williamson
> > Sent: Friday, December 11, 2015 11:03 PM
> > To: Vincent JARDIN; dev@dpdk.org
> > Subject: Re: [dpdk-dev] VFIO no-iommu
> > 
> > On Fri, 2015-12-11 at 23:12 +0100, Vincent JARDIN wrote:
> > > Thanks Thomas for putting back this topic.
> > > 
> > > Alex,
> > > 
> > > I'd like to hear more about the impacts of "unsupported":
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/c
> > > ommi
> > > t/?id=033291eccbdb1b70ffc02641edae19ac825dc75d
> > >    Use of this mode, specifically binding a device without a
> > > native
> > >    IOMMU group to a VFIO bus driver will taint the kernel and
> > > should
> > >    therefore not be considered supported.
> > > 
> > > It means that we get ride of uio; so it is a nice code cleanup:
> > > but
> > > why
> > > would VFIO/NO IOMMU be better if the bottomline is "unsupported"?
> > 
> > How supportable do you think the uio method is?  Fundamentally we
> > have
> > a userspace driver doing unrestricted DMA; it can access and modify
> > any
> > memory in the system.  This is the reason uio won't provide a
> > mechanism
> > to enable MSI and if you ask the uio maintainer, they don't support
> > DMA
> > at all, it's only intended as a programmed IO interface to the
> > device.
> >  Unless we can sandbox a user owned device within an IOMMU
> > protected
> > container, it's not supportable.  The VFIO no-iommu mode can simply
> > provide you that unsupported mode more easily since it leverages
> > code
> > from the supported mode, which is IOMMU protected.  Thanks,
> 
> Thanks for clarifying.
> 
> This does seem like it would be useful for DPDK. We're doing some
> further investigation to see if it works out of the box with DPDK or
> if we need to make any changes to support it.

The iommu model is different, there's no type1 interface available when
using this mode since we have no ability to provide translation.  The
no-iommu iommu model really does nothing, which is a possible issue for
userspace.  Is it sufficient?  We stopped short of creating a page
pinning interface through the no-iommu model because it requires code
and adding piles of new code for an interface we claim is unsupported
doesn't make a lot of sense.  The device interface should be identical
to existing vfio support.

> Thomas highlighted that your original commit for this had been
> reverted. What specifically would you need from us in order to re-
> submit the VFIO No-IOMMU support?

No API changes should ever go into the kernel without being validated
by a user.  Without that we're risking that the kernel interface is
broken and we're stuck supporting it.  In this case I tried to make
sure we had a working user before it went it, gambled that it was close
enough to put in anyway, then paid the price when development went
silent on the user side.  To get it back in, I'm going to need a
working use first.  You can re-apply 033291eccbdb or re-
revert ae5515d66362 for development of that.  I need to see that it
works and that there's some consensus from the dpdk community that it's
a worthwhile path forward for cases without an iommu.  There's no point
in merging it if it only becomes a userspace proof of concept.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-15 16:53       ` Alex Williamson
@ 2015-12-16  4:04         ` Ferruh Yigit
  2015-12-16  4:38           ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Ferruh Yigit @ 2015-12-16  4:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dev

On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> On Tue, 2015-12-15 at 13:43 +0000, O'Driscoll, Tim wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Friday, December 11, 2015 11:03 PM
> > > To: Vincent JARDIN; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] VFIO no-iommu
> > > 
> > > On Fri, 2015-12-11 at 23:12 +0100, Vincent JARDIN wrote:
> > > > Thanks Thomas for putting back this topic.
> > > > 
> > > > Alex,
> > > > 
> > > > I'd like to hear more about the impacts of "unsupported":
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/c
> > > > ommi
> > > > t/?id=033291eccbdb1b70ffc02641edae19ac825dc75d
> > > >    Use of this mode, specifically binding a device without a
> > > > native
> > > >    IOMMU group to a VFIO bus driver will taint the kernel and
> > > > should
> > > >    therefore not be considered supported.
> > > > 
> > > > It means that we get ride of uio; so it is a nice code cleanup:
> > > > but
> > > > why
> > > > would VFIO/NO IOMMU be better if the bottomline is "unsupported"?
> > > 
> > > How supportable do you think the uio method is?  Fundamentally we
> > > have
> > > a userspace driver doing unrestricted DMA; it can access and modify
> > > any
> > > memory in the system.  This is the reason uio won't provide a
> > > mechanism
> > > to enable MSI and if you ask the uio maintainer, they don't support
> > > DMA
> > > at all, it's only intended as a programmed IO interface to the
> > > device.
> > >  Unless we can sandbox a user owned device within an IOMMU
> > > protected
> > > container, it's not supportable.  The VFIO no-iommu mode can simply
> > > provide you that unsupported mode more easily since it leverages
> > > code
> > > from the supported mode, which is IOMMU protected.  Thanks,
> > 
> > Thanks for clarifying.
> > 
> > This does seem like it would be useful for DPDK. We're doing some
> > further investigation to see if it works out of the box with DPDK or
> > if we need to make any changes to support it.
> 
> The iommu model is different, there's no type1 interface available when
> using this mode since we have no ability to provide translation.  The
> no-iommu iommu model really does nothing, which is a possible issue for
> userspace.  Is it sufficient?  We stopped short of creating a page
> pinning interface through the no-iommu model because it requires code
> and adding piles of new code for an interface we claim is unsupported
> doesn't make a lot of sense.  The device interface should be identical
> to existing vfio support.
> 
> > Thomas highlighted that your original commit for this had been
> > reverted. What specifically would you need from us in order to re-
> > submit the VFIO No-IOMMU support?
> 
> No API changes should ever go into the kernel without being validated
> by a user.  Without that we're risking that the kernel interface is
> broken and we're stuck supporting it.  In this case I tried to make
> sure we had a working user before it went it, gambled that it was close
> enough to put in anyway, then paid the price when development went
> silent on the user side.  To get it back in, I'm going to need a
> working use first.  You can re-apply 033291eccbdb or re-
> revert ae5515d66362 for development of that.  I need to see that it
> works and that there's some consensus from the dpdk community that it's
> a worthwhile path forward for cases without an iommu.  There's no point
> in merging it if it only becomes a userspace proof of concept.  Thanks,
> 
I tested the DPDK (HEAD of master) with the patch, with help of Anatoly,
and DPDK works in no-iommu environment with a little modification.

Basically the only modification is adapt new group naming (noiommu-$) and
disable dma mapping (VFIO_IOMMU_MAP_DMA)

Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in vfio module,
container->noiommu is not set before doing a vfio_group_set_container()
and vfio_for_each_iommu_driver selects wrong driver.

What I test is bind two different type of NICs into VFIO driver, and use
testpmd to confirm transfer is working.  Kernel booted without iommu enabled,
vfio module inserted with "enable_unsafe_noiommu_support" parameter.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16  4:04         ` Ferruh Yigit
@ 2015-12-16  4:38           ` Alex Williamson
  2015-12-16  8:35             ` Burakov, Anatoly
  2016-01-14  6:03             ` Jike Song
  0 siblings, 2 replies; 35+ messages in thread
From: Alex Williamson @ 2015-12-16  4:38 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote:
> On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > On Tue, 2015-12-15 at 13:43 +0000, O'Driscoll, Tim wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alex
> > > > Williamson
> > > > Sent: Friday, December 11, 2015 11:03 PM
> > > > To: Vincent JARDIN; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] VFIO no-iommu
> > > > 
> > > > On Fri, 2015-12-11 at 23:12 +0100, Vincent JARDIN wrote:
> > > > > Thanks Thomas for putting back this topic.
> > > > > 
> > > > > Alex,
> > > > > 
> > > > > I'd like to hear more about the impacts of "unsupported":
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.g
> > > > > it/c
> > > > > ommi
> > > > > t/?id=033291eccbdb1b70ffc02641edae19ac825dc75d
> > > > >    Use of this mode, specifically binding a device without a
> > > > > native
> > > > >    IOMMU group to a VFIO bus driver will taint the kernel and
> > > > > should
> > > > >    therefore not be considered supported.
> > > > > 
> > > > > It means that we get ride of uio; so it is a nice code
> > > > > cleanup:
> > > > > but
> > > > > why
> > > > > would VFIO/NO IOMMU be better if the bottomline is
> > > > > "unsupported"?
> > > > 
> > > > How supportable do you think the uio method is?  Fundamentally
> > > > we
> > > > have
> > > > a userspace driver doing unrestricted DMA; it can access and
> > > > modify
> > > > any
> > > > memory in the system.  This is the reason uio won't provide a
> > > > mechanism
> > > > to enable MSI and if you ask the uio maintainer, they don't
> > > > support
> > > > DMA
> > > > at all, it's only intended as a programmed IO interface to the
> > > > device.
> > > >  Unless we can sandbox a user owned device within an IOMMU
> > > > protected
> > > > container, it's not supportable.  The VFIO no-iommu mode can
> > > > simply
> > > > provide you that unsupported mode more easily since it
> > > > leverages
> > > > code
> > > > from the supported mode, which is IOMMU protected.  Thanks,
> > > 
> > > Thanks for clarifying.
> > > 
> > > This does seem like it would be useful for DPDK. We're doing some
> > > further investigation to see if it works out of the box with DPDK
> > > or
> > > if we need to make any changes to support it.
> > 
> > The iommu model is different, there's no type1 interface available
> > when
> > using this mode since we have no ability to provide translation.
> >  The
> > no-iommu iommu model really does nothing, which is a possible issue
> > for
> > userspace.  Is it sufficient?  We stopped short of creating a page
> > pinning interface through the no-iommu model because it requires
> > code
> > and adding piles of new code for an interface we claim is
> > unsupported
> > doesn't make a lot of sense.  The device interface should be
> > identical
> > to existing vfio support.
> > 
> > > Thomas highlighted that your original commit for this had been
> > > reverted. What specifically would you need from us in order to
> > > re-
> > > submit the VFIO No-IOMMU support?
> > 
> > No API changes should ever go into the kernel without being
> > validated
> > by a user.  Without that we're risking that the kernel interface is
> > broken and we're stuck supporting it.  In this case I tried to make
> > sure we had a working user before it went it, gambled that it was
> > close
> > enough to put in anyway, then paid the price when development went
> > silent on the user side.  To get it back in, I'm going to need a
> > working use first.  You can re-apply 033291eccbdb or re-
> > revert ae5515d66362 for development of that.  I need to see that it
> > works and that there's some consensus from the dpdk community that
> > it's
> > a worthwhile path forward for cases without an iommu.  There's no
> > point
> > in merging it if it only becomes a userspace proof of concept.
> >  Thanks,
> > 
> I tested the DPDK (HEAD of master) with the patch, with help of
> Anatoly,
> and DPDK works in no-iommu environment with a little modification.
> 
> Basically the only modification is adapt new group naming (noiommu-$) 
> and

Sorry, forgot to mention that one.  The intention with the modified
group name is that I want to be very certain that a user intending to
only support properly iommu isolated devices doesn't accidentally need
to deal with these no-iommu mode devices.

> disable dma mapping (VFIO_IOMMU_MAP_DMA)
> 
> Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in vfio
> module,
> container->noiommu is not set before doing a
> vfio_group_set_container()
> and vfio_for_each_iommu_driver selects wrong driver.

Running CHECK_EXTENSION on a container without the group attached is
only going to tell you what extensions vfio is capable of, not
necessarily what extensions are available to you with that group.  Is
this just a general dpdk-vfio ordering bug?

> What I test is bind two different type of NICs into VFIO driver, and
> use
> testpmd to confirm transfer is working.  Kernel booted without iommu
> enabled,
> vfio module inserted with "enable_unsafe_noiommu_support" parameter.

So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
it imply deprecating the uio interface?  I believe the feature that
started this discussion was support for MSI/X interrupts so that VFs
can support some kind of interrupt (uio only supports INTx since it
doesn't allow DMA).  Implementing that would be the ultimate test of
whether this provides dpdk with not only a more consistent interface,
but the feature dpdk wants that's missing in uio. Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16  4:38           ` Alex Williamson
@ 2015-12-16  8:35             ` Burakov, Anatoly
  2015-12-16 16:23               ` Burakov, Anatoly
  2015-12-16 17:11               ` Alex Williamson
  2016-01-14  6:03             ` Jike Song
  1 sibling, 2 replies; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-16  8:35 UTC (permalink / raw)
  To: Alex Williamson, Yigit, Ferruh; +Cc: dev

Hi Alex,

> On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote:
> > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > I tested the DPDK (HEAD of master) with the patch, with help of
> > Anatoly, and DPDK works in no-iommu environment with a little
> > modification.
> >
> > Basically the only modification is adapt new group naming (noiommu-$)
> > and
> 
> Sorry, forgot to mention that one.  The intention with the modified group
> name is that I want to be very certain that a user intending to only support
> properly iommu isolated devices doesn't accidentally need to deal with these
> no-iommu mode devices.
> 
> > disable dma mapping (VFIO_IOMMU_MAP_DMA)
> >
> > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in vfio
> > module,
> > container->noiommu is not set before doing a
> > vfio_group_set_container()
> > and vfio_for_each_iommu_driver selects wrong driver.
> 
> Running CHECK_EXTENSION on a container without the group attached is
> only going to tell you what extensions vfio is capable of, not necessarily what
> extensions are available to you with that group.  Is this just a general dpdk-
> vfio ordering bug?

Yes, that is how VFIO was implemented in DPDK. I was under the impression that checking extension before assigning devices was the correct way to do things, so as to not to try anything we know would fail anyway. Does this imply that CHECK_EXTENSION needs to be called on both container and groups (or just on groups)?

> 
> > What I test is bind two different type of NICs into VFIO driver, and
> > use testpmd to confirm transfer is working.  Kernel booted without
> > iommu enabled, vfio module inserted with
> > "enable_unsafe_noiommu_support" parameter.
> 
> So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does it imply
> deprecating the uio interface?  I believe the feature that started this
> discussion was support for MSI/X interrupts so that VFs can support some
> kind of interrupt (uio only supports INTx since it doesn't allow
> DMA).  Implementing that would be the ultimate test of whether this
> provides dpdk with not only a more consistent interface, but the feature
> dpdk wants that's missing in uio. Thanks,

More testing will be needed, especially regarding interrupts, we will keep you updated.

Thanks,
Anatoly

> 
> Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16  8:35             ` Burakov, Anatoly
@ 2015-12-16 16:23               ` Burakov, Anatoly
  2015-12-16 23:17                 ` Thomas Monjalon
  2015-12-16 17:11               ` Alex Williamson
  1 sibling, 1 reply; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-16 16:23 UTC (permalink / raw)
  To: Thomas Monjalon (thomas.monjalon@6wind.com); +Cc: 'dev@dpdk.org'

Hi Thomas,

 > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
> > it imply deprecating the uio interface?  I believe the feature that
> > started this discussion was support for MSI/X interrupts so that VFs
> > can support some kind of interrupt (uio only supports INTx since it
> > doesn't allow DMA).  Implementing that would be the ultimate test of
> > whether this provides dpdk with not only a more consistent interface,
> > but the feature dpdk wants that's missing in uio. Thanks,

Ferruh has done a great job so far testing Alex's patch, very few changes from DPDK side seem to be required as far as existing functionality goes (not sure about VF interrupts mentioned by Alex). However, one thing that concerns me is usability. While it is true that no-IOMMU mode in VFIO would mean uio interfaces could be deprecated in time, the no-iommu mode is way more hassle than using igb_uio/uio_pci_generic because it will require a kernel recompile as opposed to simply compiling and insmod'ding an out-of-tree driver. So, in essence, if you don't want an IOMMU, it's becoming that much harder to use DPDK. Would that be something DPDK is willing to live with in the absence of uio interfaces?

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16  8:35             ` Burakov, Anatoly
  2015-12-16 16:23               ` Burakov, Anatoly
@ 2015-12-16 17:11               ` Alex Williamson
  2015-12-16 17:22                 ` Burakov, Anatoly
  1 sibling, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-16 17:11 UTC (permalink / raw)
  To: Burakov, Anatoly, Yigit, Ferruh; +Cc: dev

On Wed, 2015-12-16 at 08:35 +0000, Burakov, Anatoly wrote:
> Hi Alex,
> 
> > On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote:
> > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > I tested the DPDK (HEAD of master) with the patch, with help of
> > > Anatoly, and DPDK works in no-iommu environment with a little
> > > modification.
> > > 
> > > Basically the only modification is adapt new group naming
> > > (noiommu-$)
> > > and
> > 
> > Sorry, forgot to mention that one.  The intention with the modified
> > group
> > name is that I want to be very certain that a user intending to
> > only support
> > properly iommu isolated devices doesn't accidentally need to deal
> > with these
> > no-iommu mode devices.
> > 
> > > disable dma mapping (VFIO_IOMMU_MAP_DMA)
> > > 
> > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in
> > > vfio
> > > module,
> > > container->noiommu is not set before doing a
> > > vfio_group_set_container()
> > > and vfio_for_each_iommu_driver selects wrong driver.
> > 
> > Running CHECK_EXTENSION on a container without the group attached
> > is
> > only going to tell you what extensions vfio is capable of, not
> > necessarily what
> > extensions are available to you with that group.  Is this just a
> > general dpdk-
> > vfio ordering bug?
> 
> Yes, that is how VFIO was implemented in DPDK. I was under the
> impression that checking extension before assigning devices was the
> correct way to do things, so as to not to try anything we know would
> fail anyway. Does this imply that CHECK_EXTENSION needs to be called
> on both container and groups (or just on groups)?

Hmm, in Documentation/vfio.txt we do give the following algorithm:

        if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
                /* Unknown API version */

        if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
                /* Doesn't support the IOMMU driver we want. */
        ...

That's just going to query each iommu driver and we can't yet say
whether the group the user attaches to the container later will
actually support that extension until we try to do it, that would come
at VFIO_SET_IOMMU.  So is it perhaps a vfio bug that we're not
advertising no-iommu until the group is attached?  After all, we are
capable of it with just an empty container, just like we are with
type1, but we're going to fail SET_IOMMU for the wrong combination.
 This is exactly the sort of thing that makes me glad we reverted it
without feedback from a working user driver.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16 17:11               ` Alex Williamson
@ 2015-12-16 17:22                 ` Burakov, Anatoly
  2015-12-17 16:43                   ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-16 17:22 UTC (permalink / raw)
  To: Alex Williamson, Yigit, Ferruh; +Cc: dev

Hi Alex,

> On Wed, 2015-12-16 at 08:35 +0000, Burakov, Anatoly wrote:
> > Hi Alex,
> >
> > > On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote:
> > > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > I tested the DPDK (HEAD of master) with the patch, with help of
> > > > Anatoly, and DPDK works in no-iommu environment with a little
> > > > modification.
> > > >
> > > > Basically the only modification is adapt new group naming
> > > > (noiommu-$)
> > > > and
> > >
> > > Sorry, forgot to mention that one.  The intention with the modified
> > > group name is that I want to be very certain that a user intending
> > > to only support properly iommu isolated devices doesn't accidentally
> > > need to deal with these no-iommu mode devices.
> > >
> > > > disable dma mapping (VFIO_IOMMU_MAP_DMA)
> > > >
> > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in vfio
> > > > module,
> > > > container->noiommu is not set before doing a
> > > > vfio_group_set_container()
> > > > and vfio_for_each_iommu_driver selects wrong driver.
> > >
> > > Running CHECK_EXTENSION on a container without the group attached is
> > > only going to tell you what extensions vfio is capable of, not
> > > necessarily what extensions are available to you with that group.
> > > Is this just a general dpdk- vfio ordering bug?
> >
> > Yes, that is how VFIO was implemented in DPDK. I was under the
> > impression that checking extension before assigning devices was the
> > correct way to do things, so as to not to try anything we know would
> > fail anyway. Does this imply that CHECK_EXTENSION needs to be called
> > on both container and groups (or just on groups)?
> 
> Hmm, in Documentation/vfio.txt we do give the following algorithm:
> 
>         if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
>                 /* Unknown API version */
> 
>         if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
>                 /* Doesn't support the IOMMU driver we want. */
>         ...
> 
> That's just going to query each iommu driver and we can't yet say whether
> the group the user attaches to the container later will actually support that
> extension until we try to do it, that would come at VFIO_SET_IOMMU.  So is
> it perhaps a vfio bug that we're not advertising no-iommu until the group is
> attached?  After all, we are capable of it with just an empty container, just
> like we are with type1, but we're going to fail SET_IOMMU for the wrong
> combination.
>  This is exactly the sort of thing that makes me glad we reverted it without
> feedback from a working user driver.  Thanks,

Whether it should be considered a "bug" in VFIO or "by design" is up to you, of course, but at least according to the VFIO documentation, we are meant to check for type 1 extension and then attach devices, so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported even without any devices attached to the container (just like we get type 1 as supported without any devices attached). Having said that, if it was meant to attach devices first and then check the extensions, then perhaps the documentation should also point out that fact (or perhaps I missed that detail in my readings of the docs, in which case my apologies).

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16 16:23               ` Burakov, Anatoly
@ 2015-12-16 23:17                 ` Thomas Monjalon
  2015-12-17  9:52                   ` Burakov, Anatoly
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2015-12-16 23:17 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2015-12-16 16:23, Burakov, Anatoly:
> Hi Thomas,
> 
>  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
> > > it imply deprecating the uio interface?  I believe the feature that
> > > started this discussion was support for MSI/X interrupts so that VFs
> > > can support some kind of interrupt (uio only supports INTx since it
> > > doesn't allow DMA).  Implementing that would be the ultimate test of
> > > whether this provides dpdk with not only a more consistent interface,
> > > but the feature dpdk wants that's missing in uio. Thanks,
> 
> Ferruh has done a great job so far testing Alex's patch, very few changes from DPDK side seem to be required as far as existing functionality goes (not sure about VF interrupts mentioned by Alex). However, one thing that concerns me is usability. While it is true that no-IOMMU mode in VFIO would mean uio interfaces could be deprecated in time, the no-iommu mode is way more hassle than using igb_uio/uio_pci_generic because it will require a kernel recompile as opposed to simply compiling and insmod'ding an out-of-tree driver. So, in essence, if you don't want an IOMMU, it's becoming that much harder to use DPDK. Would that be something DPDK is willing to live with in the absence of uio interfaces?

Excuse me if I missed something obvious.
Why a kernel compilation is needed?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16 23:17                 ` Thomas Monjalon
@ 2015-12-17  9:52                   ` Burakov, Anatoly
  2015-12-17 10:09                     ` Thomas Monjalon
  0 siblings, 1 reply; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-17  9:52 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> > Hi Thomas,
> >
> >  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?
> > > > Does it imply deprecating the uio interface?  I believe the
> > > > feature that started this discussion was support for MSI/X
> > > > interrupts so that VFs can support some kind of interrupt (uio
> > > > only supports INTx since it doesn't allow DMA).  Implementing that
> > > > would be the ultimate test of whether this provides dpdk with not
> > > > only a more consistent interface, but the feature dpdk wants
> > > > that's missing in uio. Thanks,
> >
> > Ferruh has done a great job so far testing Alex's patch, very few changes
> from DPDK side seem to be required as far as existing functionality goes (not
> sure about VF interrupts mentioned by Alex). However, one thing that
> concerns me is usability. While it is true that no-IOMMU mode in VFIO would
> mean uio interfaces could be deprecated in time, the no-iommu mode is way
> more hassle than using igb_uio/uio_pci_generic because it will require a
> kernel recompile as opposed to simply compiling and insmod'ding an out-of-
> tree driver. So, in essence, if you don't want an IOMMU, it's becoming that
> much harder to use DPDK. Would that be something DPDK is willing to live
> with in the absence of uio interfaces?
> 
> Excuse me if I missed something obvious.
> Why a kernel compilation is needed?

Well, not really full kernel compilation, but in the default configuration, VFIO driver would not support NOIOMMU mode. I.e. it's not compiled by default. Support for no-iommu should be enabled in kernel config and compiled in. So, whoever is going to use DPDK with VFIO-no-iommu will have to download kernel tree and recompile the VFIO module and install it. That's obviously way more hassle than simply compiling an out-of-tree driver that's already included and works with an out-of-the-box kernel.

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-17  9:52                   ` Burakov, Anatoly
@ 2015-12-17 10:09                     ` Thomas Monjalon
  2015-12-17 19:38                       ` Jan Viktorin
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2015-12-17 10:09 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

Hi,

2015-12-17 09:52, Burakov, Anatoly:
> > >  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?
> > > > > Does it imply deprecating the uio interface?  I believe the
> > > > > feature that started this discussion was support for MSI/X
> > > > > interrupts so that VFs can support some kind of interrupt (uio
> > > > > only supports INTx since it doesn't allow DMA).  Implementing that
> > > > > would be the ultimate test of whether this provides dpdk with not
> > > > > only a more consistent interface, but the feature dpdk wants
> > > > > that's missing in uio. Thanks,
> > >
> > > Ferruh has done a great job so far testing Alex's patch, very few changes
> > from DPDK side seem to be required as far as existing functionality goes (not
> > sure about VF interrupts mentioned by Alex). However, one thing that
> > concerns me is usability. While it is true that no-IOMMU mode in VFIO would
> > mean uio interfaces could be deprecated in time, the no-iommu mode is way
> > more hassle than using igb_uio/uio_pci_generic because it will require a
> > kernel recompile as opposed to simply compiling and insmod'ding an out-of-
> > tree driver. So, in essence, if you don't want an IOMMU, it's becoming that
> > much harder to use DPDK. Would that be something DPDK is willing to live
> > with in the absence of uio interfaces?
> > 
> > Excuse me if I missed something obvious.
> > Why a kernel compilation is needed?
> 
> Well, not really full kernel compilation, but in the default configuration, VFIO driver would not support NOIOMMU mode. I.e. it's not compiled by default. Support for no-iommu should be enabled in kernel config and compiled in. So, whoever is going to use DPDK with VFIO-no-iommu will have to download kernel tree and recompile the VFIO module and install it. That's obviously way more hassle than simply compiling an out-of-tree driver that's already included and works with an out-of-the-box kernel.

The "out-of-the-box kernel" is configured by your distribution.
So we don't know yet what will be their choice.
If the distribution supports DPDK, it should be enabled.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16 17:22                 ` Burakov, Anatoly
@ 2015-12-17 16:43                   ` Alex Williamson
  2015-12-18 10:43                     ` Yigit, Ferruh
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-17 16:43 UTC (permalink / raw)
  To: Burakov, Anatoly, Yigit, Ferruh; +Cc: dev

On Wed, 2015-12-16 at 17:22 +0000, Burakov, Anatoly wrote:
> Hi Alex,
> 
> > On Wed, 2015-12-16 at 08:35 +0000, Burakov, Anatoly wrote:
> > > Hi Alex,
> > > 
> > > > On Wed, 2015-12-16 at 04:04 +0000, Ferruh Yigit wrote:
> > > > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson
> > > > > wrote:
> > > > > I tested the DPDK (HEAD of master) with the patch, with help
> > > > > of
> > > > > Anatoly, and DPDK works in no-iommu environment with a little
> > > > > modification.
> > > > > 
> > > > > Basically the only modification is adapt new group naming
> > > > > (noiommu-$)
> > > > > and
> > > > 
> > > > Sorry, forgot to mention that one.  The intention with the
> > > > modified
> > > > group name is that I want to be very certain that a user
> > > > intending
> > > > to only support properly iommu isolated devices doesn't
> > > > accidentally
> > > > need to deal with these no-iommu mode devices.
> > > > 
> > > > > disable dma mapping (VFIO_IOMMU_MAP_DMA)
> > > > > 
> > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in
> > > > > vfio
> > > > > module,
> > > > > container->noiommu is not set before doing a
> > > > > vfio_group_set_container()
> > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > > 
> > > > Running CHECK_EXTENSION on a container without the group
> > > > attached is
> > > > only going to tell you what extensions vfio is capable of, not
> > > > necessarily what extensions are available to you with that
> > > > group.
> > > > Is this just a general dpdk- vfio ordering bug?
> > > 
> > > Yes, that is how VFIO was implemented in DPDK. I was under the
> > > impression that checking extension before assigning devices was
> > > the
> > > correct way to do things, so as to not to try anything we know
> > > would
> > > fail anyway. Does this imply that CHECK_EXTENSION needs to be
> > > called
> > > on both container and groups (or just on groups)?
> > 
> > Hmm, in Documentation/vfio.txt we do give the following algorithm:
> > 
> >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > VFIO_API_VERSION)
> >                 /* Unknown API version */
> > 
> >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > VFIO_TYPE1_IOMMU))
> >                 /* Doesn't support the IOMMU driver we want. */
> >         ...
> > 
> > That's just going to query each iommu driver and we can't yet say
> > whether
> > the group the user attaches to the container later will actually
> > support that
> > extension until we try to do it, that would come at VFIO_SET_IOMMU.
> >  So is
> > it perhaps a vfio bug that we're not advertising no-iommu until the
> > group is
> > attached?  After all, we are capable of it with just an empty
> > container, just
> > like we are with type1, but we're going to fail SET_IOMMU for the
> > wrong
> > combination.
> >  This is exactly the sort of thing that makes me glad we reverted
> > it without
> > feedback from a working user driver.  Thanks,
> 
> Whether it should be considered a "bug" in VFIO or "by design" is up
> to you, of course, but at least according to the VFIO documentation,
> we are meant to check for type 1 extension and then attach devices,
> so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported
> even without any devices attached to the container (just like we get
> type 1 as supported without any devices attached). Having said that,
> if it was meant to attach devices first and then check the
> extensions, then perhaps the documentation should also point out that
> fact (or perhaps I missed that detail in my readings of the docs, in
> which case my apologies).

Hi Anatoly,

Does the below patch make it behave more like you'd expect.  This
applies to v4.4-rc4, I'd fold this into the base patch if we
reincorporate it to a future kernel.  Thanks,

Alex

commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Wed Dec 16 19:02:01 2015 -0700

    vfio: Fix no-iommu CHECK_EXTENSION
    
    Previously the no-iommu iommu driver was only visible when the
    container had an attached no-iommu group.  This means that
    CHECK_EXTENSION on and empty container couldn't report the possibility
    of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not the user
    can make use of it with the group, so this is inconsistent.  Add the
    no-iommu iommu to the list of iommu drivers when enabled via module
    option, but skip all the others if the container is attached to a
    no-iommu groups.  Note that tainting is now done with the "unsafe"
    module callback rather than explictly within vfio.
    
    Also fixes module option and module description name inconsistency.
    
    Also make vfio_noiommu_ops const.
    
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index de632da..d3a9432 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -99,9 +99,6 @@ struct vfio_device {
 
 #ifdef CONFIG_VFIO_NOIOMMU
 static bool noiommu __read_mostly;
-module_param_named(enable_unsafe_noiommu_support,
-		   noiommu, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
 #endif
 
 /*
@@ -138,17 +135,6 @@ struct iommu_group *vfio_iommu_group_get(struct device *dev)
 	iommu_group_put(group);
 	if (ret)
 		return NULL;
-
-	/*
-	 * Where to taint?  At this point we've added an IOMMU group for a
-	 * device that is not backed by iommu_ops, therefore any iommu_
-	 * callback using iommu_ops can legitimately Oops.  So, while we may
-	 * be about to give a DMA capable device to a user without IOMMU
-	 * protection, which is clearly taint-worthy, let's go ahead and do
-	 * it here.
-	 */
-	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
-	dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
 #endif
 
 	return group;
@@ -207,7 +193,7 @@ static void vfio_noiommu_detach_group(void *iommu_data,
 {
 }
 
-static struct vfio_iommu_driver_ops vfio_noiommu_ops = {
+static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 	.name = "vfio-noiommu",
 	.owner = THIS_MODULE,
 	.open = vfio_noiommu_open,
@@ -217,24 +203,34 @@ static struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 	.detach_group = vfio_noiommu_detach_group,
 };
 
-static struct vfio_iommu_driver vfio_noiommu_driver = {
-	.ops = &vfio_noiommu_ops,
+static int noiommu_param_set(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+
+	if (!val)
+		val = "1";
+
+	ret = strtobool(val, kp->arg);
+	if (ret)
+		return ret;
+
+	if (noiommu)
+		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
+	else
+		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
+
+	return ret;
+}
+
+static const struct kernel_param_ops noiommu_param_ops = {
+	.flags = KERNEL_PARAM_OPS_FL_NOARG,
+	.set = noiommu_param_set,
+	.get = param_get_bool,
 };
 
-/*
- * Wrap IOMMU drivers, the noiommu driver is the one and only driver for
- * noiommu groups (and thus containers) and not available for normal groups.
- */
-#define vfio_for_each_iommu_driver(con, pos)				\
-	for (pos = con->noiommu ? &vfio_noiommu_driver :		\
-	     list_first_entry(&vfio.iommu_drivers_list,			\
-			      struct vfio_iommu_driver, vfio_next);	\
-	     (con->noiommu ? pos != NULL :				\
-			&pos->vfio_next != &vfio.iommu_drivers_list);	\
-	      pos = con->noiommu ? NULL : list_next_entry(pos, vfio_next))
-#else
-#define vfio_for_each_iommu_driver(con, pos)				\
-	list_for_each_entry(pos, &vfio.iommu_drivers_list, vfio_next)
+module_param_cb_unsafe(enable_unsafe_noiommu_mode, &noiommu_param_ops,
+		       &noiommu, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
 #endif
 
 
@@ -999,7 +995,12 @@ static long vfio_ioctl_check_extension(struct vfio_container *container,
 		 */
 		if (!driver) {
 			mutex_lock(&vfio.iommu_drivers_lock);
-			vfio_for_each_iommu_driver(container, driver) {
+			list_for_each_entry(driver, &vfio.iommu_drivers_list,
+					    vfio_next) {
+				if (container->noiommu &&
+				    driver->ops != &vfio_noiommu_ops)
+					continue;
+
 				if (!try_module_get(driver->ops->owner))
 					continue;
 
@@ -1068,9 +1069,12 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
 	}
 
 	mutex_lock(&vfio.iommu_drivers_lock);
-	vfio_for_each_iommu_driver(container, driver) {
+	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
 		void *data;
 
+		if (container->noiommu && driver->ops != &vfio_noiommu_ops)
+			continue;
+
 		if (!try_module_get(driver->ops->owner))
 			continue;
 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-17 10:09                     ` Thomas Monjalon
@ 2015-12-17 19:38                       ` Jan Viktorin
  2015-12-17 21:16                         ` Vincent JARDIN
  2015-12-17 23:29                         ` Stephen Hemminger
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Viktorin @ 2015-12-17 19:38 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Thu, 17 Dec 2015 11:09:23 +0100
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> Hi,
> 
> 2015-12-17 09:52, Burakov, Anatoly:
> > > >  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?
> > > > > > Does it imply deprecating the uio interface?  I believe the
> > > > > > feature that started this discussion was support for MSI/X
> > > > > > interrupts so that VFs can support some kind of interrupt (uio
> > > > > > only supports INTx since it doesn't allow DMA).  Implementing that
> > > > > > would be the ultimate test of whether this provides dpdk with not
> > > > > > only a more consistent interface, but the feature dpdk wants
> > > > > > that's missing in uio. Thanks,  
> > > >
> > > > Ferruh has done a great job so far testing Alex's patch, very few changes  
> > > from DPDK side seem to be required as far as existing functionality goes (not
> > > sure about VF interrupts mentioned by Alex). However, one thing that
> > > concerns me is usability. While it is true that no-IOMMU mode in VFIO would
> > > mean uio interfaces could be deprecated in time, the no-iommu mode is way
> > > more hassle than using igb_uio/uio_pci_generic because it will require a
> > > kernel recompile as opposed to simply compiling and insmod'ding an out-of-
> > > tree driver. So, in essence, if you don't want an IOMMU, it's becoming that
> > > much harder to use DPDK. Would that be something DPDK is willing to live
> > > with in the absence of uio interfaces?
> > > 
> > > Excuse me if I missed something obvious.
> > > Why a kernel compilation is needed?  
> > 
> > Well, not really full kernel compilation, but in the default configuration, VFIO driver would not support NOIOMMU mode. I.e. it's not compiled by default. Support for no-iommu should be enabled in kernel config and compiled in. So, whoever is going to use DPDK with VFIO-no-iommu will have to download kernel tree and recompile the VFIO module and install it. That's obviously way more hassle than simply compiling an out-of-tree driver that's already included and works with an out-of-the-box kernel.  
> 
> The "out-of-the-box kernel" is configured by your distribution.
> So we don't know yet what will be their choice.
> If the distribution supports DPDK, it should be enabled.

I have a question as I am not involved in all possible DPDK
configurations, platforms, etc. and not yet very involved in vfio. What
are the devices which do not have IOMMU? If I have, say, DPDK 2.3 with
vfio-noiommu, which platforms (or computer systems) I am targeting?

Would it be an Intel-based system? Would it be PPC8, ARM?

If it is ARMv7... I would say that the fact I have to explicitly enable
the no-IOMMU feature and rebuild the kernel (or whatever) is just OK. As
for such systems, it is common to have a quite customized OS. Well,
the big distributions are able to run on those devices, that's true...
However, in such case, the users are usually skilled enough to take
care of having their own special Linux kernel.

So, is the fact the distributions would not support the no-IOMMU setup
in their default configuration really an issue? Will some very common
Intel/DPDK-based box need this?

Regards
Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-17 19:38                       ` Jan Viktorin
@ 2015-12-17 21:16                         ` Vincent JARDIN
  2015-12-17 23:29                         ` Stephen Hemminger
  1 sibling, 0 replies; 35+ messages in thread
From: Vincent JARDIN @ 2015-12-17 21:16 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev

On 17/12/2015 20:38, Jan Viktorin wrote:
> which platforms (or computer systems) I am targeting?

It is about VMs on IOMMU capable systems. What if you need to use SRIOV 
with IXGBE, or IGB devices?

For some DPDK cases, like Mellanox or virtio, you do not need to use 
VFIO/UIO into the guests, so no issue. But for some other PMDs, you need 
a VFIO/UIO.

Best regards,
   Vincent

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-17 19:38                       ` Jan Viktorin
  2015-12-17 21:16                         ` Vincent JARDIN
@ 2015-12-17 23:29                         ` Stephen Hemminger
  1 sibling, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2015-12-17 23:29 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev

On Thu, 17 Dec 2015 20:38:16 +0100
Jan Viktorin <viktorin@rehivetech.com> wrote:

> On Thu, 17 Dec 2015 11:09:23 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > Hi,
> > 
> > 2015-12-17 09:52, Burakov, Anatoly:
> > > > >  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > > > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?
> > > > > > > Does it imply deprecating the uio interface?  I believe the
> > > > > > > feature that started this discussion was support for MSI/X
> > > > > > > interrupts so that VFs can support some kind of interrupt (uio
> > > > > > > only supports INTx since it doesn't allow DMA).  Implementing that
> > > > > > > would be the ultimate test of whether this provides dpdk with not
> > > > > > > only a more consistent interface, but the feature dpdk wants
> > > > > > > that's missing in uio. Thanks,  
> > > > >
> > > > > Ferruh has done a great job so far testing Alex's patch, very few changes  
> > > > from DPDK side seem to be required as far as existing functionality goes (not
> > > > sure about VF interrupts mentioned by Alex). However, one thing that
> > > > concerns me is usability. While it is true that no-IOMMU mode in VFIO would
> > > > mean uio interfaces could be deprecated in time, the no-iommu mode is way
> > > > more hassle than using igb_uio/uio_pci_generic because it will require a
> > > > kernel recompile as opposed to simply compiling and insmod'ding an out-of-
> > > > tree driver. So, in essence, if you don't want an IOMMU, it's becoming that
> > > > much harder to use DPDK. Would that be something DPDK is willing to live
> > > > with in the absence of uio interfaces?
> > > > 
> > > > Excuse me if I missed something obvious.
> > > > Why a kernel compilation is needed?  
> > > 
> > > Well, not really full kernel compilation, but in the default configuration, VFIO driver would not support NOIOMMU mode. I.e. it's not compiled by default. Support for no-iommu should be enabled in kernel config and compiled in. So, whoever is going to use DPDK with VFIO-no-iommu will have to download kernel tree and recompile the VFIO module and install it. That's obviously way more hassle than simply compiling an out-of-tree driver that's already included and works with an out-of-the-box kernel.  
> > 
> > The "out-of-the-box kernel" is configured by your distribution.
> > So we don't know yet what will be their choice.
> > If the distribution supports DPDK, it should be enabled.
> 
> I have a question as I am not involved in all possible DPDK
> configurations, platforms, etc. and not yet very involved in vfio. What
> are the devices which do not have IOMMU? If I have, say, DPDK 2.3 with
> vfio-noiommu, which platforms (or computer systems) I am targeting?
> 
> Would it be an Intel-based system? Would it be PPC8, ARM?
> 
> If it is ARMv7... I would say that the fact I have to explicitly enable
> the no-IOMMU feature and rebuild the kernel (or whatever) is just OK. As
> for such systems, it is common to have a quite customized OS. Well,
> the big distributions are able to run on those devices, that's true...
> However, in such case, the users are usually skilled enough to take
> care of having their own special Linux kernel.
> 
> So, is the fact the distributions would not support the no-IOMMU setup
> in their default configuration really an issue? Will some very common
> Intel/DPDK-based box need this?
> 
> Regards
> Jan

So far:
  * broken hardware (many systems including those from Dell) do not provide
    working IOMMU because of Bios bugs etc.
  * Linux guest in VMware/KVM/Hyper-V. There is no IOMMU emulation in most
    of these systems.
  * Older smaller systems (ie Atom) may not have IOMMU

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-17 16:43                   ` Alex Williamson
@ 2015-12-18 10:43                     ` Yigit, Ferruh
  2015-12-18 14:38                       ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Yigit, Ferruh @ 2015-12-18 10:43 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dev

On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote:
<...>
> > > > > > 
> > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in
> > > > > > vfio
> > > > > > module,
> > > > > > container->noiommu is not set before doing a
> > > > > > vfio_group_set_container()
> > > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > > > 
> > > > > Running CHECK_EXTENSION on a container without the group
> > > > > attached is
> > > > > only going to tell you what extensions vfio is capable of, not
> > > > > necessarily what extensions are available to you with that
> > > > > group.
> > > > > Is this just a general dpdk- vfio ordering bug?
> > > > 
> > > > Yes, that is how VFIO was implemented in DPDK. I was under the
> > > > impression that checking extension before assigning devices was
> > > > the
> > > > correct way to do things, so as to not to try anything we know
> > > > would
> > > > fail anyway. Does this imply that CHECK_EXTENSION needs to be
> > > > called
> > > > on both container and groups (or just on groups)?
> > > 
> > > Hmm, in Documentation/vfio.txt we do give the following algorithm:
> > > 
> > >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > > VFIO_API_VERSION)
> > >                 /* Unknown API version */
> > > 
> > >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > > VFIO_TYPE1_IOMMU))
> > >                 /* Doesn't support the IOMMU driver we want. */
> > >         ...
> > > 
> > > That's just going to query each iommu driver and we can't yet say
> > > whether
> > > the group the user attaches to the container later will actually
> > > support that
> > > extension until we try to do it, that would come at VFIO_SET_IOMMU.
> > >  So is
> > > it perhaps a vfio bug that we're not advertising no-iommu until the
> > > group is
> > > attached?  After all, we are capable of it with just an empty
> > > container, just
> > > like we are with type1, but we're going to fail SET_IOMMU for the
> > > wrong
> > > combination.
> > >  This is exactly the sort of thing that makes me glad we reverted
> > > it without
> > > feedback from a working user driver.  Thanks,
> > 
> > Whether it should be considered a "bug" in VFIO or "by design" is up
> > to you, of course, but at least according to the VFIO documentation,
> > we are meant to check for type 1 extension and then attach devices,
> > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported
> > even without any devices attached to the container (just like we get
> > type 1 as supported without any devices attached). Having said that,
> > if it was meant to attach devices first and then check the
> > extensions, then perhaps the documentation should also point out that
> > fact (or perhaps I missed that detail in my readings of the docs, in
> > which case my apologies).
> 
> Hi Anatoly,
> 
> Does the below patch make it behave more like you'd expect.  This
> applies to v4.4-rc4, I'd fold this into the base patch if we
> reincorporate it to a future kernel.  Thanks,
> 
> Alex
> 
> commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
> Author: Alex Williamson <alex.williamson@redhat.com>
> Date:   Wed Dec 16 19:02:01 2015 -0700
> 
>     vfio: Fix no-iommu CHECK_EXTENSION
>     
>     Previously the no-iommu iommu driver was only visible when the
>     container had an attached no-iommu group.  This means that
>     CHECK_EXTENSION on and empty container couldn't report the possibility
>     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not the user
>     can make use of it with the group, so this is inconsistent.  Add the
>     no-iommu iommu to the list of iommu drivers when enabled via module
>     option, but skip all the others if the container is attached to a
>     no-iommu groups.  Note that tainting is now done with the "unsafe"
>     module callback rather than explictly within vfio.
>     
>     Also fixes module option and module description name inconsistency.
>     
>     Also make vfio_noiommu_ops const.
>     
>     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Hi Alex,

I got following crash with this update:

[  +0.046973] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  +0.000031] IP: [<ffffffff813b92cf>] __list_add+V0x1f/0xc0
[  +0.000024] PGD 0 
[  +0.000010] Oops: 0000 [#1] SMP 
...
[  +0.000028] Call Trace:
[  +0.000014]  [<ffffffff81777301>] __mutex_lock_slowpath+0x91/0x110
[  +0.000022]  [<ffffffff817773a3>] mutex_lock+0x23/0x40
[  +0.000020]  [<ffffffffa0370c00>] vfio_register_iommu_driver+0x40/0xc0 [vfio]
[  +0.000025]  [<ffffffffa0370cd1>] noiommu_param_set+0x51/0x60 [vfio]
[  +0.000022]  [<ffffffff810bbe3e>] parse_args+0x1be/0x4a0
[  +0.000021]  [<ffffffff81121ea0>] load_module+0xe30/0x2730
[  +0.000942]  [<ffffffff8111f500>] ? __symbol_put+0x60/0x60
[  +0.000921]  [<ffffffff81223f60>] ? kernel_read+0x50/0x80
[  +0.000917]  [<ffffffff811239f9>] SyS_finit_module+0xb9/0xf0
[  +0.000925]  [<ffffffff817796ae>] entry_SYSCALL_64_fastpath+0x12/0x71

> -static struct vfio_iommu_driver vfio_noiommu_driver = {
> -	.ops = &vfio_noiommu_ops,
> +static int noiommu_param_set(const char *val, const struct kernel_param *kp)
> +{
> +	int ret;
> +
> +	if (!val)
> +		val = "1";
> +
> +	ret = strtobool(val, kp->arg);
> +	if (ret)
> +		return ret;
> +
> +	if (noiommu)
> +		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);

Is it possible that kernel_param_ops->set() called before module_init() that initializes the mutex.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/module.c?id=refs/tags/v4.4-rc5#n3517

> +	else
> +		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
> +
> +	return ret;
> +}
> +
> +static const struct kernel_param_ops noiommu_param_ops = {
> +	.flags = KERNEL_PARAM_OPS_FL_NOARG,
> +	.set = noiommu_param_set,
> +	.get = param_get_bool,
>  };
>  

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-18 10:43                     ` Yigit, Ferruh
@ 2015-12-18 14:38                       ` Alex Williamson
  2015-12-18 21:50                         ` Alex Williamson
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-18 14:38 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev

On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote:
> On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote:
> <...>
> > > > > > > 
> > > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl,
> > > > > > > because in
> > > > > > > vfio
> > > > > > > module,
> > > > > > > container->noiommu is not set before doing a
> > > > > > > vfio_group_set_container()
> > > > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > > > > 
> > > > > > Running CHECK_EXTENSION on a container without the group
> > > > > > attached is
> > > > > > only going to tell you what extensions vfio is capable of,
> > > > > > not
> > > > > > necessarily what extensions are available to you with that
> > > > > > group.
> > > > > > Is this just a general dpdk- vfio ordering bug?
> > > > > 
> > > > > Yes, that is how VFIO was implemented in DPDK. I was under
> > > > > the
> > > > > impression that checking extension before assigning devices
> > > > > was
> > > > > the
> > > > > correct way to do things, so as to not to try anything we
> > > > > know
> > > > > would
> > > > > fail anyway. Does this imply that CHECK_EXTENSION needs to be
> > > > > called
> > > > > on both container and groups (or just on groups)?
> > > > 
> > > > Hmm, in Documentation/vfio.txt we do give the following
> > > > algorithm:
> > > > 
> > > >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > > > VFIO_API_VERSION)
> > > >                 /* Unknown API version */
> > > > 
> > > >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > > > VFIO_TYPE1_IOMMU))
> > > >                 /* Doesn't support the IOMMU driver we want. */
> > > >         ...
> > > > 
> > > > That's just going to query each iommu driver and we can't yet
> > > > say
> > > > whether
> > > > the group the user attaches to the container later will
> > > > actually
> > > > support that
> > > > extension until we try to do it, that would come at
> > > > VFIO_SET_IOMMU.
> > > >  So is
> > > > it perhaps a vfio bug that we're not advertising no-iommu until
> > > > the
> > > > group is
> > > > attached?  After all, we are capable of it with just an empty
> > > > container, just
> > > > like we are with type1, but we're going to fail SET_IOMMU for
> > > > the
> > > > wrong
> > > > combination.
> > > >  This is exactly the sort of thing that makes me glad we
> > > > reverted
> > > > it without
> > > > feedback from a working user driver.  Thanks,
> > > 
> > > Whether it should be considered a "bug" in VFIO or "by design" is
> > > up
> > > to you, of course, but at least according to the VFIO
> > > documentation,
> > > we are meant to check for type 1 extension and then attach
> > > devices,
> > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as
> > > supported
> > > even without any devices attached to the container (just like we
> > > get
> > > type 1 as supported without any devices attached). Having said
> > > that,
> > > if it was meant to attach devices first and then check the
> > > extensions, then perhaps the documentation should also point out
> > > that
> > > fact (or perhaps I missed that detail in my readings of the docs,
> > > in
> > > which case my apologies).
> > 
> > Hi Anatoly,
> > 
> > Does the below patch make it behave more like you'd expect.  This
> > applies to v4.4-rc4, I'd fold this into the base patch if we
> > reincorporate it to a future kernel.  Thanks,
> > 
> > Alex
> > 
> > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
> > Author: Alex Williamson <alex.williamson@redhat.com>
> > Date:   Wed Dec 16 19:02:01 2015 -0700
> > 
> >     vfio: Fix no-iommu CHECK_EXTENSION
> >     
> >     Previously the no-iommu iommu driver was only visible when the
> >     container had an attached no-iommu group.  This means that
> >     CHECK_EXTENSION on and empty container couldn't report the
> > possibility
> >     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not
> > the user
> >     can make use of it with the group, so this is
> > inconsistent.  Add the
> >     no-iommu iommu to the list of iommu drivers when enabled via
> > module
> >     option, but skip all the others if the container is attached to
> > a
> >     no-iommu groups.  Note that tainting is now done with the
> > "unsafe"
> >     module callback rather than explictly within vfio.
> >     
> >     Also fixes module option and module description name
> > inconsistency.
> >     
> >     Also make vfio_noiommu_ops const.
> >     
> >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Hi Alex,
> 
> I got following crash with this update:
> 
> [  +0.046973] BUG: unable to handle kernel NULL pointer dereference
> at           (null)
> [  +0.000031] IP: [<ffffffff813b92cf>] __list_add+V0x1f/0xc0
> [  +0.000024] PGD 0 
> [  +0.000010] Oops: 0000 [#1] SMP 
> ...
> [  +0.000028] Call Trace:
> [  +0.000014]  [<ffffffff81777301>] __mutex_lock_slowpath+0x91/0x110
> [  +0.000022]  [<ffffffff817773a3>] mutex_lock+0x23/0x40
> [  +0.000020]  [<ffffffffa0370c00>]
> vfio_register_iommu_driver+0x40/0xc0 [vfio]
> [  +0.000025]  [<ffffffffa0370cd1>] noiommu_param_set+0x51/0x60
> [vfio]
> [  +0.000022]  [<ffffffff810bbe3e>] parse_args+0x1be/0x4a0
> [  +0.000021]  [<ffffffff81121ea0>] load_module+0xe30/0x2730
> [  +0.000942]  [<ffffffff8111f500>] ? __symbol_put+0x60/0x60
> [  +0.000921]  [<ffffffff81223f60>] ? kernel_read+0x50/0x80
> [  +0.000917]  [<ffffffff811239f9>] SyS_finit_module+0xb9/0xf0
> [  +0.000925]  [<ffffffff817796ae>]
> entry_SYSCALL_64_fastpath+0x12/0x71
> 
> > -static struct vfio_iommu_driver vfio_noiommu_driver = {
> > -	.ops = &vfio_noiommu_ops,
> > +static int noiommu_param_set(const char *val, const struct
> > kernel_param *kp)
> > +{
> > +	int ret;
> > +
> > +	if (!val)
> > +		val = "1";
> > +
> > +	ret = strtobool(val, kp->arg);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (noiommu)
> > +		ret =
> > vfio_register_iommu_driver(&vfio_noiommu_ops);
> 
> Is it possible that kernel_param_ops->set() called before
> module_init() that initializes the mutex.
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/
> kernel/module.c?id=refs/tags/v4.4-rc5#n3517

Yes, guess I didn't think that one through very well.  I'll try again.
 Thank you for testing, sorry it was a dud.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-18 14:38                       ` Alex Williamson
@ 2015-12-18 21:50                         ` Alex Williamson
  2015-12-21 11:46                           ` Yigit, Ferruh
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-18 21:50 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev

On Fri, 2015-12-18 at 07:38 -0700, Alex Williamson wrote:
> On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote:
> > On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote:
> > <...>
> > > > > > > > 
> > > > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl,
> > > > > > > > because in
> > > > > > > > vfio
> > > > > > > > module,
> > > > > > > > container->noiommu is not set before doing a
> > > > > > > > vfio_group_set_container()
> > > > > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > > > > > 
> > > > > > > Running CHECK_EXTENSION on a container without the group
> > > > > > > attached is
> > > > > > > only going to tell you what extensions vfio is capable
> > > > > > > of,
> > > > > > > not
> > > > > > > necessarily what extensions are available to you with
> > > > > > > that
> > > > > > > group.
> > > > > > > Is this just a general dpdk- vfio ordering bug?
> > > > > > 
> > > > > > Yes, that is how VFIO was implemented in DPDK. I was under
> > > > > > the
> > > > > > impression that checking extension before assigning devices
> > > > > > was
> > > > > > the
> > > > > > correct way to do things, so as to not to try anything we
> > > > > > know
> > > > > > would
> > > > > > fail anyway. Does this imply that CHECK_EXTENSION needs to
> > > > > > be
> > > > > > called
> > > > > > on both container and groups (or just on groups)?
> > > > > 
> > > > > Hmm, in Documentation/vfio.txt we do give the following
> > > > > algorithm:
> > > > > 
> > > > >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > > > > VFIO_API_VERSION)
> > > > >                 /* Unknown API version */
> > > > > 
> > > > >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > > > > VFIO_TYPE1_IOMMU))
> > > > >                 /* Doesn't support the IOMMU driver we want.
> > > > > */
> > > > >         ...
> > > > > 
> > > > > That's just going to query each iommu driver and we can't yet
> > > > > say
> > > > > whether
> > > > > the group the user attaches to the container later will
> > > > > actually
> > > > > support that
> > > > > extension until we try to do it, that would come at
> > > > > VFIO_SET_IOMMU.
> > > > >  So is
> > > > > it perhaps a vfio bug that we're not advertising no-iommu
> > > > > until
> > > > > the
> > > > > group is
> > > > > attached?  After all, we are capable of it with just an empty
> > > > > container, just
> > > > > like we are with type1, but we're going to fail SET_IOMMU for
> > > > > the
> > > > > wrong
> > > > > combination.
> > > > >  This is exactly the sort of thing that makes me glad we
> > > > > reverted
> > > > > it without
> > > > > feedback from a working user driver.  Thanks,
> > > > 
> > > > Whether it should be considered a "bug" in VFIO or "by design"
> > > > is
> > > > up
> > > > to you, of course, but at least according to the VFIO
> > > > documentation,
> > > > we are meant to check for type 1 extension and then attach
> > > > devices,
> > > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as
> > > > supported
> > > > even without any devices attached to the container (just like
> > > > we
> > > > get
> > > > type 1 as supported without any devices attached). Having said
> > > > that,
> > > > if it was meant to attach devices first and then check the
> > > > extensions, then perhaps the documentation should also point
> > > > out
> > > > that
> > > > fact (or perhaps I missed that detail in my readings of the
> > > > docs,
> > > > in
> > > > which case my apologies).
> > > 
> > > Hi Anatoly,
> > > 
> > > Does the below patch make it behave more like you'd expect.  This
> > > applies to v4.4-rc4, I'd fold this into the base patch if we
> > > reincorporate it to a future kernel.  Thanks,
> > > 
> > > Alex
> > > 
> > > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
> > > Author: Alex Williamson <alex.williamson@redhat.com>
> > > Date:   Wed Dec 16 19:02:01 2015 -0700
> > > 
> > >     vfio: Fix no-iommu CHECK_EXTENSION
> > >     
> > >     Previously the no-iommu iommu driver was only visible when
> > > the
> > >     container had an attached no-iommu group.  This means that
> > >     CHECK_EXTENSION on and empty container couldn't report the
> > > possibility
> > >     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not
> > > the user
> > >     can make use of it with the group, so this is
> > > inconsistent.  Add the
> > >     no-iommu iommu to the list of iommu drivers when enabled via
> > > module
> > >     option, but skip all the others if the container is attached
> > > to
> > > a
> > >     no-iommu groups.  Note that tainting is now done with the
> > > "unsafe"
> > >     module callback rather than explictly within vfio.
> > >     
> > >     Also fixes module option and module description name
> > > inconsistency.
> > >     
> > >     Also make vfio_noiommu_ops const.
> > >     
> > >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > Hi Alex,
> > 
> > I got following crash with this update:

Let's try this one:

commit 8ff839c6ffe9f3b50b50f1cc87e7afbf23171f05
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Fri Dec 18 14:45:55 2015 -0700

    v2 vfio fix no-iommu CHECK_EXTENSION
    
    Register and unregister the no-iommu iommu backend at module
    initialization and exit, but disable it unless enabled via module
    option.  Rather than modify the iommu driver walk, selectively skip
    combinations that aren't supported.  CHECK_EXTENSION on a container
    without any groups attached exposes all possible extensions.  Once a
    group is attached, the no-iommu backend is skipped for regular groups
    and regular iommu backends are skipped for no-iommu groups.
    
    This would be folded into a single patch to re-propose vfio no-iommu
    mode upstream.
    
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index de632da..85a5793 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -99,7 +99,7 @@ struct vfio_device {
 
 #ifdef CONFIG_VFIO_NOIOMMU
 static bool noiommu __read_mostly;
-module_param_named(enable_unsafe_noiommu_support,
+module_param_named(enable_unsafe_noiommu_mode,
 		   noiommu, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
 #endif
@@ -185,7 +185,7 @@ static long vfio_noiommu_ioctl(void *iommu_data,
 			       unsigned int cmd, unsigned long arg)
 {
 	if (cmd == VFIO_CHECK_EXTENSION)
-		return arg == VFIO_NOIOMMU_IOMMU ? 1 : 0;
+		return noiommu && (arg == VFIO_NOIOMMU_IOMMU) ? 1 : 0;
 
 	return -ENOTTY;
 }
@@ -207,7 +207,7 @@ static void vfio_noiommu_detach_group(void *iommu_data,
 {
 }
 
-static struct vfio_iommu_driver_ops vfio_noiommu_ops = {
+static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 	.name = "vfio-noiommu",
 	.owner = THIS_MODULE,
 	.open = vfio_noiommu_open,
@@ -216,25 +216,6 @@ static struct vfio_iommu_driver_ops vfio_noiommu_ops = {
 	.attach_group = vfio_noiommu_attach_group,
 	.detach_group = vfio_noiommu_detach_group,
 };
-
-static struct vfio_iommu_driver vfio_noiommu_driver = {
-	.ops = &vfio_noiommu_ops,
-};
-
-/*
- * Wrap IOMMU drivers, the noiommu driver is the one and only driver for
- * noiommu groups (and thus containers) and not available for normal groups.
- */
-#define vfio_for_each_iommu_driver(con, pos)				\
-	for (pos = con->noiommu ? &vfio_noiommu_driver :		\
-	     list_first_entry(&vfio.iommu_drivers_list,			\
-			      struct vfio_iommu_driver, vfio_next);	\
-	     (con->noiommu ? pos != NULL :				\
-			&pos->vfio_next != &vfio.iommu_drivers_list);	\
-	      pos = con->noiommu ? NULL : list_next_entry(pos, vfio_next))
-#else
-#define vfio_for_each_iommu_driver(con, pos)				\
-	list_for_each_entry(pos, &vfio.iommu_drivers_list, vfio_next)
 #endif
 
 
@@ -999,7 +980,14 @@ static long vfio_ioctl_check_extension(struct vfio_container *container,
 		 */
 		if (!driver) {
 			mutex_lock(&vfio.iommu_drivers_lock);
-			vfio_for_each_iommu_driver(container, driver) {
+			list_for_each_entry(driver, &vfio.iommu_drivers_list,
+					    vfio_next) {
+
+				if (!list_empty(&container->group_list) &&
+				    (container->noiommu !=
+				     (driver->ops == &vfio_noiommu_ops)))
+					continue;
+
 				if (!try_module_get(driver->ops->owner))
 					continue;
 
@@ -1068,9 +1056,16 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
 	}
 
 	mutex_lock(&vfio.iommu_drivers_lock);
-	vfio_for_each_iommu_driver(container, driver) {
+	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
 		void *data;
 
+		/*
+		 * Only noiommu containers can use vfio-noiommu and noiommu
+		 * containers can only use vfio-noiommu.
+		 */
+		if (container->noiommu != (driver->ops == &vfio_noiommu_ops))
+			continue;
+
 		if (!try_module_get(driver->ops->owner))
 			continue;
 
@@ -1799,6 +1794,9 @@ static int __init vfio_init(void)
 	request_module_nowait("vfio_iommu_type1");
 	request_module_nowait("vfio_iommu_spapr_tce");
 
+#ifdef CONFIG_VFIO_NOIOMMU
+	vfio_register_iommu_driver(&vfio_noiommu_ops);
+#endif
 	return 0;
 
 err_cdev_add:
@@ -1815,6 +1813,9 @@ static void __exit vfio_cleanup(void)
 {
 	WARN_ON(!list_empty(&vfio.group_list));
 
+#ifdef CONFIG_VFIO_NOIOMMU
+	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
+#endif
 	idr_destroy(&vfio.group_idr);
 	cdev_del(&vfio.group_cdev);
 	unregister_chrdev_region(vfio.group_devt, MINORMASK);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-18 21:50                         ` Alex Williamson
@ 2015-12-21 11:46                           ` Yigit, Ferruh
  2015-12-21 12:18                             ` [dpdk-dev] [PATCH] vfio: add no-iommu support Ferruh Yigit
  2015-12-21 19:22                             ` [dpdk-dev] VFIO no-iommu Alex Williamson
  0 siblings, 2 replies; 35+ messages in thread
From: Yigit, Ferruh @ 2015-12-21 11:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dev

On Fri, Dec 18, 2015 at 02:50:17PM -0700, Alex Williamson wrote:
> On Fri, 2015-12-18 at 07:38 -0700, Alex Williamson wrote:
> > On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote:
> > > On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson wrote:
> > > <...>
> > > > > > > > > 
> > > > > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl,
> > > > > > > > > because in
> > > > > > > > > vfio
> > > > > > > > > module,
> > > > > > > > > container->noiommu is not set before doing a
> > > > > > > > > vfio_group_set_container()
> > > > > > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > > > > > > 
> > > > > > > > Running CHECK_EXTENSION on a container without the group
> > > > > > > > attached is
> > > > > > > > only going to tell you what extensions vfio is capable
> > > > > > > > of,
> > > > > > > > not
> > > > > > > > necessarily what extensions are available to you with
> > > > > > > > that
> > > > > > > > group.
> > > > > > > > Is this just a general dpdk- vfio ordering bug?
> > > > > > > 
> > > > > > > Yes, that is how VFIO was implemented in DPDK. I was under
> > > > > > > the
> > > > > > > impression that checking extension before assigning devices
> > > > > > > was
> > > > > > > the
> > > > > > > correct way to do things, so as to not to try anything we
> > > > > > > know
> > > > > > > would
> > > > > > > fail anyway. Does this imply that CHECK_EXTENSION needs to
> > > > > > > be
> > > > > > > called
> > > > > > > on both container and groups (or just on groups)?
> > > > > > 
> > > > > > Hmm, in Documentation/vfio.txt we do give the following
> > > > > > algorithm:
> > > > > > 
> > > > > >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > > > > > VFIO_API_VERSION)
> > > > > >                 /* Unknown API version */
> > > > > > 
> > > > > >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > > > > > VFIO_TYPE1_IOMMU))
> > > > > >                 /* Doesn't support the IOMMU driver we want.
> > > > > > */
> > > > > >         ...
> > > > > > 
> > > > > > That's just going to query each iommu driver and we can't yet
> > > > > > say
> > > > > > whether
> > > > > > the group the user attaches to the container later will
> > > > > > actually
> > > > > > support that
> > > > > > extension until we try to do it, that would come at
> > > > > > VFIO_SET_IOMMU.
> > > > > >  So is
> > > > > > it perhaps a vfio bug that we're not advertising no-iommu
> > > > > > until
> > > > > > the
> > > > > > group is
> > > > > > attached?  After all, we are capable of it with just an empty
> > > > > > container, just
> > > > > > like we are with type1, but we're going to fail SET_IOMMU for
> > > > > > the
> > > > > > wrong
> > > > > > combination.
> > > > > >  This is exactly the sort of thing that makes me glad we
> > > > > > reverted
> > > > > > it without
> > > > > > feedback from a working user driver.  Thanks,
> > > > > 
> > > > > Whether it should be considered a "bug" in VFIO or "by design"
> > > > > is
> > > > > up
> > > > > to you, of course, but at least according to the VFIO
> > > > > documentation,
> > > > > we are meant to check for type 1 extension and then attach
> > > > > devices,
> > > > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as
> > > > > supported
> > > > > even without any devices attached to the container (just like
> > > > > we
> > > > > get
> > > > > type 1 as supported without any devices attached). Having said
> > > > > that,
> > > > > if it was meant to attach devices first and then check the
> > > > > extensions, then perhaps the documentation should also point
> > > > > out
> > > > > that
> > > > > fact (or perhaps I missed that detail in my readings of the
> > > > > docs,
> > > > > in
> > > > > which case my apologies).
> > > > 
> > > > Hi Anatoly,
> > > > 
> > > > Does the below patch make it behave more like you'd expect.  This
> > > > applies to v4.4-rc4, I'd fold this into the base patch if we
> > > > reincorporate it to a future kernel.  Thanks,
> > > > 
> > > > Alex
> > > > 
> > > > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
> > > > Author: Alex Williamson <alex.williamson@redhat.com>
> > > > Date:   Wed Dec 16 19:02:01 2015 -0700
> > > > 
> > > >     vfio: Fix no-iommu CHECK_EXTENSION
> > > >     
> > > >     Previously the no-iommu iommu driver was only visible when
> > > > the
> > > >     container had an attached no-iommu group.  This means that
> > > >     CHECK_EXTENSION on and empty container couldn't report the
> > > > possibility
> > > >     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not
> > > > the user
> > > >     can make use of it with the group, so this is
> > > > inconsistent.  Add the
> > > >     no-iommu iommu to the list of iommu drivers when enabled via
> > > > module
> > > >     option, but skip all the others if the container is attached
> > > > to
> > > > a
> > > >     no-iommu groups.  Note that tainting is now done with the
> > > > "unsafe"
> > > >     module callback rather than explictly within vfio.
> > > >     
> > > >     Also fixes module option and module description name
> > > > inconsistency.
> > > >     
> > > >     Also make vfio_noiommu_ops const.
> > > >     
> > > >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > 
> > > Hi Alex,
> > > 
> > > I got following crash with this update:
> 
> Let's try this one:
> 
> commit 8ff839c6ffe9f3b50b50f1cc87e7afbf23171f05
> Author: Alex Williamson <alex.williamson@redhat.com>
> Date:   Fri Dec 18 14:45:55 2015 -0700
> 
>     v2 vfio fix no-iommu CHECK_EXTENSION
>     
>     Register and unregister the no-iommu iommu backend at module
>     initialization and exit, but disable it unless enabled via module
>     option.  Rather than modify the iommu driver walk, selectively skip
>     combinations that aren't supported.  CHECK_EXTENSION on a container
>     without any groups attached exposes all possible extensions.  Once a
>     group is attached, the no-iommu backend is skipped for regular groups
>     and regular iommu backends are skipped for no-iommu groups.
>     
>     This would be folded into a single patch to re-propose vfio no-iommu
>     mode upstream.
>     
>     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
Hi Alex,

Thank you for the update. I have tested this on both no-iommu and iommu environment
and worked successfully. I believe this approach is better because it is simpler.

>From DPDK point of view, only update to support vfio no-iommu is: to use new group names
and disable DMA mapping.

If VFIO module compiled with "CONFIG_VFIO_NOIOMMU=y" by default, that makes things easier
for DPDK, in no-iommu environment inserting vfio module with proper parameter makes it
available for DPDK.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [dpdk-dev] [PATCH] vfio: add no-iommu support
  2015-12-21 11:46                           ` Yigit, Ferruh
@ 2015-12-21 12:18                             ` Ferruh Yigit
  2015-12-21 15:15                               ` Burakov, Anatoly
  2015-12-21 19:22                             ` [dpdk-dev] VFIO no-iommu Alex Williamson
  1 sibling, 1 reply; 35+ messages in thread
From: Ferruh Yigit @ 2015-12-21 12:18 UTC (permalink / raw)
  To: dev

This is based on patch from Alex Williamson:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=033291eccbdb
plus
http://dpdk.org/dev/patchwork/patch/9598/

This patch is intended to test above patches on DPDK rather than
official patch to DPDK.

Test result is DPDK successfully run on no-iommu environment.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..90bba4a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -61,6 +61,18 @@
 
 #ifdef VFIO_PRESENT
 
+/*#define VFIO_NOIOMMU*/
+
+#ifndef VFIO_NOIOMMU_IOMMU
+#define VFIO_NOIOMMU_IOMMU 8
+#endif
+
+#ifdef VFIO_NOIOMMU
+#define VFIO_IOMMU_TYPE VFIO_NOIOMMU_IOMMU
+#else
+#define VFIO_IOMMU_TYPE VFIO_TYPE1_IOMMU
+#endif
+
 #define PAGE_SIZE   (sysconf(_SC_PAGESIZE))
 #define PAGE_MASK   (~(PAGE_SIZE - 1))
 
@@ -71,7 +83,11 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#ifdef VFIO_NOIOMMU
+#define VFIO_GROUP_FMT "/dev/vfio/noiommu-%u"
+#else
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#endif
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 
 /* per-process VFIO config */
@@ -212,17 +228,21 @@ pci_vfio_set_bus_master(int dev_fd)
 static int
 pci_vfio_setup_dma_maps(int vfio_container_fd)
 {
+#ifndef VFIO_NOIOMMU
 	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-	int i, ret;
+	int i;
+#endif
+	int ret;
 
 	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-			VFIO_TYPE1_IOMMU);
+			VFIO_IOMMU_TYPE);
 	if (ret) {
 		RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
 				"error %i (%s)\n", errno, strerror(errno));
 		return -1;
 	}
 
+#ifndef VFIO_NOIOMMU
 	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
 	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
 		struct vfio_iommu_type1_dma_map dma_map;
@@ -245,6 +265,7 @@ pci_vfio_setup_dma_maps(int vfio_container_fd)
 			return -1;
 		}
 	}
+#endif
 
 	return 0;
 }
@@ -373,7 +394,8 @@ pci_vfio_get_container_fd(void)
 		}
 
 		/* check if we support IOMMU type 1 */
-		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
+				VFIO_IOMMU_TYPE);
 		if (ret != 1) {
 			if (ret < 0)
 				RTE_LOG(ERR, EAL, "  could not get IOMMU type, "
-- 
2.5.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] [PATCH] vfio: add no-iommu support
  2015-12-21 12:18                             ` [dpdk-dev] [PATCH] vfio: add no-iommu support Ferruh Yigit
@ 2015-12-21 15:15                               ` Burakov, Anatoly
  2015-12-21 15:26                                 ` Yigit, Ferruh
  0 siblings, 1 reply; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-21 15:15 UTC (permalink / raw)
  To: Yigit, Ferruh, dev

Hi Ferruh,

> This is based on patch from Alex Williamson:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=03
> 3291eccbdb
> plus
> http://dpdk.org/dev/patchwork/patch/9598/
> 
> This patch is intended to test above patches on DPDK rather than official
> patch to DPDK.
> 
> Test result is DPDK successfully run on no-iommu environment.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 28
> +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 74f91ba..90bba4a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -61,6 +61,18 @@
> 
>  #ifdef VFIO_PRESENT
> 
> +/*#define VFIO_NOIOMMU*/
> +
> +#ifndef VFIO_NOIOMMU_IOMMU
> +#define VFIO_NOIOMMU_IOMMU 8
> +#endif
> +
> +#ifdef VFIO_NOIOMMU
> +#define VFIO_IOMMU_TYPE VFIO_NOIOMMU_IOMMU #else #define
> +VFIO_IOMMU_TYPE VFIO_TYPE1_IOMMU #endif
> +
>  #define PAGE_SIZE   (sysconf(_SC_PAGESIZE))
>  #define PAGE_MASK   (~(PAGE_SIZE - 1))
> 
> @@ -71,7 +83,11 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
> 
>  #define VFIO_DIR "/dev/vfio"
>  #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
> +#ifdef VFIO_NOIOMMU
> +#define VFIO_GROUP_FMT "/dev/vfio/noiommu-%u"
> +#else
>  #define VFIO_GROUP_FMT "/dev/vfio/%u"
> +#endif
>  #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
> 
>  /* per-process VFIO config */
> @@ -212,17 +228,21 @@ pci_vfio_set_bus_master(int dev_fd)  static int
> pci_vfio_setup_dma_maps(int vfio_container_fd)  {
> +#ifndef VFIO_NOIOMMU
>  	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
> -	int i, ret;
> +	int i;
> +#endif
> +	int ret;
> 
>  	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
> -			VFIO_TYPE1_IOMMU);
> +			VFIO_IOMMU_TYPE);
>  	if (ret) {
>  		RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
>  				"error %i (%s)\n", errno, strerror(errno));
>  		return -1;
>  	}
> 
> +#ifndef VFIO_NOIOMMU
>  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>  		struct vfio_iommu_type1_dma_map dma_map; @@ -245,6
> +265,7 @@ pci_vfio_setup_dma_maps(int vfio_container_fd)
>  			return -1;
>  		}
>  	}
> +#endif
> 
>  	return 0;
>  }
> @@ -373,7 +394,8 @@ pci_vfio_get_container_fd(void)
>  		}
> 
>  		/* check if we support IOMMU type 1 */
> -		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
> VFIO_TYPE1_IOMMU);
> +		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
> +				VFIO_IOMMU_TYPE);
>  		if (ret != 1) {
>  			if (ret < 0)
>  				RTE_LOG(ERR, EAL, "  could not get IOMMU
> type, "
> --
> 2.5.0

This is one approach :) I was thinking of another, building some kind of more generic support for multiple VFIO drivers. It's a bit more code and probably overkill as a solution to this particular problem, but hopefully it'll make it easier to add new VFIO drivers down the line (with each driver having their own DMA mapping function), should we choose to do so. I'm still working on the patch, but if everyone is OK with this approach instead of a more general one, that's fine with me.

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] [PATCH] vfio: add no-iommu support
  2015-12-21 15:15                               ` Burakov, Anatoly
@ 2015-12-21 15:26                                 ` Yigit, Ferruh
  2015-12-21 15:28                                   ` Burakov, Anatoly
  0 siblings, 1 reply; 35+ messages in thread
From: Yigit, Ferruh @ 2015-12-21 15:26 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Mon, Dec 21, 2015 at 03:15:46PM +0000, Burakov, Anatoly wrote:
> > This is based on patch from Alex Williamson:
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=03
> > 3291eccbdb
> > plus
> > http://dpdk.org/dev/patchwork/patch/9598/
> > 
> > This patch is intended to test above patches on DPDK rather than official
> > patch to DPDK.
> > 
> > Test result is DPDK successfully run on no-iommu environment.
> > 
> 
> This is one approach :) I was thinking of another, building some kind of more generic support for multiple VFIO drivers. It's a bit more code and probably overkill as a solution to this particular problem, but hopefully it'll make it easier to add new VFIO drivers down the line (with each driver having their own DMA mapping function), should we choose to do so. I'm still working on the patch, but if everyone is OK with this approach instead of a more general one, that's fine with me.
> 
Hi Anatoly,

This patch sent just to show what changes done to test VFIO no-iommu I mentioned,
and to have a justification for the kernel patch, not sent as a final solution in DPDK,
sorry for interrupting your work.

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] [PATCH] vfio: add no-iommu support
  2015-12-21 15:26                                 ` Yigit, Ferruh
@ 2015-12-21 15:28                                   ` Burakov, Anatoly
  0 siblings, 0 replies; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-21 15:28 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev

Hi Ferruh,

> On Mon, Dec 21, 2015 at 03:15:46PM +0000, Burakov, Anatoly wrote:
> > > This is based on patch from Alex Williamson:
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/comm
> > > it/?id=03
> > > 3291eccbdb
> > > plus
> > > http://dpdk.org/dev/patchwork/patch/9598/
> > >
> > > This patch is intended to test above patches on DPDK rather than
> > > official patch to DPDK.
> > >
> > > Test result is DPDK successfully run on no-iommu environment.
> > >
> >
> > This is one approach :) I was thinking of another, building some kind of
> more generic support for multiple VFIO drivers. It's a bit more code and
> probably overkill as a solution to this particular problem, but hopefully it'll
> make it easier to add new VFIO drivers down the line (with each driver
> having their own DMA mapping function), should we choose to do so. I'm still
> working on the patch, but if everyone is OK with this approach instead of a
> more general one, that's fine with me.
> >
> Hi Anatoly,
> 
> This patch sent just to show what changes done to test VFIO no-iommu I
> mentioned, and to have a justification for the kernel patch, not sent as a final
> solution in DPDK, sorry for interrupting your work.
> 
> Thanks,
> Ferruh

Ah OK, I misread the part where it said that it is not to be applied as-is. Thanks!

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-21 11:46                           ` Yigit, Ferruh
  2015-12-21 12:18                             ` [dpdk-dev] [PATCH] vfio: add no-iommu support Ferruh Yigit
@ 2015-12-21 19:22                             ` Alex Williamson
  2015-12-22 20:20                               ` Alex Williamson
  1 sibling, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-21 19:22 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev

On Mon, 2015-12-21 at 11:46 +0000, Yigit, Ferruh wrote:
> On Fri, Dec 18, 2015 at 02:50:17PM -0700, Alex Williamson wrote:
> > On Fri, 2015-12-18 at 07:38 -0700, Alex Williamson wrote:
> > > On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote:
> > > > On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson
> > > > wrote:
> > > > <...>
> > > > > > > > > > 
> > > > > > > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl,
> > > > > > > > > > because in
> > > > > > > > > > vfio
> > > > > > > > > > module,
> > > > > > > > > > container->noiommu is not set before doing a
> > > > > > > > > > vfio_group_set_container()
> > > > > > > > > > and vfio_for_each_iommu_driver selects wrong
> > > > > > > > > > driver.
> > > > > > > > > 
> > > > > > > > > Running CHECK_EXTENSION on a container without the
> > > > > > > > > group
> > > > > > > > > attached is
> > > > > > > > > only going to tell you what extensions vfio is
> > > > > > > > > capable
> > > > > > > > > of,
> > > > > > > > > not
> > > > > > > > > necessarily what extensions are available to you with
> > > > > > > > > that
> > > > > > > > > group.
> > > > > > > > > Is this just a general dpdk- vfio ordering bug?
> > > > > > > > 
> > > > > > > > Yes, that is how VFIO was implemented in DPDK. I was
> > > > > > > > under
> > > > > > > > the
> > > > > > > > impression that checking extension before assigning
> > > > > > > > devices
> > > > > > > > was
> > > > > > > > the
> > > > > > > > correct way to do things, so as to not to try anything
> > > > > > > > we
> > > > > > > > know
> > > > > > > > would
> > > > > > > > fail anyway. Does this imply that CHECK_EXTENSION needs
> > > > > > > > to
> > > > > > > > be
> > > > > > > > called
> > > > > > > > on both container and groups (or just on groups)?
> > > > > > > 
> > > > > > > Hmm, in Documentation/vfio.txt we do give the following
> > > > > > > algorithm:
> > > > > > > 
> > > > > > >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > > > > > > VFIO_API_VERSION)
> > > > > > >                 /* Unknown API version */
> > > > > > > 
> > > > > > >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > > > > > > VFIO_TYPE1_IOMMU))
> > > > > > >                 /* Doesn't support the IOMMU driver we
> > > > > > > want.
> > > > > > > */
> > > > > > >         ...
> > > > > > > 
> > > > > > > That's just going to query each iommu driver and we can't
> > > > > > > yet
> > > > > > > say
> > > > > > > whether
> > > > > > > the group the user attaches to the container later will
> > > > > > > actually
> > > > > > > support that
> > > > > > > extension until we try to do it, that would come at
> > > > > > > VFIO_SET_IOMMU.
> > > > > > >  So is
> > > > > > > it perhaps a vfio bug that we're not advertising no-iommu
> > > > > > > until
> > > > > > > the
> > > > > > > group is
> > > > > > > attached?  After all, we are capable of it with just an
> > > > > > > empty
> > > > > > > container, just
> > > > > > > like we are with type1, but we're going to fail SET_IOMMU
> > > > > > > for
> > > > > > > the
> > > > > > > wrong
> > > > > > > combination.
> > > > > > >  This is exactly the sort of thing that makes me glad we
> > > > > > > reverted
> > > > > > > it without
> > > > > > > feedback from a working user driver.  Thanks,
> > > > > > 
> > > > > > Whether it should be considered a "bug" in VFIO or "by
> > > > > > design"
> > > > > > is
> > > > > > up
> > > > > > to you, of course, but at least according to the VFIO
> > > > > > documentation,
> > > > > > we are meant to check for type 1 extension and then attach
> > > > > > devices,
> > > > > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked as
> > > > > > supported
> > > > > > even without any devices attached to the container (just
> > > > > > like
> > > > > > we
> > > > > > get
> > > > > > type 1 as supported without any devices attached). Having
> > > > > > said
> > > > > > that,
> > > > > > if it was meant to attach devices first and then check the
> > > > > > extensions, then perhaps the documentation should also
> > > > > > point
> > > > > > out
> > > > > > that
> > > > > > fact (or perhaps I missed that detail in my readings of the
> > > > > > docs,
> > > > > > in
> > > > > > which case my apologies).
> > > > > 
> > > > > Hi Anatoly,
> > > > > 
> > > > > Does the below patch make it behave more like you'd expect.
> > > > >  This
> > > > > applies to v4.4-rc4, I'd fold this into the base patch if we
> > > > > reincorporate it to a future kernel.  Thanks,
> > > > > 
> > > > > Alex
> > > > > 
> > > > > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
> > > > > Author: Alex Williamson <alex.williamson@redhat.com>
> > > > > Date:   Wed Dec 16 19:02:01 2015 -0700
> > > > > 
> > > > >     vfio: Fix no-iommu CHECK_EXTENSION
> > > > >     
> > > > >     Previously the no-iommu iommu driver was only visible
> > > > > when
> > > > > the
> > > > >     container had an attached no-iommu group.  This means
> > > > > that
> > > > >     CHECK_EXTENSION on and empty container couldn't report
> > > > > the
> > > > > possibility
> > > > >     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or
> > > > > not
> > > > > the user
> > > > >     can make use of it with the group, so this is
> > > > > inconsistent.  Add the
> > > > >     no-iommu iommu to the list of iommu drivers when enabled
> > > > > via
> > > > > module
> > > > >     option, but skip all the others if the container is
> > > > > attached
> > > > > to
> > > > > a
> > > > >     no-iommu groups.  Note that tainting is now done with the
> > > > > "unsafe"
> > > > >     module callback rather than explictly within vfio.
> > > > >     
> > > > >     Also fixes module option and module description name
> > > > > inconsistency.
> > > > >     
> > > > >     Also make vfio_noiommu_ops const.
> > > > >     
> > > > >     Signed-off-by: Alex Williamson <alex.williamson@redhat.co
> > > > > m>
> > > > 
> > > > Hi Alex,
> > > > 
> > > > I got following crash with this update:
> > 
> > Let's try this one:
> > 
> > commit 8ff839c6ffe9f3b50b50f1cc87e7afbf23171f05
> > Author: Alex Williamson <alex.williamson@redhat.com>
> > Date:   Fri Dec 18 14:45:55 2015 -0700
> > 
> >     v2 vfio fix no-iommu CHECK_EXTENSION
> >     
> >     Register and unregister the no-iommu iommu backend at module
> >     initialization and exit, but disable it unless enabled via
> > module
> >     option.  Rather than modify the iommu driver walk, selectively
> > skip
> >     combinations that aren't supported.  CHECK_EXTENSION on a
> > container
> >     without any groups attached exposes all possible
> > extensions.  Once a
> >     group is attached, the no-iommu backend is skipped for regular
> > groups
> >     and regular iommu backends are skipped for no-iommu groups.
> >     
> >     This would be folded into a single patch to re-propose vfio no-
> > iommu
> >     mode upstream.
> >     
> >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> Hi Alex,
> 
> Thank you for the update. I have tested this on both no-iommu and
> iommu environment
> and worked successfully. I believe this approach is better because it
> is simpler.
> 
> From DPDK point of view, only update to support vfio no-iommu is: to
> use new group names
> and disable DMA mapping.
> 
> If VFIO module compiled with "CONFIG_VFIO_NOIOMMU=y" by default, that
> makes things easier
> for DPDK, in no-iommu environment inserting vfio module with proper
> parameter makes it
> available for DPDK.

Thanks for the update Ferruh.  Also note that the module option is
dynamically settable so that it can support running with statically
compiled modules where you may not know whether or not to enable no-
iommu until after boot, or where unloading and re-loading a module
might not be an option.  It will be up to each distro to decide whether
to enable the config option, but I think we at least highlight the
existing support issue for non-iommu protected userspace drivers, which
is something that was not at all clear with the uio driver approach.
 Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-21 19:22                             ` [dpdk-dev] VFIO no-iommu Alex Williamson
@ 2015-12-22 20:20                               ` Alex Williamson
  2015-12-23 11:19                                 ` Burakov, Anatoly
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2015-12-22 20:20 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev

On Mon, 2015-12-21 at 12:22 -0700, Alex Williamson wrote:
> On Mon, 2015-12-21 at 11:46 +0000, Yigit, Ferruh wrote:
> > On Fri, Dec 18, 2015 at 02:50:17PM -0700, Alex Williamson wrote:
> > > On Fri, 2015-12-18 at 07:38 -0700, Alex Williamson wrote:
> > > > On Fri, 2015-12-18 at 10:43 +0000, Yigit, Ferruh wrote:
> > > > > On Thu, Dec 17, 2015 at 09:43:59AM -0700, Alex Williamson
> > > > > wrote:
> > > > > <...>
> > > > > > > > > > > 
> > > > > > > > > > > Also I need to disable VFIO_CHECK_EXTENSION
> > > > > > > > > > > ioctl,
> > > > > > > > > > > because in
> > > > > > > > > > > vfio
> > > > > > > > > > > module,
> > > > > > > > > > > container->noiommu is not set before doing a
> > > > > > > > > > > vfio_group_set_container()
> > > > > > > > > > > and vfio_for_each_iommu_driver selects wrong
> > > > > > > > > > > driver.
> > > > > > > > > > 
> > > > > > > > > > Running CHECK_EXTENSION on a container without the
> > > > > > > > > > group
> > > > > > > > > > attached is
> > > > > > > > > > only going to tell you what extensions vfio is
> > > > > > > > > > capable
> > > > > > > > > > of,
> > > > > > > > > > not
> > > > > > > > > > necessarily what extensions are available to you
> > > > > > > > > > with
> > > > > > > > > > that
> > > > > > > > > > group.
> > > > > > > > > > Is this just a general dpdk- vfio ordering bug?
> > > > > > > > > 
> > > > > > > > > Yes, that is how VFIO was implemented in DPDK. I was
> > > > > > > > > under
> > > > > > > > > the
> > > > > > > > > impression that checking extension before assigning
> > > > > > > > > devices
> > > > > > > > > was
> > > > > > > > > the
> > > > > > > > > correct way to do things, so as to not to try
> > > > > > > > > anything
> > > > > > > > > we
> > > > > > > > > know
> > > > > > > > > would
> > > > > > > > > fail anyway. Does this imply that CHECK_EXTENSION
> > > > > > > > > needs
> > > > > > > > > to
> > > > > > > > > be
> > > > > > > > > called
> > > > > > > > > on both container and groups (or just on groups)?
> > > > > > > > 
> > > > > > > > Hmm, in Documentation/vfio.txt we do give the following
> > > > > > > > algorithm:
> > > > > > > > 
> > > > > > > >         if (ioctl(container, VFIO_GET_API_VERSION) !=
> > > > > > > > VFIO_API_VERSION)
> > > > > > > >                 /* Unknown API version */
> > > > > > > > 
> > > > > > > >         if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > > > > > > > VFIO_TYPE1_IOMMU))
> > > > > > > >                 /* Doesn't support the IOMMU driver we
> > > > > > > > want.
> > > > > > > > */
> > > > > > > >         ...
> > > > > > > > 
> > > > > > > > That's just going to query each iommu driver and we
> > > > > > > > can't
> > > > > > > > yet
> > > > > > > > say
> > > > > > > > whether
> > > > > > > > the group the user attaches to the container later will
> > > > > > > > actually
> > > > > > > > support that
> > > > > > > > extension until we try to do it, that would come at
> > > > > > > > VFIO_SET_IOMMU.
> > > > > > > >  So is
> > > > > > > > it perhaps a vfio bug that we're not advertising no-
> > > > > > > > iommu
> > > > > > > > until
> > > > > > > > the
> > > > > > > > group is
> > > > > > > > attached?  After all, we are capable of it with just an
> > > > > > > > empty
> > > > > > > > container, just
> > > > > > > > like we are with type1, but we're going to fail
> > > > > > > > SET_IOMMU
> > > > > > > > for
> > > > > > > > the
> > > > > > > > wrong
> > > > > > > > combination.
> > > > > > > >  This is exactly the sort of thing that makes me glad
> > > > > > > > we
> > > > > > > > reverted
> > > > > > > > it without
> > > > > > > > feedback from a working user driver.  Thanks,
> > > > > > > 
> > > > > > > Whether it should be considered a "bug" in VFIO or "by
> > > > > > > design"
> > > > > > > is
> > > > > > > up
> > > > > > > to you, of course, but at least according to the VFIO
> > > > > > > documentation,
> > > > > > > we are meant to check for type 1 extension and then
> > > > > > > attach
> > > > > > > devices,
> > > > > > > so it would be expected to get VFIO_NOIOMMU_IOMMU marked
> > > > > > > as
> > > > > > > supported
> > > > > > > even without any devices attached to the container (just
> > > > > > > like
> > > > > > > we
> > > > > > > get
> > > > > > > type 1 as supported without any devices attached). Having
> > > > > > > said
> > > > > > > that,
> > > > > > > if it was meant to attach devices first and then check
> > > > > > > the
> > > > > > > extensions, then perhaps the documentation should also
> > > > > > > point
> > > > > > > out
> > > > > > > that
> > > > > > > fact (or perhaps I missed that detail in my readings of
> > > > > > > the
> > > > > > > docs,
> > > > > > > in
> > > > > > > which case my apologies).
> > > > > > 
> > > > > > Hi Anatoly,
> > > > > > 
> > > > > > Does the below patch make it behave more like you'd expect.
> > > > > >  This
> > > > > > applies to v4.4-rc4, I'd fold this into the base patch if
> > > > > > we
> > > > > > reincorporate it to a future kernel.  Thanks,
> > > > > > 
> > > > > > Alex
> > > > > > 
> > > > > > commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
> > > > > > Author: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Date:   Wed Dec 16 19:02:01 2015 -0700
> > > > > > 
> > > > > >     vfio: Fix no-iommu CHECK_EXTENSION
> > > > > >     
> > > > > >     Previously the no-iommu iommu driver was only visible
> > > > > > when
> > > > > > the
> > > > > >     container had an attached no-iommu group.  This means
> > > > > > that
> > > > > >     CHECK_EXTENSION on and empty container couldn't report
> > > > > > the
> > > > > > possibility
> > > > > >     of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether
> > > > > > or
> > > > > > not
> > > > > > the user
> > > > > >     can make use of it with the group, so this is
> > > > > > inconsistent.  Add the
> > > > > >     no-iommu iommu to the list of iommu drivers when
> > > > > > enabled
> > > > > > via
> > > > > > module
> > > > > >     option, but skip all the others if the container is
> > > > > > attached
> > > > > > to
> > > > > > a
> > > > > >     no-iommu groups.  Note that tainting is now done with
> > > > > > the
> > > > > > "unsafe"
> > > > > >     module callback rather than explictly within vfio.
> > > > > >     
> > > > > >     Also fixes module option and module description name
> > > > > > inconsistency.
> > > > > >     
> > > > > >     Also make vfio_noiommu_ops const.
> > > > > >     
> > > > > >     Signed-off-by: Alex Williamson <alex.williamson@redhat.
> > > > > > co
> > > > > > m>
> > > > > 
> > > > > Hi Alex,
> > > > > 
> > > > > I got following crash with this update:
> > > 
> > > Let's try this one:
> > > 
> > > commit 8ff839c6ffe9f3b50b50f1cc87e7afbf23171f05
> > > Author: Alex Williamson <alex.williamson@redhat.com>
> > > Date:   Fri Dec 18 14:45:55 2015 -0700
> > > 
> > >     v2 vfio fix no-iommu CHECK_EXTENSION
> > >     
> > >     Register and unregister the no-iommu iommu backend at module
> > >     initialization and exit, but disable it unless enabled via
> > > module
> > >     option.  Rather than modify the iommu driver walk,
> > > selectively
> > > skip
> > >     combinations that aren't supported.  CHECK_EXTENSION on a
> > > container
> > >     without any groups attached exposes all possible
> > > extensions.  Once a
> > >     group is attached, the no-iommu backend is skipped for
> > > regular
> > > groups
> > >     and regular iommu backends are skipped for no-iommu groups.
> > >     
> > >     This would be folded into a single patch to re-propose vfio
> > > no-
> > > iommu
> > >     mode upstream.
> > >     
> > >     Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > 
> > Hi Alex,
> > 
> > Thank you for the update. I have tested this on both no-iommu and
> > iommu environment
> > and worked successfully. I believe this approach is better because
> > it
> > is simpler.
> > 
> > From DPDK point of view, only update to support vfio no-iommu is:
> > to
> > use new group names
> > and disable DMA mapping.
> > 
> > If VFIO module compiled with "CONFIG_VFIO_NOIOMMU=y" by default,
> > that
> > makes things easier
> > for DPDK, in no-iommu environment inserting vfio module with proper
> > parameter makes it
> > available for DPDK.
> 
> Thanks for the update Ferruh.  Also note that the module option is
> dynamically settable so that it can support running with statically
> compiled modules where you may not know whether or not to enable no-
> iommu until after boot, or where unloading and re-loading a module
> might not be an option.  It will be up to each distro to decide
> whether
> to enable the config option, but I think we at least highlight the
> existing support issue for non-iommu protected userspace drivers,
> which
> is something that was not at all clear with the uio driver approach.


Hi,

I've re-posted the unified patch upstream and it should start showing
up in the next linux-next build.  I expect the dpdk code won't be
merged until after this gets back into a proper kernel, but could we
get the dpdk modifications posted as rfc for others looking to try it?
 Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-22 20:20                               ` Alex Williamson
@ 2015-12-23 11:19                                 ` Burakov, Anatoly
  2015-12-31 14:30                                   ` Santosh Shukla
  0 siblings, 1 reply; 35+ messages in thread
From: Burakov, Anatoly @ 2015-12-23 11:19 UTC (permalink / raw)
  To: Alex Williamson, Yigit, Ferruh; +Cc: dev

Hi Alex,

> I've re-posted the unified patch upstream and it should start showing up in
> the next linux-next build.  I expect the dpdk code won't be merged until
> after this gets back into a proper kernel, but could we get the dpdk
> modifications posted as rfc for others looking to try it?

I have already posted a patch that should work with No-IOMMU.

http://dpdk.org/dev/patchwork/patch/9619/

Apologies for not CC-ing you. I too would be interested to know if other people are having any issues with the patch.

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-23 11:19                                 ` Burakov, Anatoly
@ 2015-12-31 14:30                                   ` Santosh Shukla
  0 siblings, 0 replies; 35+ messages in thread
From: Santosh Shukla @ 2015-12-31 14:30 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, Alex Williamson

On Wed, Dec 23, 2015 at 4:49 PM, Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
> Hi Alex,
>
>> I've re-posted the unified patch upstream and it should start showing up in
>> the next linux-next build.  I expect the dpdk code won't be merged until
>> after this gets back into a proper kernel, but could we get the dpdk
>> modifications posted as rfc for others looking to try it?
>
> I have already posted a patch that should work with No-IOMMU.
>
> http://dpdk.org/dev/patchwork/patch/9619/
>
> Apologies for not CC-ing you. I too would be interested to know if other people are having any issues with the patch.
>

I tried this patch for virtio-net pmd driver on arm64 and It worked
for me.  I didn't reviewed patch, but functionally nothing broke in my
test environment, we'll review as time permit.. for now

Tested-by: Santosh Shukla <sshukla@mvista.com>


> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2015-12-16  4:38           ` Alex Williamson
  2015-12-16  8:35             ` Burakov, Anatoly
@ 2016-01-14  6:03             ` Jike Song
  2016-01-14  6:52               ` Alex Williamson
  1 sibling, 1 reply; 35+ messages in thread
From: Jike Song @ 2016-01-14  6:03 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dev

On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
> it imply deprecating the uio interface?  I believe the feature that
> started this discussion was support for MSI/X interrupts so that VFs
> can support some kind of interrupt (uio only supports INTx since it
> doesn't allow DMA).  Implementing that would be the ultimate test of
> whether this provides dpdk with not only a more consistent interface,
> but the feature dpdk wants that's missing in uio. Thanks,
>
Hi Alex,

Sorry for jumping in.  Just being curious, how does VFIO No-IOMMU mode
support DMA from userspace drivers?  If I understand correctly, due to
the absence of IOMMU, pcidev has to use physaddr to start a DMA
transaction, but how it is supposed to get physaddr from userspace
drivers, /proc/<pid>/pagemap or something else?


-- 
Thanks,
Jike

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2016-01-14  6:03             ` Jike Song
@ 2016-01-14  6:52               ` Alex Williamson
  2016-01-14  8:12                 ` Jike Song
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Williamson @ 2016-01-14  6:52 UTC (permalink / raw)
  To: Jike Song; +Cc: dev

On Thu, 2016-01-14 at 14:03 +0800, Jike Song wrote:
> On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > 
> > So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
> > it imply deprecating the uio interface?  I believe the feature that
> > started this discussion was support for MSI/X interrupts so that VFs
> > can support some kind of interrupt (uio only supports INTx since it
> > doesn't allow DMA).  Implementing that would be the ultimate test of
> > whether this provides dpdk with not only a more consistent interface,
> > but the feature dpdk wants that's missing in uio. Thanks,
> > 
> Hi Alex,
> 
> Sorry for jumping in.  Just being curious, how does VFIO No-IOMMU mode
> support DMA from userspace drivers?  If I understand correctly, due to
> the absence of IOMMU, pcidev has to use physaddr to start a DMA
> transaction, but how it is supposed to get physaddr from userspace
> drivers, /proc//pagemap or something else?

Hi Jike,

vfio no-iommu does nothing to facilitate DMA mappings, UIO didn't
either and DPDK managed to work with that.  vfio no-iommu is meant to
be an enabler and provide a consistent vfio device model (with MSI/X
interrupts), but fundamentally the idea of a non-iommu protected, user
owned device capable of DMA is unsupportable.  This is why vfio no-
iommu taints the kernel.  With that in mind, one of the design goals is
to introduce as little code as possible for vfio no-iommu.  A new vfio
iommu backend with pinning and virt-to-bus translation goes against
that design goal.  I don't know the details of how DPDK did this with
UIO, but the same lack of DMA mapping facilities is present with vfio
no-iommu.  It really just brings the vfio device model, nothing more.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [dpdk-dev] VFIO no-iommu
  2016-01-14  6:52               ` Alex Williamson
@ 2016-01-14  8:12                 ` Jike Song
  0 siblings, 0 replies; 35+ messages in thread
From: Jike Song @ 2016-01-14  8:12 UTC (permalink / raw)
  To: Alex Williamson; +Cc: dev

On Thu, Jan 14, 2016 at 2:52 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Thu, 2016-01-14 at 14:03 +0800, Jike Song wrote:
>> On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> >
>> > So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
>> > it imply deprecating the uio interface?  I believe the feature that
>> > started this discussion was support for MSI/X interrupts so that VFs
>> > can support some kind of interrupt (uio only supports INTx since it
>> > doesn't allow DMA).  Implementing that would be the ultimate test of
>> > whether this provides dpdk with not only a more consistent interface,
>> > but the feature dpdk wants that's missing in uio. Thanks,
>> >
>> Hi Alex,
>>
>> Sorry for jumping in.  Just being curious, how does VFIO No-IOMMU mode
>> support DMA from userspace drivers?  If I understand correctly, due to
>> the absence of IOMMU, pcidev has to use physaddr to start a DMA
>> transaction, but how it is supposed to get physaddr from userspace
>> drivers, /proc//pagemap or something else?
>
> Hi Jike,
>
> vfio no-iommu does nothing to facilitate DMA mappings, UIO didn't
> either and DPDK managed to work with that.  vfio no-iommu is meant to
> be an enabler and provide a consistent vfio device model (with MSI/X
> interrupts), but fundamentally the idea of a non-iommu protected, user
> owned device capable of DMA is unsupportable.  This is why vfio no-
> iommu taints the kernel.  With that in mind, one of the design goals is
> to introduce as little code as possible for vfio no-iommu.  A new vfio
> iommu backend with pinning and virt-to-bus translation goes against
> that design goal.  I don't know the details of how DPDK did this with
> UIO, but the same lack of DMA mapping facilities is present with vfio
> no-iommu.  It really just brings the vfio device model, nothing more.
> Thanks,
>
> Alex

Thanks! - that addressed my question :)

By the way, my previous assumption(consulting /proc/<pid>/pagemap)
apparently doesn't work: one cannot assume a usespace buffer is
physically continuous.


-- 
Thanks,
Jike

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-01-14  8:12 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-11 16:28 [dpdk-dev] VFIO no-iommu Thomas Monjalon
2015-12-11 22:12 ` Vincent JARDIN
2015-12-11 23:02   ` Alex Williamson
2015-12-15 13:43     ` O'Driscoll, Tim
2015-12-15 16:53       ` Alex Williamson
2015-12-16  4:04         ` Ferruh Yigit
2015-12-16  4:38           ` Alex Williamson
2015-12-16  8:35             ` Burakov, Anatoly
2015-12-16 16:23               ` Burakov, Anatoly
2015-12-16 23:17                 ` Thomas Monjalon
2015-12-17  9:52                   ` Burakov, Anatoly
2015-12-17 10:09                     ` Thomas Monjalon
2015-12-17 19:38                       ` Jan Viktorin
2015-12-17 21:16                         ` Vincent JARDIN
2015-12-17 23:29                         ` Stephen Hemminger
2015-12-16 17:11               ` Alex Williamson
2015-12-16 17:22                 ` Burakov, Anatoly
2015-12-17 16:43                   ` Alex Williamson
2015-12-18 10:43                     ` Yigit, Ferruh
2015-12-18 14:38                       ` Alex Williamson
2015-12-18 21:50                         ` Alex Williamson
2015-12-21 11:46                           ` Yigit, Ferruh
2015-12-21 12:18                             ` [dpdk-dev] [PATCH] vfio: add no-iommu support Ferruh Yigit
2015-12-21 15:15                               ` Burakov, Anatoly
2015-12-21 15:26                                 ` Yigit, Ferruh
2015-12-21 15:28                                   ` Burakov, Anatoly
2015-12-21 19:22                             ` [dpdk-dev] VFIO no-iommu Alex Williamson
2015-12-22 20:20                               ` Alex Williamson
2015-12-23 11:19                                 ` Burakov, Anatoly
2015-12-31 14:30                                   ` Santosh Shukla
2016-01-14  6:03             ` Jike Song
2016-01-14  6:52               ` Alex Williamson
2016-01-14  8:12                 ` Jike Song
2015-12-11 23:20 ` Jan Viktorin
2015-12-15 11:20   ` Alejandro Lucero

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).